Deploying Pivotal Greenplum for Kubernetes
Overview
The Greenplum for Kubernetes distribution provides a Docker image with the open source Greenplum Database software on Ubuntu, for deployment to Kubernetes clusters. Greenplum for Kubernetes also provides management scripts to help you configure, deploy, and expand a Greenplum cluster in Kubernetes.
This topic describes how to deploy a new Greenplum cluster to Kubernetes, and how to expand an existing Greenplum deployment in Kubernetes. You can use one of the following environments to run Greenplum for Kubernetes:
- Minikube. A Minikube installation enables you to demonstrate Greenplum for Kubernetes on your local machine, using minimal resources.
- Pivotal Container Service (PKS) running on Google Cloud Platform (GCP). PKS on Google Cloud Platform is the recommended environment for evaluating, testing, or using Greenplum for Kubernetes.
- Google Kubernetes Engine (GKE) running on Google Cloud Platform (GCP). GKE on Google Cloud Platform can also be used for evaluating and testing Greenplum for Kubernetes.
Deploying Greenplum to Minikube
Follow this procedure to deploy Greenplum for Kubernetes to Minikube, to demonstrate Greenplum on your local system.
Prerequisites
To deploy Greenplum for Kubernetes on Minikube, you require:
- kubectl command-line utility. Install the version of kubectl that is distributed with PKS, even if you are deploying Greenplum to Minikube. See Installing the Kubernetes CLI in the PKS documentation for instructions.
- Docker. Install a recent version of Docker to your machine, and start Docker.
- Minikube. See the Minikube documentation to install the latest version of Minikube. As part of the Minikube installation, install a compatible hypervisor to your system if one is not already available. Note: Do not install or re-install kubectl as part of the Minikube installation.
- Helm package manager utility. Follow the instructions at Kubernetes Helm to install the latest version of helm.
To validate that your system meets these prerequisites, ensure that the following commands execute without any errors, and that the output versions are similar to the ones shown:
$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
$ docker --version
Docker version 18.03.1-ce, build 9ee9f40
$ minikube version
minikube version: v0.28.1
$ helm version --client
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Note: The documented procedure loads the required Docker image locally and uploads it to Minikube. As an alternative, you can use the Kubernetes support that is built into Docker.
Procedure
1. Start Docker if it is not already running on your system.
2. Start the Minikube cluster with enough resources to ensure reasonable response times for the Greenplum service. For example:
$ minikube start --memory 4096 --cpus 4
Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Downloading Minikube ISO
 160.27 MB / 160.27 MB [============================================] 100.00% 0s
Getting VM IP address...
Moving files into cluster...
Downloading kubeadm v1.10.0
Downloading kubelet v1.10.0
Finished Downloading kubeadm v1.10.0
Finished Downloading kubelet v1.10.0
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Loading cached images from config file.
3. Confirm that the kubectl utility can access Minikube:
$ kubectl get node
NAME       STATUS    ROLES     AGE       VERSION
minikube   Ready     master    2m        v1.10.0
Note: If you have problems starting or connecting to Minikube, use minikube delete to remove the current Minikube instance and then recreate it.
4. Download the Greenplum for Kubernetes software from Pivotal Network. The download file has the name greenplum-for-kubernetes-<version>.tar.gz.
5. Go to the directory where you downloaded Greenplum for Kubernetes, and unpack the downloaded software. For example:
$ cd ~/Downloads
$ tar xzf greenplum-for-kubernetes-*.tar.gz
The above command unpacks the distribution into a new directory named greenplum-for-kubernetes-<version>.
6. Go into the new greenplum-for-kubernetes-<version> directory:
$ cd ./greenplum-for-kubernetes-*
7. Configure the local Docker client to use the Minikube Docker daemon, so that docker commands interact with images inside the Minikube Docker environment:
$ eval $(minikube docker-env)
Note: To undo this Docker setting in the current shell, run eval "$(docker-machine env -u)".
8. Load the Greenplum for Kubernetes Docker image into the local Docker registry:
$ docker load -i ./images/greenplum-for-kubernetes
644879075e24: Loading layer [==================================================>]  117.9MB/117.9MB
d7ff1dc646ba: Loading layer [==================================================>]  15.87kB/15.87kB
686245e78935: Loading layer [==================================================>]  14.85kB/14.85kB
d73dd9e65295: Loading layer [==================================================>]  5.632kB/5.632kB
2de391e51d73: Loading layer [==================================================>]  3.072kB/3.072kB
4605c0a3f29d: Loading layer [==================================================>]  633.4MB/633.4MB
c8d909e84bbf: Loading layer [==================================================>]  1.682MB/1.682MB
7e66ff617b4c: Loading layer [==================================================>]  4.956MB/4.956MB
db9d4b8567ab: Loading layer [==================================================>]  17.92kB/17.92kB
223fe4d67f77: Loading layer [==================================================>]  3.584kB/3.584kB
2e75b028b124: Loading layer [==================================================>]  43.04MB/43.04MB
1a7d923392f7: Loading layer [==================================================>]  2.56kB/2.56kB
2b9cc11f6cfc: Loading layer [==================================================>]  176.6kB/176.6kB
Loaded image: greenplum-for-kubernetes:v0.0.1.dev.391.gaf0b6d5
9. Verify that the Docker image tag greenplum-for-kubernetes:<version> is now available:
$ docker images | grep greenplum-for-kubernetes
greenplum-for-kubernetes v0.0.1.dev.391.gaf0b6d5 0a501efdde09 9 days ago 775MB
10. Deploy Greenplum containers to Minikube by executing the deploy.sh script found in the greenplum directory. Minikube deployments use the syntax:
IS_DEV=true ./deploy.sh <data_segment_count> <helm_release_name>
where:
- IS_DEV=true is an environment variable that automatically configures certain deployment properties that are required for Minikube. See Deployment Configuration Options for additional information.
- <data_segment_count> is the number of logical data segments to create in the cluster, not including the master, standby master, or mirror segments. Each data segment that you create is mirrored by default. For example, if you deploy a cluster with a <data_segment_count> of 1, the Greenplum cluster includes a master segment, a standby master, one primary segment, and one mirror segment.
- <helm_release_name> is the helm release name to assign to the Greenplum deployment.
For example, the following commands deploy a new Greenplum cluster with one data segment to Minikube, using the helm release name "gpdb":
$ cd greenplum
$ IS_DEV=true ./deploy.sh 1 gpdb
+++ dirname ./deploy.sh
++ cd .
++ pwd
+ SCRIPT_DIR=/Users/demouser/Downloads/greenplum-for-kubernetes/greenplum
+ source /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/wait_for_deployment.bash
++ SLEEP_SECS=1
++ TIMEOUT_SECS=300
++ NAMESPACE=default
+ SEGMENT_COUNT=1
+ export SEGMENT_COUNT
+ CLUSTER_NAME=gpdb
+ [[ gpdb == *\_* ]]
+ [[ gpdb =~ [A-Z] ]]
+ true
+ IS_DEV=true
+ verify_node_count 1
+ true
+ set +x
########################################################
##       Generating configuration for local-dev       ##
########################################################
+ env IS_DEV=true /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/generate_cluster_configuration.bash
+++ dirname /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/generate_cluster_configuration.bash
[...]
NAME:   gpdb
LAST DEPLOYED: Thu Jul 19 13:14:20 2018
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/ServiceAccount
NAME  SECRETS  AGE
gpdb  1        1s

==> v1/Service
NAME       TYPE          CLUSTER-IP      EXTERNAL-IP  PORT(S)         AGE
agent      ClusterIP     None            <none>       22/TCP          1s
greenplum  LoadBalancer  10.100.126.245  <pending>    5432:30630/TCP  1s

==> v1/Pod
NAME        READY  STATUS             RESTARTS  AGE
master      0/1    ContainerCreating  0         1s
segment-0a  0/1    ContainerCreating  0         0s
segment-0b  0/1    ContainerCreating  0         0s
standby     0/1    ContainerCreating  0         0s

==> v1/ConfigMap
NAME              DATA  AGE
greenplum-config  1     1s

+ true
When you execute deploy.sh, the script performs several actions:
- It creates the Greenplum Database master, standby, primary, and mirror segments, each in a separate container in a separate pod.
- It generates Persistent Volumes by default; the volumes are destroyed when the cluster is deleted.
11. Execute kubectl get pods to monitor the progress of the deployment:
$ kubectl get pods
NAME         READY     STATUS    RESTARTS   AGE
master       1/1       Running   0          2m
segment-0a   1/1       Running   0          2m
segment-0b   1/1       Running   0          2m
standby      1/1       Running   0          2m
Each Greenplum segment (master, standby, primary, and mirror) is deployed in a single container on its own pod. When the container on a pod is ready (indicated by the 1/1 entry in the READY column), the pod itself shows the status Running.
Important: You must wait for all of the deployed pods to reach the Running status before you continue to the next step. If any pods are still starting, re-execute kubectl get pods later to check the deployment status. The example above shows that all pods are Running, so you can continue with the next step.
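If you prefer to script this wait instead of re-running kubectl get pods by hand, a simple polling loop like the following works; this is only a convenience sketch, not part of the Greenplum for Kubernetes distribution:
# Poll until no pod in the default namespace reports a status other than Running.
until [ -z "$(kubectl get pods --no-headers | grep -v Running)" ]; do
  echo "waiting for pods..."
  sleep 5
done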
12. After all pods are running, initialize the Greenplum cluster using the command:
$ kubectl exec -it master /home/gpadmin/tools/wrap_initialize_cluster.bash
Key scanning started
+ DNS_SUFFIX=agent.default.svc.cluster.local
++ cat /etc/config/segmentCount
+ SEGMENT_COUNT=1
++ hostname
+ current_master=master
+ current_standby=standby
[...]
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-Greenplum Database instance successfully created
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-------------------------------------------------------
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-To complete the environment configuration, please
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-2. Add "export MASTER_DATA_DIRECTORY=/greenplum/data-1"
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-   or, use -d /greenplum/data-1 option for the Greenplum scripts
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-   Example gpstate -d /greenplum/data-1
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-Script log file = /home/gpadmin/gpAdminLogs/gpinitsystem_20180719.log
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-Standby Master standby has been configured
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-To activate the Standby Master Segment in the event of Master
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-failure review options for gpactivatestandby
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-------------------------------------------------------
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-The Master /greenplum/data-1/pg_hba.conf post gpinitsystem
20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-new array must be explicitly added to this file
20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-located in the /opt/gpdb/docs directory
20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-------------------------------------------------------
*******************************************
The script exchanges keys, creates a configuration file based on the number of segments specified, and initializes the Greenplum Database cluster.
13. To validate the deployment, access the Greenplum Database deployment using the psql command inside Minikube, and execute SQL to validate the cluster deployment. For example, start by running psql in Minikube:
$ kubectl exec -it master bash -- -c "source /opt/gpdb/greenplum_path.sh; psql"
After psql starts, you can execute Greenplum SQL statements. For example, query the segment configuration with:
gpadmin=# select * from gp_segment_configuration;
 dbid | content | role | preferred_role | mode | status | port  |  hostname  |  address   | replication_port
------+---------+------+----------------+------+--------+-------+------------+------------+------------------
    1 |      -1 | p    | p              | s    | u      |  5432 | master     | master     |
    2 |       0 | p    | p              | s    | u      | 40000 | segment-0a | segment-0a |             6000
    3 |       0 | m    | m              | s    | u      | 50000 | segment-0b | segment-0b |             6001
    4 |      -1 | m    | m              | s    | u      |  5432 | standby    | standby    |
(4 rows)
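As an additional, optional check, you can create and query a small distributed table to confirm that data is being spread across the segments. The table name smoke_test below is arbitrary and not part of the documented procedure:
gpadmin=# create table smoke_test (id int) distributed by (id);
gpadmin=# insert into smoke_test select generate_series(1, 100);
gpadmin=# select gp_segment_id, count(*) from smoke_test group by 1;
gpadmin=# drop table smoke_test;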
To exit the psql utility (and the interactive shell in Kubernetes), enter \q.
14. To delete the Greenplum cluster after you have finished using it, see Deleting the Greenplum Database Cluster.
15. To stop Minikube, execute the command:
$ minikube stop
Stopping local Kubernetes cluster...
Machine stopped.
(Optional) Preparing Pre-Created Disks for PKS and GKE Deployments
Optionally, the Greenplum for Kubernetes service can be deployed using pre-created disks instead of automatically-managed persistent volumes. The workspace/samples/scripts/create_disks.bash script contains an example of doing this on the Google Cloud Platform. It uses the syntax:
$ create_disks.bash <cluster_prefix> <segment_count>
The script creates disks that are sized at 1GB each.
Follow these instructions to deploy Greenplum for Kubernetes using pre-created disks:
1. Go to the workspace directory from the installation package. For example:
$ cd greenplum-for-kubernetes-*/workspace
2. Execute samples/scripts/create_disks.bash with the cluster prefix and desired segment count for the deployment. For example:
$ samples/scripts/create_disks.bash gpdb 1
When prompted, enter the number corresponding to the zone where you want to create each disk. After creating all disks, the script ends with output similar to:
Created [https://www.googleapis.com/compute/v1/projects/gp-kubernetes/zones/us-central1-a/disks/segment-0b-gpdb-disk].
NAME                  ZONE           SIZE_GB  TYPE         STATUS
segment-0b-gpdb-disk  us-central1-a  20       pd-standard  READY
New disks are unformatted. You must format and mount a disk before it can be used. You can
find instructions on how to do this at:
https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting
Note that the Greenplum deployment script automatically formats the disks.
3. Create a file named ./Values-overrides.yml in your workspace and add the following lines:
usePreCreatedDisks: true
clusterPrefix: <cluster_prefix>
Replace <cluster_prefix> with the prefix used in the create_disks.bash command (for example, "gpdb").
4. Follow the procedure to deploy Greenplum on PKS or GKE, specifying the same number of segments. The ./Values-overrides.yml file directs the deployment to use the pre-created disks that were previously generated with the cluster prefix.
Deploying Greenplum to PKS
Follow this procedure to deploy Greenplum for Kubernetes to PKS.
Prerequisites
This procedure requires that PKS on GCP is installed and running, along with all prerequisite software and configuration. See Installing PKS for Greenplum for Kubernetes, Using Google Cloud Platform (GCP) for information about installing PKS.
Before you attempt to deploy Greenplum, ensure that the target cluster is available. Execute the following command to make sure that the target cluster displays in the output:
pks list-clusters
Note: The pks login cookie typically expires after a day or two.
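If the login has expired, authenticate with PKS again before continuing. A typical login command looks like the following, where the API endpoint, user name, and password are placeholders for your own values (-k skips SSL certificate validation and is optional):
$ pks login -a <PKS_API_endpoint> -u <username> -p <password> -k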
To use pre-created disks with PKS instead of the default automatically-managed persistent volumes, follow the instructions in (Optional) Preparing Pre-Created Disks before continuing with this procedure.
Note: If any problems occur during deployment, remove the previous deployment and then retry deploying Greenplum.
Procedure
1. Download the Greenplum for Kubernetes software from Pivotal Network. The download file has the name greenplum-for-kubernetes-<version>.tar.gz.
2. Go to the directory where you downloaded Greenplum for Kubernetes, and unpack the downloaded software. For example:
$ cd ~/Downloads
$ tar xzf greenplum-for-kubernetes-*.tar.gz
The above command unpacks the distribution into a new directory named greenplum-for-kubernetes-<version>.
3. Go into the new greenplum-for-kubernetes-<version> directory:
$ cd ./greenplum-for-kubernetes-*
4. Ensure that helm has sufficient privileges via a Kubernetes service account. Use a command like:
$ kubectl create -f initialize_helm_rbac.yaml
serviceaccount "tiller" created
clusterrolebinding.rbac.authorization.k8s.io "tiller" created
This sets the necessary privileges for helm with a service account named tiller.
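The packaged initialize_helm_rbac.yaml file is the authoritative definition. Based on the output above, it creates objects roughly like the following sketch; the kube-system namespace and the cluster-admin role shown here are assumptions for illustration only:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system    # assumed namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin       # assumed role; the packaged file may grant something narrower
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system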
5. Initialize and upgrade helm with the command:
$ helm init --wait --service-account tiller --upgrade
$HELM_HOME has been configured at /<path>/.helm.
Tiller (the Helm server-side component) has been upgraded to the current version.
Happy Helming!
6. Load the Greenplum for Kubernetes Docker image to the local Docker registry:
$ docker load -i ./images/greenplum-for-kubernetes
644879075e24: Loading layer [==================================================>]  117.9MB/117.9MB
d7ff1dc646ba: Loading layer [==================================================>]  15.87kB/15.87kB
686245e78935: Loading layer [==================================================>]  14.85kB/14.85kB
d73dd9e65295: Loading layer [==================================================>]  5.632kB/5.632kB
2de391e51d73: Loading layer [==================================================>]  3.072kB/3.072kB
4605c0a3f29d: Loading layer [==================================================>]  633.4MB/633.4MB
c8d909e84bbf: Loading layer [==================================================>]  1.682MB/1.682MB
7e66ff617b4c: Loading layer [==================================================>]  4.956MB/4.956MB
db9d4b8567ab: Loading layer [==================================================>]  17.92kB/17.92kB
223fe4d67f77: Loading layer [==================================================>]  3.584kB/3.584kB
2e75b028b124: Loading layer [==================================================>]  43.04MB/43.04MB
1a7d923392f7: Loading layer [==================================================>]  2.56kB/2.56kB
2b9cc11f6cfc: Loading layer [==================================================>]  176.6kB/176.6kB
Loaded image: greenplum-for-kubernetes:v0.0.1.dev.391.gaf0b6d5
7. Verify that the Docker image tag greenplum-for-kubernetes:<version> is now available:
$ docker images | grep greenplum-for-kubernetes
greenplum-for-kubernetes v0.0.1.dev.391.gaf0b6d5 0a501efdde09 9 days ago 775MB
8. Decide on the ${IMAGE_REPO} to which you will push the greenplum-for-kubernetes Docker image. For example, to push the image to a Google Cloud Registry under the current Google Cloud project:
$ PROJECT=$(gcloud config list core/project --format='value(core.project)')
$ IMAGE_REPO="gcr.io/${PROJECT}/greenplum-for-kubernetes"
$ IMAGE_NAME="${IMAGE_REPO}:$(cat ./images/greenplum-for-kubernetes-tag)"
$ docker tag $(cat ./images/greenplum-for-kubernetes-id) ${IMAGE_NAME}
$ gcloud docker -- push ${IMAGE_NAME}
9. If you used a custom image repo (for example, one that includes your user name), then create a greenplum/Values-overrides.yml file and add the key imageRepo with the correct value of ${IMAGE_REPO}.
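For example, if you pushed the image to the Google Cloud Registry path constructed in the previous step, the override entry would look like the following, where my-project is a placeholder project name:
imageRepo: gcr.io/my-project/greenplum-for-kubernetes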
10. Go to the project directory that contains the deploy.sh script. For example:
$ cd ./greenplum
11. (Optional) If necessary, override any default deployment options by creating or editing the Values-overrides.yml file. See Deployment Configuration Options.
12. Execute the deploy.sh script, specifying the Greenplum Database segment count, the helm release name, and the path to the Greenplum for Kubernetes service account key:
$ ./deploy.sh <segment_count> <helm_release_name> <service_account_key>
The deploy.sh script fails if the number of nodes in the cluster's default-pool is not greater than or equal to 2 * segment_count + 2. The <service_account_key> must specify the full path to the key file required to download the container image from a repository.
For example, the following command deploys a set of Greenplum pods with the helm release name "foo", creating 1 segment:
$ ./deploy.sh 1 foo ./key.json
The above deploy.sh command fails if the number of nodes in the PKS cluster's default-pool is not greater than or equal to 4 (2 * segment_count + 2).
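A quick way to confirm the current worker node count before running deploy.sh is a generic kubectl check (not part of the deployment scripts):
$ kubectl get nodes --no-headers | wc -l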
When you execute deploy.sh, the script performs several actions:
- It creates the Greenplum Database master, standby, primary, and mirror segments, each in a separate container in a separate pod.
- It generates Persistent Volumes by default; the volumes are destroyed when the cluster is deleted. You can optionally pre-create disks for use across repeated cluster initialization operations. See (Optional) Preparing Pre-Created Disks for more information.
13. Look at the bottom of the deploy.sh script output to find the IP address and port that are available for psql access; the relevant output line contains the port number and an IP address. For example:
################################
The Kubernetes cluster is almost ready for Greenplum initialization.
Run 'watch kubectl get pods' until all containers are running, then run:
kubectl exec master /home/gpadmin/tools/wrap_initialize_cluster.bash
################################
When that is finished, you should be able to connect from your desktop via a command like:
psql -U gpadmin -p 5432 -h 104.198.135.115
Record this IP address. Later instructions refer to this IP as PSQL_SERVICE_IP.
14. Use kubectl to monitor the progress of the deployment, and wait until all of the containers reach the Running state, as shown below:
$ kubectl get pods
NAME         READY     STATUS    RESTARTS   AGE
master       1/1       Running   0          2m
segment-0a   1/1       Running   0          2m
segment-0b   1/1       Running   0          2m
standby      1/1       Running   0          2m
15. After all pods are running, initialize the Greenplum cluster using the command:
$ kubectl exec -it master /home/gpadmin/tools/wrap_initialize_cluster.bash
The script scans host keys, creates a configuration file based on the number of segments specified, and initializes the Greenplum Database cluster.
16. To validate the deployment, access the Greenplum Database deployment from the current development machine using the psql command:
$ psql -U gpadmin -p 31000 -h ${PSQL_SERVICE_IP}
In the above command, PSQL_SERVICE_IP should correspond to the IP address that you recorded from the deploy.sh script output in step 13.
17. See Deleting the Greenplum Database Cluster for information about deleting the Greenplum cluster after you have finished using it.
Deploying Greenplum to Google Kubernetes Engine
Follow this procedure to deploy Greenplum for Kubernetes to Google Kubernetes Engine (GKE) on Google Cloud Platform.
Prerequisites
When creating the Kubernetes cluster, ensure that you make the following selections on the Create a Kubernetes cluster screen of the Google Cloud Platform console (an equivalent gcloud command is sketched after this list):
- For the Cluster Version option, select the most recent version of Kubernetes.
- Scale the Machine Type option to at least 2 vCPUs / 7.5 GB memory.
- For the Node Image option, you must select Ubuntu. You cannot deploy Greenplum with the Container-Optimized OS (cos) image.
- Set the Size to 4 or more nodes.
- Set Automatic node repair to Disabled (the default).
- In the Advanced Options (click More to display Advanced Options), select Enable Kubernetes alpha features in this cluster. Also select I understand the consequences to confirm the choice.
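The console selections above roughly correspond to the following gcloud command. This is only a sketch: the cluster name gpdb-cluster, the zone, and the node count are placeholder values, and the console workflow described above remains the documented path:
$ gcloud container clusters create gpdb-cluster \
    --zone us-central1-a \
    --cluster-version latest \
    --machine-type n1-standard-2 \
    --image-type UBUNTU \
    --num-nodes 4 \
    --no-enable-autorepair \
    --enable-kubernetes-alpha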
Use the gcloud utility to log in to GCP and to set your current project and cluster context:
Log into GCP:
$ gcloud auth login
Set the current project to the project where you will deploy Greenplum:
$ gcloud config set project <project-name>
Set the context to the Kubernetes cluster that you created for Greenplum:
- Access GCP Console.
- Select Kubernetes Engine > Clusters.
- Click Connect next to the cluster that you configured for Greenplum, and copy the connection command.
On your local client machine, paste the command to set the context to your cluster. For example:
$ gcloud container clusters get-credentials <username> --zone us-central1-a --project <my-project>
Fetching cluster endpoint and auth data.
kubeconfig entry generated for <username>.
Procedure
The procedure for deploying Greenplum using the deployment script is the same for both PKS and GKE clusters. Follow the numbered instructions in the PKS Procedure.
Deployment Configuration Options
Some options that affect the deployment of Greenplum are available by editing a configuration file that is used by the deploy.sh script. The configuration defaults are supplied in the file greenplum/Values-common.yml, and these settings can be overridden by entries in greenplum/Values-overrides.yml.
The available configuration settings and their default values include:
imageTag: latest # determines the tag for the image used by Greenplum containers
# POD specific
memoryLimitRange: 4.5Gi
cpuLimitRange: 1.2
useAffinity: true
# Disk specific
usePreCreatedDisks: false
clusterPrefix: dev # only used when usePreCreatedDisks is true
usePersistentDisk: true
useGCE: true # only used when usePersistentDisk is true
useVsphere: false # only used when usePersistentDisk is true
useLocal: false # only used when usePersistentDisk is true
localDir: /gpdata # only used when useLocal is true
# GPDB specific
trustedCIDR: "216.253.193.0/24" # PALO ALTO OFFICE CIDR
masterDataDirectory: /greenplum/data-1
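For example, to run a small test cluster with reduced per-pod resource requirements and without the anti-affinity restriction, you could create greenplum/Values-overrides.yml with entries such as the following; the values shown are illustrative, not recommendations:
memoryLimitRange: 2Gi
cpuLimitRange: 0.5
useAffinity: false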
Setting IS_DEV=true in your environment before running deploy.sh deploys the Greenplum cluster with certain configuration settings that are required for Minikube:
usePersistentDisk: false
useAffinity: false
The default imagePullPolicy is IfNotPresent, which uses the local Docker image that you uploaded to Minikube.
Below is a description of all available settings, grouped to provide context.
Pod Resource Settings
memoryLimitRange: 4.5Gi
cpuLimitRange: 1.2
useAffinity: true
You can specify how much CPU and memory each pod uses:
- memoryLimitRange specifies the memory requirement for a single Greenplum instance.
- cpuLimitRange specifies the CPU requirement for a single Greenplum instance.
Note: A pod cannot be scheduled unless these resources are available on a node.
useAffinity relates to Greenplum high availability (HA). HA is important in production, but in a temporary proof-of-concept deployment you may want to relax the HA requirements in order to run with fewer resources.
For HA operation, Greenplum requires that any mirror database run on a different host than its primary database. Translating this into Kubernetes means that a node running a given segment's primary database cannot also host its mirror database. (This also implies that the node running the mirror cannot be on the same physical server as the primary's node.)
One part of this requirement is achieved by setting useAffinity to true. When useAffinity is true, no two pods running Greenplum instances can be scheduled on the same Kubernetes node. In other words, a given Kubernetes node runs at most one pod with a Greenplum instance.
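In Kubernetes terms, this behavior corresponds to a pod anti-affinity rule keyed on the node hostname. The snippet below only illustrates the concept; the label selector is assumed, and the actual pod specifications generated by the deployment scripts may differ:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: greenplum          # assumed label; the real deployment may use different labels
      topologyKey: kubernetes.io/hostname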
Storage Volume Settings
There are different performance, manageability, and durability trade-offs to consider when deciding where to store Greenplum data. (A quick way to list the volumes that are actually provisioned is shown at the end of this section.)
These trade-offs are discussed below using a rating scale of Best, Better, Good, and Zero for four separate storage choices. Four key criteria are evaluated for each of the four choices:
- performance: largely equivalent to storage speed
- manageability: how much labor an administrator must spend to maintain the storage
- durability: what are the failure cases for losing data
- high availability (HA): whether mirrors are necessary to achieve HA
Example: Ephemeral Volume (development mode, not for production)
usePersistentDisk: false
The disk life cycle is linked to the pod life cycle.
- Best Performance: local IO, no network delay
- Best Manageability: automatic management; storage goes away when the pod is destroyed
- Zero Durability: pod failures lose data
- HA: mirroring is required
Example: Local Persistent Volume
usePersistentDisk: true
useLocal: true
localDir: /gpdata
The disk life cycle is linked to the Kubernetes node life cycle. The data will be lost if the node fails.
- Best Performance: local IO, no network delay
- Good Manageability: administrators must manually clean up localDir on a Kubernetes node. A pod with the same name on the same node can reacquire the stored data; moving a pod across nodes could lose data.
- Good Durability: sustains pod failures; node failures lose data
- HA: mirroring is required
Example: Remote Persistent Volumes, Dynamically Created by the IaaS
(Two IaaS platforms are supported: Google Compute Engine (GCE) and VMware vSphere.)
usePersistentDisk: true
useGCE: true
OR
usePersistentDisk: true
useVsphere: true
The disk life cycle is managed by the IaaS. As long as the cluster is not deleted, the data remains. The volume can be easily remounted wherever the pod goes, on any Kubernetes node.
- Good Performance: through network bandwidth
- Best Manageability: automatic management by IaaS
- Best Durability: sustains container, pod, node failures; Kubernetes cluster deletion will lose data
- HA: mirroring not required (assuming reliance on IaaS storage for HA)
Example: Pre-Created, Remote Persistent Volumes
usePreCreatedDisks: true
clusterPrefix: dev
The disk life cycle is managed by the IaaS, and even if the cluster is deleted, the data remains. The volume can be easily remounted wherever the pod goes, on any Kubernetes node.
Limitation: this feature is currently only available on GCE.
- Good Performance: through network bandwidth
- Best Manageability: automatic management by IaaS
- Best Durability: sustains container, pod, node, cluster failures
- HA: mirroring not required (assuming reliance on IaaS storage for HA)
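Whichever storage option you choose, you can list the persistent volumes and persistent volume claims that actually exist in the Kubernetes cluster with standard kubectl commands:
$ kubectl get pv
$ kubectl get pvc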
Greenplum Settings
Below are settings for security and configuration purposes.
Master Data Directory
masterDataDirectory: /greenplum/data-1
masterDataDirectory specifies the path within which Greenplum will write data.
Example: Custom IP Range (CIDR) for Trusted Client Connections
Greenplum Database is based on PostgreSQL, which uses entries in the pg_hba.conf file to determine whether to trust a given client connection. Part of the Greenplum for Kubernetes configuration establishes the "trusted CIDR" for access. To change this value, create a file named ./Values-overrides.yml in the gp-kubernetes/greenplum directory (if one does not already exist). Then add the line:
trustedCidr: "123.123.123.0/24"
Replace the IP address range with one that matches your local deployment requirements.
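Conceptually, the trusted CIDR becomes an entry in pg_hba.conf of the general PostgreSQL form shown below. The exact entry and authentication method written by the deployment are not documented here, so treat this only as an illustration of the pg_hba.conf syntax:
# illustration only; the deployment controls the actual entries
host    all    gpadmin    123.123.123.0/24    trust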
Expanding the Greenplum Deployment on PKS
To expand the Greenplum cluster, you first create new pods in the Kubernetes cluster, and then run the expand_cluster.bash
script to put Greenplum containers on the new nodes. You can then use the standard Greenplum Database gpexpand
command in the cluster to initialize the new segments and redistribute data. Follow these steps:
1. Use pks resize to resize the cluster to the new total number of nodes that are required. The cluster must have two nodes for each Greenplum Database segment (to accommodate a primary and a mirror database for each segment) plus an additional two nodes for the master and standby master. Run pks resize with the options:
pks resize --wait --num-nodes <new_total_number_of_nodes>
In the above command, <new_total_number_of_nodes> can be calculated by the formula:
new_total_number_of_nodes = new_total_number_of_segments * 2 + 2
For example, expanding a cluster to a total of 4 segments requires 4 * 2 + 2 = 10 nodes.
Note: This command may take a considerable amount of time (20 minutes or longer) because procuring new nodes is a time-consuming process.
2. After the pks resize command completes, use the expand_cluster.bash script to put Greenplum containers onto the nodes you provided. Enter:
expand_cluster.bash <greenplum_cluster_name> <new_total_number_of_segments>
where <greenplum_cluster_name> is the name of the existing Greenplum cluster and <new_total_number_of_segments> is the total number of segments the cluster will have after expansion is complete. If you do not remember the original <greenplum_cluster_name>, you can display it by using the helm list command.
3. After the Greenplum containers are placed on the new nodes, use the Greenplum Database gpexpand tool to initialize the new segments. The input file to gpexpand requires details about the new segments that you are adding to the Greenplum Database cluster. (See Initializing New Segments in the Greenplum Database documentation for additional information.) When using these instructions, keep in mind the following information and conventions that are used in the Greenplum for Kubernetes deployment environment:
Segment Host Names and Port Numbers
All Greenplum for Kubernetes segment host names follow the pattern:
segment-<number>[a|b]
where <number> is the number of the segment, starting from 0. The letter "a" or "b" indicates a primary or mirror segment, respectively. For example, a cluster with 2 segments has the host names:
- segment-0a (the primary for data segment 0)
- segment-0b (the mirror for data segment 0)
- segment-1a (the primary for data segment 1)
- segment-1b (the mirror for data segment 1)
Primary and mirror segment hosts use the port configuration:
- primary port: 40000
- primary replication port: 6000
- mirror port: 50000
- mirror replication port: 6001
Data Directories
Each segment host uses the same data directory naming convention:
- primary data directory: /greenplum/data
- mirror data directory: /greenplum/mirror/data
Database ID and Content ID
Each segment database requires a unique Database ID. Greenplum for Kubernetes standardizes this database ID to be 1 for the master, and increments the value for each primary and mirror segment database. For example, the primary and mirror will each have a unique, incremented ID. In order to expand the cluster, you will need to provide the new Database ID values for the new segments using this convention.
Each Greenplum segment has a single Content ID that is shared by both the primary and the mirror database for that segment. Greenplum for Kubernetes standardizes the Content ID to be -1 for the master; the data segments are numbered from 0 through <new_total_number_of_segments> - 1, where <new_total_number_of_segments> is the total number of segments the cluster will have after expansion is complete. In order to expand the cluster, you will need to provide the new Content ID values for the new segments using this convention.
Example: Programmatically creating a gpexpand initialization file
You can use kubectl commands to programmatically generate the contents of a gpexpand initialization file. This series of commands takes as input the total number of segments in the expanded Greenplum cluster (as the environment variable new_total_number_of_segments), and uses that value to create entries in /tmp/expand_detail_file that follow the gpexpand initialization file format:
hostname:address:port:fselocation:dbid:content:preferred_role:replication_port
Follow this procedure to use the example commands:
1. On your client machine, set the environment variable new_total_number_of_segments to the new total number of segments in your expanded Greenplum cluster. For example:
export new_total_number_of_segments=4
2. Copy and paste the following commands into the terminal where you set the environment variable:
set -u
echo "expanding to a total of ${new_total_number_of_segments}"
last_dbid=$(kubectl exec master -- bash -l -c "psql --tuples-only -c 'select max(dbid) from gp_segment_configuration'")
last_contentid=$(kubectl exec master -- bash -l -c "psql --tuples-only -c 'select max(content) from gp_segment_configuration'")
dbid=$((last_dbid+1))
for i in $(seq $((last_contentid+1)) $((new_total_number_of_segments-1)))
do
  echo "segment-${i}a:segment-${i}a:40000:/greenplum/data:${dbid}:${i}:p:6000" >> /tmp/expand_detail_file
  dbid=$((dbid+1))
  echo "segment-${i}b:segment-${i}b:50000:/greenplum/mirror/data:${dbid}:${i}:m:6001" >> /tmp/expand_detail_file
  dbid=$((dbid+1))
done
These commands use information obtained from the Greenplum cluster to populate a gpexpand initialization file in /tmp/expand_detail_file. Examine the contents of the file to verify that it describes the new segment(s) you are adding.
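For example, expanding the single-segment cluster shown earlier in this topic (where the maximum dbid is 4 and the maximum content ID is 0) to a total of 4 segments would produce a /tmp/expand_detail_file similar to:
segment-1a:segment-1a:40000:/greenplum/data:5:1:p:6000
segment-1b:segment-1b:50000:/greenplum/mirror/data:6:1:m:6001
segment-2a:segment-2a:40000:/greenplum/data:7:2:p:6000
segment-2b:segment-2b:50000:/greenplum/mirror/data:8:2:m:6001
segment-3a:segment-3a:40000:/greenplum/data:9:3:p:6000
segment-3b:segment-3b:50000:/greenplum/mirror/data:10:3:m:6001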
3. Copy the generated file from your client into the Greenplum cluster on PKS:
kubectl cp /tmp/expand_detail_file master:/tmp/expand_detail_file
4. Use the initialization file with gpexpand in the PKS cluster to initialize the new segments:
kubectl exec master -- bash -l -c "gpexpand -i /tmp/expand_detail_file -D gpadmin"
5. After the gpexpand utility exits, execute a query to verify the new segment configuration of your cluster:
kubectl exec master -- bash -l -c "psql -c 'select * from gp_segment_configuration'"