Deploying Pivotal Greenplum for Kubernetes

Overview

Pivotal Greenplum for Kubernetes is an experimental product and is not intended for use in a production environment. Experimental features are subject to change without notice in future releases.

The Greenplum for Kubernetes distribution provides a Docker image with the open source Greenplum Database software on Ubuntu, for deployment to Kubernetes clusters. Greenplum for Kubernetes also provides management scripts to help you configure, deploy, and expand a Greenplum cluster in Kubernetes.

Pivotal Greenplum for Kubernetes currently uses a script-based method to deploy Greenplum Database to Kubernetes. A Greenplum Database Operator with which you can configure, deploy, and manage the Greenplum cluster will be available in a future release. See the preview documentation for more information.

This topic describes how to deploy a new Greenplum cluster to Kubernetes, and how to expand an existing Greenplum deployment in Kubernetes. You can run Greenplum for Kubernetes in any of the following environments: Minikube (for a local demonstration), Pivotal Container Service (PKS), or Google Kubernetes Engine (GKE).

Deploying Greenplum to Minikube

Follow this procedure to deploy Greenplum for Kubernetes to Minikube, to demonstrate Greenplum on your local system.

Prerequisites

To deploy Greenplum for Kubernetes on Minikube, you require:

  • kubectl command-line utility. Install the version of kubectl that is distributed with PKS, even if you are deploying Greenplum to Minikube. See Installing the Kubernetes CLI in the PKS documentation for instructions.

  • Docker. Install a recent version of Docker to your machine, and start Docker.

  • Minikube. See the Minikube documentation to install the latest version of Minikube. As part of the Minikube installation, install a compatible hypervisor to your system if one is not already available.


    Note: Do not install or re-install kubectl as part of the Minikube installation.

  • Helm package manager utility. Follow the instructions at Kubernetes Helm to install the latest version of helm.

In addition to the above, the Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.
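
A quick, non-authoritative way to check which security options your Docker daemon has loaded (AppArmor appears in the list on hosts where it is enabled; the output varies by platform) is:

$ docker info --format '{{.SecurityOptions}}'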

To validate that your system meets these prerequisites, ensure that the following commands execute without any errors, and that the output versions are similar to the ones shown:

$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
$ docker --version
Docker version 18.03.1-ce, build 9ee9f40
$ minikube version
minikube version: v0.28.1
$ helm version --client
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}

Note: The documented procedure builds the required Docker image locally and uploads it to Minikube. As an alternative, you can use Docker support for Kubernetes.

Procedure

  1. Start Docker if it is not already running on your system.

  2. Start the Minikube cluster with enough resources to ensure reasonable response times for the Greenplum service. For example:

    $ minikube start --memory 4096 --cpus 4
    
    Starting local Kubernetes v1.10.0 cluster...
    Starting VM...
    Downloading Minikube ISO
     160.27 MB / 160.27 MB [============================================] 100.00% 0s
    Getting VM IP address...
    Moving files into cluster...
    Downloading kubeadm v1.10.0
    Downloading kubelet v1.10.0
    Finished Downloading kubeadm v1.10.0
    Finished Downloading kubelet v1.10.0
    Setting up certs...
    Connecting to cluster...
    Setting up kubeconfig...
    Starting cluster components...
    Kubectl is now configured to use the cluster.
    Loading cached images from config file.
    
  3. Confirm that the kubectl utility can access Minikube:

    $ kubectl get node
    
    NAME       STATUS    ROLES     AGE       VERSION
    minikube   Ready     master    2m        v1.10.0
    

    Note: If you have problems starting or connecting to Minikube, use minikube delete to remove the current Minikube and then recreate it.

  4. Download the Greenplum for Kubernetes software from Pivotal Network. The download file has the name: greenplum-for-kubernetes-<version>.tar.gz.

  5. Go to the directory where you downloaded Greenplum for Kubernetes, and unpack the downloaded software. For example:

    $ cd ~/Downloads
    $ tar xzf greenplum-for-kubernetes-*.tar.gz
    

    The above command unpacks the distribution into a new directory named greenplum-for-kubernetes-<version>.

  6. Go into the new greenplum-for-kubernetes-<version> directory:

    $ cd ./greenplum-for-kubernetes-*
    
  7. Change the local docker environment to point to the Minikube docker, so that the local docker daemon interacts with images inside the Minikube docker environment:

    $ eval $(minikube docker-env)
    

    Note: To undo this docker setting in the current shell, run eval "$(docker-machine env -u)".

  8. Load the Greenplum for Kubernetes Docker image to the local Docker registry:

    $ docker load -i ./images/greenplum-for-kubernetes
    
    644879075e24: Loading layer [==================================================>]  117.9MB/117.9MB
    d7ff1dc646ba: Loading layer [==================================================>]  15.87kB/15.87kB
    686245e78935: Loading layer [==================================================>]  14.85kB/14.85kB
    d73dd9e65295: Loading layer [==================================================>]  5.632kB/5.632kB
    2de391e51d73: Loading layer [==================================================>]  3.072kB/3.072kB
    4605c0a3f29d: Loading layer [==================================================>]  633.4MB/633.4MB
    c8d909e84bbf: Loading layer [==================================================>]  1.682MB/1.682MB
    7e66ff617b4c: Loading layer [==================================================>]  4.956MB/4.956MB
    db9d4b8567ab: Loading layer [==================================================>]  17.92kB/17.92kB
    223fe4d67f77: Loading layer [==================================================>]  3.584kB/3.584kB
    2e75b028b124: Loading layer [==================================================>]  43.04MB/43.04MB
    1a7d923392f7: Loading layer [==================================================>]   2.56kB/2.56kB
    2b9cc11f6cfc: Loading layer [==================================================>]  176.6kB/176.6kB
    Loaded image: greenplum-for-kubernetes:v0.0.1.dev.391.gaf0b6d5
    
  9. Verify that the Docker image tag greenplum-for-kubernetes:<version> is now available:

    $ docker images | grep greenplum-for-kubernetes
    
    greenplum-for-kubernetes                   v0.0.1.dev.391.gaf0b6d5   0a501efdde09        9 days ago          775MB
    
  10. Deploy Greenplum containers to Minikube by executing the deploy.sh script found in the greenplum directory. Minikube deployments use the syntax IS_DEV=true ./deploy.sh <data_segment_count> <helm_release_name> where:

    • IS_DEV=true is an environment variable that automatically configures certain deployment properties that are required for Minikube. See Deployment Configuration Options for additional information.
      Note: When IS_DEV=true, the Greenplum cluster is configured to accept connections from all IP addresses (0.0.0.0/0).
    • <data_segment_count> is the number of logical data segments to create in the cluster, not including the master, standby master, or mirror segments. Each data segment that you create is mirrored by default. For example, if you deploy a cluster with a <data_segment_count> of 1, the Greenplum cluster includes a master segment, a standby master, one primary segment, and one mirror segment.
    • <helm_release_name> is the helm release name to assign to the Greenplum deployment.


    For example, the following commands deploy a new Greenplum cluster with one data segment to Minikube, using the helm release name “gpdb”:

    $ cd greenplum
    $ IS_DEV=true ./deploy.sh 1 gpdb
    
    +++ dirname ./deploy.sh
    ++ cd .
    ++ pwd
    + SCRIPT_DIR=/Users/demouser/Downloads/greenplum-for-kubernetes/greenplum
    + source /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/wait_for_deployment.bash
    ++ SLEEP_SECS=1
    ++ TIMEOUT_SECS=300
    ++ NAMESPACE=default
    + SEGMENT_COUNT=1
    + export SEGMENT_COUNT
    + CLUSTER_NAME=gpdb
    + [[ gpdb == *\_* ]]
    + [[ gpdb =~ [A-Z] ]]
    + true
    + IS_DEV=true
    + verify_node_count 1
    + true
    + set +x
    ########################################################
    ##   Generating configuration for local-dev           ##
    ########################################################
    + env IS_DEV=true /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/generate_cluster_configuration.bash
    +++ dirname /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/generate_cluster_configuration.bash
    
    [...]
    
    NAME:   gpdb
    LAST DEPLOYED: Thu Jul 19 13:14:20 2018
    NAMESPACE: default
    STATUS: DEPLOYED
    
    RESOURCES:
    ==> v1/ServiceAccount
    NAME  SECRETS  AGE
    gpdb  1        1s
    
    ==> v1/Service
    NAME       TYPE          CLUSTER-IP      EXTERNAL-IP  PORT(S)         AGE
    agent      ClusterIP     None            <none>       22/TCP          1s
    greenplum  LoadBalancer  10.100.126.245  <pending>    5432:30630/TCP  1s
    
    ==> v1/Pod
    NAME        READY  STATUS             RESTARTS  AGE
    master      0/1    ContainerCreating  0         1s
    segment-0a  0/1    ContainerCreating  0         0s
    segment-0b  0/1    ContainerCreating  0         0s
    standby     0/1    ContainerCreating  0         0s
    
    ==> v1/ConfigMap
    NAME              DATA  AGE
    greenplum-config  1     1s
    
    + true
    


    When you execute deploy.sh, the script performs several actions:

    • It creates the Greenplum Database master, standby, primary, and mirror segments each in a separate container in a separate pod.
    • It generates Persistent Volumes by default, and the volumes are destroyed when the cluster is deleted.
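
    You can redisplay this release summary at any time with the helm status command, for example:

    $ helm status gpdb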

  11. Execute kubectl get pods to monitor the progress of the deployment:

    $ kubectl get pods
    
    NAME                                  READY     STATUS    RESTARTS   AGE
    master                                1/1       Running   0          2m
    segment-0a                            1/1       Running   0          2m
    segment-0b                            1/1       Running   0          2m
    standby                               1/1       Running   0          2m
    

    Each Greenplum segment instance (master, standby, primary, and mirror) runs in a single container in its own pod. When the container in a pod is ready (indicated by the 1/1 entry in the READY column), the pod itself shows the status Running.

    Important: You must wait for all of the deployed pods to reach the Running status before you continue to the next step. If any pods are still starting, re-execute kubectl get pods at a later time to check the deployment status. The example above shows that all pods are Running, and you can continue with the next step.
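
    To watch the pod status change in real time instead of re-executing the command, you can add the watch flag (press Ctrl+C to stop watching):

    $ kubectl get pods -w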

  12. After all pods are running, initialize the Greenplum cluster using the command:

    $ kubectl exec -it master /home/gpadmin/tools/wrap_initialize_cluster.bash
    
    Key scanning started
    + DNS_SUFFIX=agent.default.svc.cluster.local
    ++ cat /etc/config/segmentCount
    + SEGMENT_COUNT=1
    ++ hostname
    + current_master=master
    + current_standby=standby.agent.default.svc.cluster.local
    
    [...]
    
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-Greenplum Database instance successfully created
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-------------------------------------------------------
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-To complete the environment configuration, please
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-2. Add "export MASTER_DATA_DIRECTORY=/greenplum/data-1"
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-   or, use -d /greenplum/data-1 option for the Greenplum scripts
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-   Example gpstate -d /greenplum/data-1
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-Script log file =  /home/gpadmin/gpAdminLogs/gpinitsystem_20180719.log
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-Standby Master standby has been configured
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-To activate the Standby Master Segment in the event of Master
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-failure review options for gpactivatestandby
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-------------------------------------------------------
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-The Master /greenplum/data-1/pg_hba.conf post gpinitsystem
    20180719:20:26:02:000131 gpinitsystem:master:gpadmin-[INFO]:-has been configured to allow all hosts within this new
    20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
    20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-new array must be explicitly added to this file
    20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
    20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-located in the /opt/gpdb/docs directory
    20180719:20:26:03:000131 gpinitsystem:master:gpadmin-[INFO]:-------------------------------------------------------
    *******************************************
    

    The script exchanges keys, creates a configuration file based on the number of segments specified, and initializes the Greenplum Database cluster.
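
    As an optional check of the newly initialized cluster, you can run the gpstate utility against the master data directory shown in the gpinitsystem output (/greenplum/data-1); this assumes the gpadmin login environment sources greenplum_path.sh:

    $ kubectl exec -it master -- bash -l -c "gpstate -d /greenplum/data-1"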

  13. To validate the deployment, access the Greenplum Database deployment using the psql command inside Minikube, and execute SQL to validate the cluster deployment. For example, start by running psql in Minikube:

    $ kubectl exec -it master bash -- -c "source /opt/gpdb/greenplum_path.sh; psql"
    

    After psql starts, you can execute Greenplum SQL statements. For example, query the segment configuration with:

    gpadmin=# select * from gp_segment_configuration;
    
    
     | dbid | content | role | preferred_role | mode | status | port  | hostname                                | address                                    | replication_port |
     |------|---------|------|----------------|------|--------|-------|-----------------------------------------|--------------------------------------------|------------------|
     | 1    | -1      | p    | p              | s    | u      | 5432  | master                                  | master.agent.default.svc.cluster.local     |                  |
     | 2    | 0       | p    | p              | s    | u      | 40000 | segment-0a                              | segment-0a.agent.default.svc.cluster.local | 6000             |
     | 3    | 0       | m    | m              | s    | u      | 50000 | segment-0b                              | segment-0b.agent.default.svc.cluster.local | 6001             |
     | 4    | -1      | m    | m              | s    | u      | 5432  | standby.agent.default.svc.cluster.local | standby.agent.default.svc.cluster.local    |                  |
    
     (4 rows)
    
  14. To exit the psql utility (and the interactive shell in Kubernetes), enter \q.

  15. To delete the Greenplum Cluster after you have finished using it, see Deleting the Greenplum Database Cluster.

  16. To stop Minikube, execute the command:

    $ minikube stop
    
    Stopping local Kubernetes cluster...
    Machine stopped.
    

(Optional) Preparing Pre-Created Disks for PKS and GKE Deployments

Optionally, you can deploy the Greenplum for Kubernetes service using pre-created disks instead of automatically-managed persistent volumes. The workspace/samples/scripts/create_disks.bash script contains an example of doing this on the Google Cloud Platform. It uses the syntax:

$ create_disks.bash <cluster_prefix> <segment_count>

The script creates disks that are sized at 1GB each.

Follow these instructions to deploy Greenplum for Kubernetes using pre-created disks:

  1. Go to the workspace directory from the installation package. For example:

    $ cd greenplum-for-kubernetes-*/workspace
    
  2. Execute samples/scripts/create_disks.bash using the cluster prefix and desired segment count for the deployment. For example:

    $ samples/scripts/create_disks.bash gpdb 1
    
  3. When prompted, enter the number corresponding to the zone where you want to create each disk. After creating all disks, the script ends with output similar to:

    Created [https://www.googleapis.com/compute/v1/projects/gp-kubernetes/zones/us-central1-a/disks/segment-0b-gpdb-disk].
    NAME                   ZONE           SIZE_GB  TYPE         STATUS
    segment-0b-gpdb-disk  us-central1-a  20       pd-standard  READY
    
    New disks are unformatted. You must format and mount a disk before it
    can be used. You can find instructions on how to do this at:
    
    https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting
    

    Note that the Greenplum deployment script automatically formats the disks.
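
    If you want to double-check what was created, you can list the disks from the gcloud CLI and filter on the cluster prefix (gpdb in this example):

    $ gcloud compute disks list | grep gpdb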

  4. Create a file named ./Values-overrides.yml in your workspace and add the following lines:

    usePreCreatedDisks: true
    clusterPrefix: <cluster_prefix>
    

    Replace <cluster_prefix> with the prefix used in the create_disks.bash command (for example, “gpdb”).

  5. Follow the procedure to deploy Greenplum on PKS or GKE, specifying the same number of segments. The ./Values-overrides.yml file configures the deployment to use the pre-created disks that were previously generated with the cluster prefix.

Deploying Greenplum to PKS

Follow this procedure to deploy Greenplum for Kubernetes to PKS.

Prerequisites

This procedure requires that PKS on GCP is installed and running, along with all prerequisite software and configuration. See Installing PKS for Greenplum for Kubernetes, Using Google Cloud Platform (GCP) for information about installing PKS.

Obtain the Kubernetes service account key (key.json) file, and identify its location in the Values-common.yml file as the value of dockerRegistryKeyJson. See Deployment Configuration Options.

Before you attempt to deploy Greenplum, ensure that the target cluster is available. Execute the following command to make sure that the target cluster appears in the output:

pks list-clusters

Note: The pks login cookie typically expires after a day or two.
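
If your pks session has expired, log in again before proceeding. A typical login command looks like the following; substitute the API endpoint, credentials, and certificate options used by your PKS installation:

pks login -a <pks_api_url> -u <username> -p <password> --ca-cert <path/to/cert>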

The Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.

To use pre-created disks with PKS instead of (default) automatically-managed persistent volumes, follow the instructions in (Optional) Preparing Pre-Created Disks before continuing with the procedure.

Note: If any problems occur during deployment, retry deploying Greenplum by first removing the previous deployment.

Procedure

  1. Download the Greenplum for Kubernetes software from Pivotal Network. The download file has the name: greenplum-for-kubernetes-<version>.tar.gz.

  2. Go to the directory where you downloaded Greenplum for Kubernetes, and unpack the downloaded software. For example:

    $ cd ~/Downloads
    $ tar xzf greenplum-for-kubernetes-*.tar.gz
    

    The above command unpacks the distribution into a new directory named greenplum-for-kubernetes-<version>.

  3. Go into the new greenplum-for-kubernetes-<version> directory:

    $ cd ./greenplum-for-kubernetes-*
    
  4. Ensure that helm has sufficient privileges via a Kubernetes service account. Use a command like:

    $ kubectl create -f initialize_helm_rbac.yaml
    
    serviceaccount "tiller" created
    clusterrolebinding.rbac.authorization.k8s.io "tiller" created
    

    This sets the necessary privileges for helm with a service account named tiller.
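
    To confirm that the service account exists before initializing helm, you can query it with kubectl; this assumes the manifest creates the tiller service account in the kube-system namespace, where Tiller itself runs:

    $ kubectl -n kube-system get serviceaccount tiller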

  5. Initialize and upgrade helm with the command:

    $ helm init --wait --service-account tiller --upgrade
    
    $HELM_HOME has been configured at /<path>/.helm.
    
    Tiller (the Helm server-side component) has been upgraded to the current version.
    Happy Helming!
    
  6. Load the Greenplum for Kubernetes Docker image to the local Docker registry:

    $ docker load -i ./images/greenplum-for-kubernetes
    
    644879075e24: Loading layer [==================================================>]  117.9MB/117.9MB
    d7ff1dc646ba: Loading layer [==================================================>]  15.87kB/15.87kB
    686245e78935: Loading layer [==================================================>]  14.85kB/14.85kB
    d73dd9e65295: Loading layer [==================================================>]  5.632kB/5.632kB
    2de391e51d73: Loading layer [==================================================>]  3.072kB/3.072kB
    4605c0a3f29d: Loading layer [==================================================>]  633.4MB/633.4MB
    c8d909e84bbf: Loading layer [==================================================>]  1.682MB/1.682MB
    7e66ff617b4c: Loading layer [==================================================>]  4.956MB/4.956MB
    db9d4b8567ab: Loading layer [==================================================>]  17.92kB/17.92kB
    223fe4d67f77: Loading layer [==================================================>]  3.584kB/3.584kB
    2e75b028b124: Loading layer [==================================================>]  43.04MB/43.04MB
    1a7d923392f7: Loading layer [==================================================>]   2.56kB/2.56kB
    2b9cc11f6cfc: Loading layer [==================================================>]  176.6kB/176.6kB
    Loaded image: greenplum-for-kubernetes:v0.0.1.dev.391.gaf0b6d5
    
  7. Verify that the Docker image tag greenplum-for-kubernetes:<version> is now available:

    $ docker images | grep greenplum-for-kubernetes
    
    greenplum-for-kubernetes                   v0.0.1.dev.391.gaf0b6d5   0a501efdde09        9 days ago          775MB
    
  8. Decide on the image repository (${IMAGE_REPO}) to which you will push the greenplum-for-kubernetes Docker image.

    As an example, to push the image to a Google Cloud Registry under the current Google Cloud project:

    $ PROJECT=$(gcloud config list core/project --format='value(core.project)')
    $ IMAGE_REPO="gcr.io/${PROJECT}/greenplum-for-kubernetes"
    $ IMAGE_NAME="${IMAGE_REPO}:$(cat ./images/greenplum-for-kubernetes-tag)"
    $ docker tag $(cat ./images/greenplum-for-kubernetes-id) ${IMAGE_NAME}
    $ gcloud docker -- push ${IMAGE_NAME}
    

    If you used a custom image repo (for example, one that includes your user name), then create a greenplum/Values-overrides.yml file and add the key imageRepo with the correct value of ${IMAGE_REPO}.

  9. Go to the project directory that contains the deploy.sh script. For example:

    $ cd ./greenplum
    
  10. (Optional.) If necessary, override any default deployment options by creating or editing the Values-overrides.yml file. See Deployment Configuration Options.

  11. Execute the deploy.sh script, specifying the Greenplum Database segment count and helm release name:

    $ ./deploy.sh <segment_count> <helm_release_name>
    

    The deploy.sh script fails if the number of nodes in the cluster’s default-pool is not greater than or equal to 2 * segment_count + 2.


    For example, the following command deploys a set of Greenplum pods, with helm release named “foo”, creating 1 segment:

    $ ./deploy.sh 1 foo
    

    The above deploy.sh command fails if the number of nodes in the PKS cluster’s default-pool is not greater than or equal to 4 (2 * segment_count + 2).
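
    To check in advance how many nodes kubectl currently reports for the cluster, you can count them:

    $ kubectl get nodes --no-headers | wc -l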


    When you execute deploy.sh, the script performs several actions:

    • It creates the Greenplum Database master, standby, primary, and mirror segments each in a separate container in a separate pod.
    • It generates Persistent Volumes by default, and the volumes are destroyed when the cluster is deleted. You can optionally pre-create disks for usage across repeated cluster initialization operations. See (Optional) Preparing Pre-Created Disks for more information.
  12. Look at the bottom of the deploy.sh script output to find the IP address and port available for psql access. That output line includes port 31000 and an IP address. For example:

    ################################
    The Kubernetes cluster is almost ready for Greenplum initialization.
    Run 'watch kubectl get pods' until all containers are running, then run:
    
    kubectl exec master /home/gpadmin/tools/wrap_initialize_cluster.bash
    
    ################################
    When that is finished, you should be able to connect from your desktop via a command like:
    
    psql -U gpadmin -p 5432 -h 104.198.135.115
    

    Record this IP address. Later instructions refer to this IP as PSQL_SERVICE_IP.
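
    If you need to look up this address again later, the EXTERNAL-IP column of the greenplum LoadBalancer service created by the deployment should show it:

    $ kubectl get svc greenplum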

  13. Use kubectl to monitor the progress of the deployment, and wait until all of the containers reach the Running state, as shown below:

    $ kubectl get pods
    
    NAME                                  READY     STATUS    RESTARTS   AGE
    master                                1/1       Running   0          2m
    segment-0a                            1/1       Running   0          2m
    segment-0b                            1/1       Running   0          2m
    standby                               1/1       Running   0          2m
    
  14. After all pods are running, initialize the Greenplum cluster using the command:

    $ kubectl exec -it master /home/gpadmin/tools/wrap_initialize_cluster.bash
    

    The script scans host keys, creates a configuration file based on the number of segments specified, and initializes the Greenplum Database cluster.

  15. To validate the deployment, access the Greenplum Database deployment from your current development machine using the psql command.

    $ psql -U gpadmin -p 31000 -h ${PSQL_SERVICE_IP}
    

    In the above command, PSQL_SERVICE_IP should correspond to the IP address displayed in the deploy.sh script output in step 12.

  16. See Deleting the Greenplum Database Cluster for information about deleting the Greenplum cluster after you have finished using it.

(Optional) Deploying to a Non-Default Namespace

By default the deploy.sh script deploys to the current namespace in the kubectl context. To display the context, enter:

$ kubectl config get-contexts $(kubectl config current-context)
CURRENT   NAME       CLUSTER    AUTHINFO   NAMESPACE
*         minikube   minikube   minikube

When no NAMESPACE is specified, as in the above example, then deploy.sh deploys the GPDB cluster to the default namespace.

To deploy to a namespace other than default, set the target namespace with the command:

$ kubectl config set-context $(kubectl config current-context) --namespace=<another-namespace>

When deploy.sh detects that the target namespace is something other than default, it displays the notice:

#############################################################
NOTICE: deploy to non-default namespace: <another-namespace>
#############################################################

If <another-namespace> does not exist, then deploy.sh aborts with the error:

namespace: <another-namespace> doesn't exist. Please create this namespace and try again.
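
To create the missing namespace and then retry the deployment, for example:

$ kubectl create namespace <another-namespace>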

Deploying Greenplum to Google Kubernetes Engine

Follow this procedure to deploy Greenplum for Kubernetes to Google Kubernetes Engine (GKE) on Google Cloud Platform.

Prerequisites

When creating the Kubernetes cluster, ensure that you make the following selections on the Create a Kubernetes cluster screen of the Google Cloud Platform console:

  • For the Cluster Version option, select the most recent version of Kubernetes.
  • Scale the Machine Type option to at least 2 vCPUs / 7.5 GB memory.
  • For the Node Image option, you must select Ubuntu. You cannot deploy Greenplum with the Container-Optimized OS (cos) image.
  • Set the Size to 4 or more nodes.
  • Set Automatic node repair to Disabled (the default).
  • In the Advanced Options (click More to display Advanced Options), select Enable Kubernetes alpha features in this cluster. Also select I understand the consequences to confirm the choice.

Use the gcloud utility to log in to GCP, and to set your current project and cluster context:

  1. Log into GCP:

    $ gcloud auth login
    
  2. Set the current project to the project where you will deploy Greenplum:

    $ gcloud config set project <project-name>
    
  3. Set the context to the Kubernetes cluster that you created for Greenplum:

    1. Access GCP Console.
    2. Select Kubernetes Engine > Clusters.
    3. Click Connect next to the cluster that you configured for Greenplum, and copy the connection command.
    4. On your local client machine, paste the command to set the context to your cluster. For example:

      $ gcloud container clusters get-credentials <username> --zone us-central1-a --project <my-project>
      
      Fetching cluster endpoint and auth data.
      kubeconfig entry generated for <username>.
      

In addition to the above, the Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.

Obtain the Kubernetes service account key (key.json) file, and identify its location in the Values-common.yml file as the value of dockerRegistryKeyJson. See Deployment Configuration Options.

Procedure

The procedure for deploying Greenplum using the deployment script is the same for both PKS and GKE clusters. Follow the numbered instructions in the PKS Procedure.

Deployment Configuration Options

Some options that affect the Greenplum deployment are available by editing a configuration file that the deploy.sh script uses.

The configuration defaults are supplied in the file greenplum/Values-common.yml, and these settings can be overridden by entries in greenplum/Values-overrides.yml.

The available settings and their default values are:

imageRepo: greenplum-for-kubernetes
imageTag: latest # determines the tag for the image used by Greenplum containers
dockerRegistryKeyJson: key.json # specify a relative path from greenplum/ directory to key.json
# POD specific
memoryLimitRange: 4.5Gi
cpuLimitRange: 1.2
useAffinity: true
# Disk specific
usePreCreatedDisks: false
clusterPrefix: dev # only used when usePreCreatedDisks is true
usePersistentDisk: true
useGCE: true # only used when usePersistentDisk is true
useVsphere: false # only used when usePersistentDisk is true
useLocal: false # only used when usePersistentDisk is true
localDir: /gpdata # only used when useLocal is true
# GPDB specific
trustedCIDR: "10.1.1.0/24" # to allow connections from that subnet to the 'svc/greenplum'
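
As an example of overriding a few of these defaults, you could create greenplum/Values-overrides.yml from the shell (run from the greenplum directory) before executing deploy.sh; the values shown here are purely illustrative:

$ cat > Values-overrides.yml <<EOF
imageRepo: gcr.io/<my-project>/greenplum-for-kubernetes
memoryLimitRange: 8Gi
cpuLimitRange: 2.0
EOF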

Setting IS_DEV=true in your environment before running deploy.sh deploys the Greenplum cluster with certain configuration settings that are required for Minikube:

usePersistentDisk: false
useAffinity: false
trustedCIDR: 0.0.0.0/0

The default imagePullPolicy is IfNotPresent, which uses the local Docker image that you uploaded to Minikube.

Below is a description of all available settings, grouped to provide context.

Pod Resource Settings

memoryLimitRange: 4.5Gi
cpuLimitRange: 1.2
useAffinity: true

Users can specify how much CPU and memory each pod uses.

memoryLimitRange specifies the memory requirement for a single Greenplum instance.

cpuLimitRange specifies the CPU requirement for a single Greenplum instance.

Note: A pod cannot be scheduled unless these resources are available on a node.

useAffinity relates to Greenplum high availability (HA). In production, HA is important, but in a temporary proof-of-concept deployment, a user may want to turn off the HA requirements in order to run with fewer resources.

For HA operation, Greenplum insists that any mirror database must run on a different host than its primary database. Translating this into Kubernetes means that a node running a given segment’s primary database cannot also host its mirror database. (And this implies that the node running the mirror cannot run on the same physical server as the primary’s node.)

One part of this requirement is achieved by the logic implied by setting useAffinity to true. When useAffinity is true, no two pods running Greenplum instances can be scheduled on the same Kubernetes node. In other words, there is only one pod running a Greenplum instance on any given Kubernetes node.
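
After deploying with useAffinity set to true, you can verify the placement: in the wide pod listing, the NODE column should show a different Kubernetes node for each pod that runs a Greenplum instance.

$ kubectl get pods -o wide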

Storage Volume Settings

There are different performance, manageability, and durability trade-offs to consider when deciding where to store Greenplum data. Below, these trade-offs are discussed using a rating scale of Best, Better, Good, and Zero for four separate storage choices.

Four key criteria are repeated below for each of the four storage choices:

  • performance: largely equivalent to storage speed
  • manageability: how much labor an administrator must spend to maintain the storage
  • durability: what are the failure cases for losing data
  • high availability (HA): whether mirrors are necessary to achieve HA

Example: Ephemeral Volume (development mode, not for production)

usePersistentDisk: false

The disk life cycle is linked to the pod life cycle.

  • Best Performance: local IO, no network delay
  • Best Manageability: automatic management; storage goes away when the pod is destroyed
  • Zero Durability: pod failures lose data
  • HA: mirroring is required

Example: Local Persistent Volume

usePersistentDisk: true
useLocal: true
localDir: /gpdata

The disk life cycle is linked to the Kubernetes node life cycle. The data will be lost if the node fails.

  • Best Performance: local IO, no network delay
  • Good Manageability: administrators must manually clean up localDir on a Kubernetes node. A pod with the same name on the same node can reacquire the stored data. Moving a pod across nodes could lose data.
  • Good Durability: sustains pod failures; node failures lose data
  • HA: mirroring is required

Example: Remote Persistent Volumes, Dynamically Created by the IaaS

(Two IaaS platforms are supported: Google Compute Engine (GCE) and VMware vSphere.)

usePersistentDisk: true
useGCE: true

OR

usePersistentDisk: true
useVsphere: true

The disk life cycle is managed by the IaaS. As long as the cluster is not deleted, the data remains. The volume can be easily remounted wherever the pod goes, on any Kubernetes node.

  • Good Performance: through network bandwidth
  • Best Manageability: automatic management by IaaS
  • Best Durability: sustains container, pod, node failures; Kubernetes cluster deletion will lose data
  • HA: mirroring not required (assuming reliance on IaaS storage for HA)

Example: Pre-Created, Remote Persistent Volumes

usePreCreatedDisks: true
clusterPrefix: dev

The disk life cycle is managed by the IaaS, and even if the cluster is deleted, the data remains. The volume can be easily remounted wherever the pod goes, on any Kubernetes node.

Limitation: this feature is currently only available on GCE.

  • Good Performance: through network bandwidth
  • Best Manageability: automatic management by IaaS
  • Best Durability: sustains container, pod, node, cluster failures
  • HA: mirroring not required (assuming reliance on IaaS storage for HA)

Greenplum Settings

Below are settings for security and configuration purposes.

Example: Custom IP Range (CIDR) for Trusted Client Connections

Greenplum Database is based on PostgreSQL, which uses entries in the pg_hba.conf file to determine whether to trust a given client connection.

Part of the Greenplum for Kubernetes configuration establishes the “trusted CIDR” for access. To change this value, create a file named ./Values-overrides.yml in the gp-kubernetes/greenplum directory (if one does not already exist). Then add the line:

trustedCIDR: "10.1.1.0/24"

Replace the IP address range with one that matches your local deployment requirements.

Expanding the Greenplum Deployment on PKS

To expand the Greenplum cluster, you first create new pods in the Kubernetes cluster, and then run the expand_cluster.bash script to put Greenplum containers on the new nodes. You can then use the standard Greenplum Database gpexpand command in the cluster to initialize the new segments and redistribute data.

Note: PKS cannot resize a cluster to a lower number of nodes; you must delete and re-create the cluster to reduce the number of nodes.

Follow these steps to expand Greenplum on PKS:

  1. Use pks resize to resize the cluster to the new total number of nodes that are required. The cluster must have two nodes for each Greenplum Database segment (to accommodate a primary and mirror database for each segment) and an additional two nodes for the master and standby master. Run pks resize with the options:

    pks resize <pks_cluster_name> --wait --num-nodes <new_total_number_of_nodes>
    

    In the above command, <pks_cluster_name> is the name of your PKS cluster, and <new_total_number_of_nodes> can be calculated as new_total_number_of_segments * 2 + 2.
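
    For example, if the expanded cluster will have a total of 4 segments, resize to 4 * 2 + 2 = 10 nodes:

    pks resize <pks_cluster_name> --wait --num-nodes 10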

    Note: This command may take a considerable amount of time (20 minutes or longer) because procuring new nodes is a time-consuming process.

  2. After the pks resize command completes, use the expand_cluster.bash script to put Greenplum containers onto the nodes you provided. Enter:

    expand_cluster.bash <greenplum_cluster_name> <new_total_number_of_segments>
    

    where <greenplum_cluster_name> is the name of the existing Greenplum cluster and <new_total_number_of_segments> is the total number of segments the cluster will have after expansion is complete. If you do not remember the original <greenplum_cluster_name>, you can display it by using the helm list command. See the example below.
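
    For example, assuming expand_cluster.bash is run from the same directory as deploy.sh, the original release is named gpdb, and the expanded cluster should have a total of 2 segments:

    helm list
    expand_cluster.bash gpdb 2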

  3. After the Greenplum containers are placed on the new nodes, you can use the Greenplum Database gpexpand tool to initialize the new segments. The input file to gpexpand requires details about the new segments that you are adding to the Greenplum Database cluster. (See Initializing New Segments in the Greenplum Database documentation for additional information.) When using these instructions, keep in mind the following information and conventions that are used in the Greenplum for Kubernetes deployment environment:


    Segment Host Names and Port Numbers
    All Greenplum for Kubernetes segment host names follow the pattern:

    segment-<number>[a|b]
    

    where <number> is the number of the segment, starting from 0. The letter “a” or “b” indicates a primary or mirror segment, respectively. For example, a cluster with 2 segments has the host names:

    • segment-0a (the primary for data segment 0)
    • segment-0b (the mirror for data segment 0)
    • segment-1a (the primary for data segment 1)
    • segment-1b (the mirror for data segment 1)

    Primary and mirror segment hosts use the port configuration:

    | Port                     | Number |
    |--------------------------|--------|
    | primary port             | 40000  |
    | primary replication port | 6000   |
    | mirror port              | 50000  |
    | mirror replication port  | 6001   |


    Data Directories
    Each segment host uses the same data directory naming convention:

    | Data Directory         | Path                   |
    |-------------------------|------------------------|
    | primary data directory  | /greenplum/data        |
    | mirror data directory   | /greenplum/mirror/data |


    Database ID and Content ID
    Each segment database requires a unique Database ID. Greenplum for Kubernetes standardizes this database ID to be 1 for the master, and increments the value for each primary and mirror segment database. For example, the primary and mirror will each have a unique, incremented ID. In order to expand the cluster, you will need to provide the new Database ID values for the new segments using this convention.


    Each Greenplum segment has a single Content ID that is shared by both the primary and the mirror database for that segment. Greenplum for Kubernetes standardizes the Content ID to be -1 for the master; data segment Content IDs start at 0 and end at <new_total_number_of_segments> - 1, where <new_total_number_of_segments> is the total number of segments in the expanded Greenplum cluster. In order to expand the cluster, you will need to provide the new Content ID values for the new segments using this convention.


    Example: Programmatically creating a gpexpand initialization file
    You can use kubectl commands to programmatically generate the contents of a gpexpand initialization file. This series of commands takes as input the new total number of segments in the expanded Greenplum cluster (as the environment variable new_total_number_of_segments), and uses that value to create entries in /tmp/expand_detail_file that use the gpexpand initialization file format:

    hostname:address:port:fselocation:dbid:content:preferred_role:replication_port
    

    Follow this procedure to use the example commands:

    1. On your client machine, set the environment variable, new_total_number_of_segments, to the new total number of segments in your expanded Greenplum cluster. For example:

      export new_total_number_of_segments=4
      
    2. Copy and paste the following commands into the terminal where you set the environment variable:

      set -u
      echo "expanding to a total of ${new_total_number_of_segments}"
      last_dbid=$(kubectl exec master -- bash -l -c "psql --tuples-only -c 'select max(dbid) from gp_segment_configuration'")
      last_contentid=$(kubectl exec master -- bash -l -c "psql --tuples-only -c 'select max(content) from gp_segment_configuration'")
      
      dbid=$((last_dbid+1))
      for i in $(seq $((last_contentid+1)) $((new_total_number_of_segments-1)))
      do
          echo "segment-${i}a:segment-${i}a:40000:/greenplum/data:${dbid}:${i}:p:6000" >> /tmp/expand_detail_file
          dbid=$((dbid+1))
          echo "segment-${i}b:segment-${i}b:50000:/greenplum/mirror/data:${dbid}:${i}:m:6001" >> /tmp/expand_detail_file
          dbid=$((dbid+1))
      done
      

      These commands use information obtained from the Greenplum cluster to populate a gpexpand initialization file in /tmp/expand_detail_file. Examine the contents of the file to verify that it describes the new segment(s) you are adding.

    3. Copy the generated file from your client into the Greenplum cluster on PKS:

      kubectl cp /tmp/expand_detail_file master:/tmp/expand_detail_file
      
    4. Use the initialization file with gpexpand in the PKS cluster to initialize the new segments:

      kubectl exec master -- bash -l -c "gpexpand -i /tmp/expand_detail_file -D gpadmin"
      
    5. After the gpexpand utility exits, execute a query to verify the new segment configuration of your cluster:

      kubectl exec master -- bash -l -c "psql -c 'select * from gp_segment_configuration'"
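
    6. (Optional.) Initializing the new segments does not move existing data onto them. To start the table redistribution phase described in the Greenplum Database gpexpand documentation, one approach is to run gpexpand again against the database that holds the expansion schema (gpadmin in this example):

      kubectl exec master -- bash -l -c "gpexpand -D gpadmin"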