Deploying a Greenplum Cluster Using Bash Scripts (Deprecated)

Note: This topic describes how to deploy a new Greenplum Cluster to Kubernetes using bash scripts provided in the Greenplum for Kubernetes distribution. This procedure is deprecated, and the associated bash scripts will not be included in the final release of Greenplum for Kubernetes. Use the Greenplum Operator for all deployment operations, as described in Deploying a New Greenplum Cluster.

Deploying Greenplum to Minikube

Prerequisites

Install the required software and prepare Minikube as described in Minikube. Also install Greenplum for Kubernetes as described in Installing Greenplum for Kubernetes.

Procedure

  1. Confirm that the kubectl utility can access Minikube:

    $ kubectl get node
    
    NAME       STATUS    ROLES     AGE       VERSION
    minikube   Ready     master    2m        v1.10.0
    

    Note: If you have problems starting or connecting to Minikube, use minikube delete to remove the current Minikube and then recreate it.

  2. Go into the directory where you unpacked the Greenplum for Kubernetes distribution. For example:

    $ cd ./greenplum-for-kubernetes-*
    
  3. Deploy Greenplum containers to Minikube by executing the deploy.sh script found in the greenplum directory. Minikube deployments use the syntax IS_DEV=true ./deploy.sh <data_segment_count> <helm_release_name> where:

    • IS_DEV=true is an environment variable that automatically configures certain deployment properties that are required for Minikube. See Bash Script Deployment Configuration Options (Deprecated) for additional information.
      Note: When IS_DEV=true, the Greenplum cluster is configured to accept connections from all IP addresses (0.0.0.0/0).
    • <data_segment_count> is the number of logical data segments to create in the cluster, not including the master, standby master, or mirror segments. Each data segment that you create is mirrored by default. For example, if you deploy a cluster with the <data_segment_count> as 1, the Greenplum cluster will include a master segment, a standby master, one primary segment, and one mirror segment.
    • <helm_release_name> is the helm release name to assign to the Greenplum deployment.


    For example, the following commands deploy a new Greenplum cluster with one data segment to Minikube, using the helm release name “gpdb”:

    $ cd greenplum
    $ IS_DEV=true ./deploy.sh 1 gpdb
    
    +++ dirname ./deploy.sh
    ++ cd .
    ++ pwd
    + SCRIPT_DIR=/Users/demouser/Downloads/greenplum-for-kubernetes/greenplum
    + source /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/wait_for_deployment.bash
    ++ SLEEP_SECS=1
    ++ TIMEOUT_SECS=300
    ++ NAMESPACE=default
    + SEGMENT_COUNT=1
    + export SEGMENT_COUNT
    + CLUSTER_NAME=gpdb
    + [[ gpdb == *\_* ]]
    + [[ gpdb =~ [A-Z] ]]
    + true
    + IS_DEV=true
    + verify_node_count 1
    + true
    + set +x
    ########################################################
    ##   Generating configuration for local-dev           ##
    ########################################################
    + env IS_DEV=true /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/generate_cluster_configuration.bash
    +++ dirname /Users/demouser/Downloads/greenplum-for-kubernetes/greenplum/generate_cluster_configuration.bash
    
    [...]
    
    NAME:   gpdb
    LAST DEPLOYED: Thu Jul 19 13:14:20 2018
    NAMESPACE: default
    STATUS: DEPLOYED
    
    RESOURCES:
    ==> v1/ServiceAccount
    NAME  SECRETS  AGE
    gpdb  1        1s
    
    ==> v1/Service
    NAME       TYPE          CLUSTER-IP      EXTERNAL-IP  PORT(S)         AGE
    agent      ClusterIP     None            <none>       22/TCP          1s
    greenplum  LoadBalancer  10.100.126.245  <pending>    5432:30630/TCP  1s
    
    ==> v1/Pod
    NAME         READY  STATUS             RESTARTS  AGE
    master-0     0/1    ContainerCreating  0         1s
    master-1     0/1    ContainerCreating  0         0s
    segment-a-0  0/1    ContainerCreating  0         0s
    segment-b-0  0/1    ContainerCreating  0         0s
    
    ==> v1/ConfigMap
    NAME              DATA  AGE
    greenplum-config  1     1s
    
    + true
    


    When you execute deploy.sh, the script performs several actions:

    • It creates the Greenplum Database master, standby, primary, and mirror segments, each in a separate container in a separate pod.
    • It generates Persistent Volumes by default; the volumes are destroyed when the cluster is deleted.
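
    If you want to inspect the Persistent Volumes and claims that the script generates, you can list them with kubectl (a quick check; the resource names are assigned by the deployment):

    $ kubectl get pv,pvc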

  4. Execute kubectl get pods to monitor the progress of the deployment:

    $ kubectl get pods
    
    NAME          READY     STATUS    RESTARTS   AGE
    master-0      1/1       Running   0          2m
    master-1      1/1       Running   0          2m
    segment-a-0   1/1       Running   0          2m
    segment-b-0   1/1       Running   0          2m
    

    Each Greenplum segment (master, standby, primary, and mirror) is deployed in a single container in its own pod. When the container in a pod is ready (indicated by the 1/1 entry in the READY column), the pod itself shows the status Running.

    Important: You must wait for all of the deployed pods to reach the Running status before you continue to the next step. If any pods are still starting, re-execute kubectl get pods at a later time to check the deployment status. The example above shows that all pods are Running, and you can continue with the next step.
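
    As an alternative to re-running kubectl get pods manually, a small shell loop can poll until no pod reports a status other than Running (a convenience sketch, not part of the distribution):

    $ while kubectl get pods --no-headers | grep -v Running; do sleep 5; done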

  5. After all pods are running, initialize the Greenplum cluster using the command:

    $ kubectl exec -it master-0 /home/gpadmin/tools/wrap_initialize_cluster.bash
    
    Key scanning started
    *******************************
    Initializing Greenplum for Kubernetes Cluster
    *******************************
    *******************************
    SSH KeyScan started
    *******************************
    *******************************
    Generating gpinitsystem_config
    *******************************
    Sub Domain for the cluster is: agent.default.svc.cluster.local
    *******************************
    Running gpinitsystem
    *******************************
    
    [...]
    
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[WARN]:-*******************************************************
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-Greenplum Database instance successfully created
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-------------------------------------------------------
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-To complete the environment configuration, please
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-2. Add "export MASTER_DATA_DIRECTORY=/greenplum/data-1"
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-   or, use -d /greenplum/data-1 option for the Greenplum scripts
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-   Example gpstate -d /greenplum/data-1
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-Script log file = /home/gpadmin/gpAdminLogs/gpinitsystem_20180927.log
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-Standby Master master-1.agent.default.svc.cluster.local has been configured
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-To activate the Standby Master Segment in the event of Master
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-failure review options for gpactivatestandby
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-------------------------------------------------------
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-The Master /greenplum/data-1/pg_hba.conf post gpinitsystem
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-has been configured to allow all hosts within this new
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-new array must be explicitly added to this file
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-located in the /opt/gpdb/docs directory
    20180927:21:13:41:000101 gpinitsystem:master-0:gpadmin-[INFO]:-------------------------------------------------------
    *******************************
    Adding trusted CIDR to master-0 pg_hba.conf
    *******************************
    *******************************
    Adding trusted CIDR to master-1 pg_hba.conf
    *******************************
    *******************************
    Reloading greenplum configs
    *******************************
    20180927:21:13:42:002313 gpstop:master-0:gpadmin-[INFO]:-Starting gpstop with args: -u
    20180927:21:13:42:002313 gpstop:master-0:gpadmin-[INFO]:-Gathering information and validating the environment...
    20180927:21:13:42:002313 gpstop:master-0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
    20180927:21:13:42:002313 gpstop:master-0:gpadmin-[INFO]:-Obtaining Segment details from master...
    20180927:21:13:43:002313 gpstop:master-0:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 5.11.1 build 5408666-oss'
    20180927:21:13:43:002313 gpstop:master-0:gpadmin-[INFO]:-Signalling all postmaster processes to reload
    .
    *******************************
    Running createdb
    *******************************
    

    The script exchanges keys, creates a configuration file based on the number of segments specified, and initializes the Greenplum Database cluster.
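
    As an additional check before connecting, you can run the Greenplum gpstate utility on the master pod, pointing it at the master data directory reported in the gpinitsystem output above (the login shell sources the Greenplum environment):

    $ kubectl exec -it master-0 -- bash -l -c "gpstate -d /greenplum/data-1"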

  6. To validate the deployment, run the psql utility inside Minikube and execute SQL statements against the cluster. For example, start by running psql in Minikube:

    $ kubectl exec -it master-0 -- bash -c "source /opt/gpdb/greenplum_path.sh; psql"
    

    After psql starts, you can execute Greenplum SQL statements. For example, query the segment configuration with:

    gpadmin=# select * from gp_segment_configuration;
    
     dbid | content | role | preferred_role | mode | status | port  |                 hostname                 |                   address                   | replication_port
    ------+---------+------+----------------+------+--------+-------+------------------------------------------+---------------------------------------------+------------------
        2 |       0 | p    | p              | s    | u      | 40000 | segment-a-0                              | segment-a-0.agent.default.svc.cluster.local |             6000
        3 |       0 | m    | m              | s    | u      | 50000 | segment-b-0                              | segment-b-0.agent.default.svc.cluster.local |             6001
        1 |      -1 | p    | p              | s    | u      |  5432 | master-0.agent.default.svc.cluster.local | master-0.agent.default.svc.cluster.local    |
        4 |      -1 | m    | m              | s    | u      |  5432 | master-1.agent.default.svc.cluster.local | master-1.agent.default.svc.cluster.local    |
    (4 rows)
    
  7. To exit the psql utility (and the interactive shell in Kubernetes), enter \q.

  8. To delete the Greenplum Cluster after you have finished using it, see Deleting the Greenplum Database Cluster.

  9. To stop Minikube, execute the command:

    $ minikube stop
    
    Stopping local Kubernetes cluster...
    Machine stopped.
    

(Optional) Preparing Pre-Created Disks for PKS and GKE Deployments

Optionally, the Greenplum for Kubernetes service can be deployed using pre-created disks instead of persistent volumes. The workspace/samples/scripts/create_disks.bash script contains an example for doing this on the Google Cloud Platform. It uses the syntax:

$ create_disks.bash <cluster_prefix> <segment_count>

The script creates disks that are sized at 20GB each.

Follow these instructions to deploy Greenplum for Kubernetes using pre-created disks:

  1. Go to the workspace directory from the installation package. For example:

    $ cd greenplum-for-kubernetes-*/workspace
    
  2. Execute samples/scripts/create_disks.bash using the cluster prefix and desired segment count for the deployment. For example:

    $ samples/scripts/create_disks.bash gpdb 1
    
  3. When prompted, enter the number corresponding to the zone where you want to create each disk. After creating all disks, the script ends with output similar to:

    Created [https://www.googleapis.com/compute/v1/projects/gp-kubernetes/zones/us-central1-a/disks/segment-b-0-gpdb-disk].
    NAME                   ZONE           SIZE_GB  TYPE         STATUS
    segment-b-0-gpdb-disk  us-central1-a  20       pd-standard  READY
    
    New disks are unformatted. You must format and mount a disk before it
    can be used. You can find instructions on how to do this at:
    
    https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting
    

    Note that the Greenplum deployment script automatically formats the disks.
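
    To confirm which disks the script created, you can list them with gcloud and filter on your cluster prefix (shown here for the example prefix "gpdb"):

    $ gcloud compute disks list | grep gpdb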

  4. Create a file named ./Values-overrides.yml in your workspace and add the following lines:

    usePreCreatedDisks: true
    clusterPrefix: <cluster_prefix>
    

    Replace <cluster_prefix> with the prefix used in the create_disks.bash command (for example, “gpdb”).

  5. Follow the procedure to deploy Greenplum on PKS or GKE, specifying the same number of segments. The ./Values-overrides.yml file configures the deployment to use the pre-created disks that were previously generated with the cluster prefix.

Deploying Greenplum to PKS

Follow this procedure to deploy Greenplum for Kubernetes to PKS.

Prerequisites

This procedure requires that PKS on GCP is installed and running, along with all prerequisite software and configuration. See Installing PKS for Greenplum for Kubernetes, Using Google Cloud Platform (GCP) for information about installing PKS.

Obtain the Kubernetes service account key (key.json) file, and identify its location in the Values-common.yaml file as the value of dockerRegistryKeyJson. See Deployment Configuration Options.

Before you attempt to deploy Greenplum, ensure that the target cluster is available. Execute the following command to make sure that the target cluster displays in the output:

pks list-clusters

Note: The pks login cookie typically expires after a day or two.
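
If the listing fails because the login has expired, log in to the PKS API again before retrying. A typical login command looks like the following (substitute your own PKS API endpoint and credentials; the -k flag skips SSL validation and may not be appropriate for your environment):

pks login -a <pks-api-hostname> -u <username> -p <password> -k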

The Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.

To use pre-created disks with PKS instead of the default automatically-managed persistent volumes, follow the instructions in (Optional) Preparing Pre-Created Disks before continuing with this procedure.

Note: If any problems occur during deployment, remove the previous deployment before you retry deploying Greenplum.

Procedure

  1. Download the Greenplum for Kubernetes software from Pivotal Network. The download file has the name: greenplum-for-kubernetes-<version>.tar.gz.

  2. Go to the directory where you downloaded Greenplum for Kubernetes, and unpack the downloaded software. For example:

    $ cd ~/Downloads
    $ tar xzf greenplum-for-kubernetes-*.tar.gz
    

    The above command unpacks the distribution into a new directory named greenplum-for-kubernetes-<version>.

  3. Go into the new greenplum-for-kubernetes-<version> directory:

    $ cd ./greenplum-for-kubernetes-*
    
  4. Ensure that helm has sufficient privileges via a Kubernetes service account. Use a command like:

    $ kubectl create -f initialize_helm_rbac.yaml
    
    serviceaccount "tiller" created
    clusterrolebinding.rbac.authorization.k8s.io "tiller" created
    

    This sets the necessary privileges for helm with a service account named tiller.
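
    You can confirm that the objects were created before initializing helm (the service account's namespace depends on the manifest contents, so a namespace-wide listing is used here):

    $ kubectl get clusterrolebinding tiller
    $ kubectl get serviceaccounts --all-namespaces | grep tiller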

  5. Initialize and upgrade helm with the command:

    $ helm init --wait --service-account tiller --upgrade
    
    $HELM_HOME has been configured at /<path>/.helm.
    
    Tiller (the Helm server-side component) has been upgraded to the current version.
    Happy Helming!
    
  6. Load the Greenplum for Kubernetes Docker image to the local Docker registry:

    $ docker load -i ./images/greenplum-for-kubernetes
    
    644879075e24: Loading layer [==================================================>]  117.9MB/117.9MB
    d7ff1dc646ba: Loading layer [==================================================>]  15.87kB/15.87kB
    686245e78935: Loading layer [==================================================>]  14.85kB/14.85kB
    d73dd9e65295: Loading layer [==================================================>]  5.632kB/5.632kB
    2de391e51d73: Loading layer [==================================================>]  3.072kB/3.072kB
    4605c0a3f29d: Loading layer [==================================================>]  633.4MB/633.4MB
    c8d909e84bbf: Loading layer [==================================================>]  1.682MB/1.682MB
    7e66ff617b4c: Loading layer [==================================================>]  4.956MB/4.956MB
    db9d4b8567ab: Loading layer [==================================================>]  17.92kB/17.92kB
    223fe4d67f77: Loading layer [==================================================>]  3.584kB/3.584kB
    2e75b028b124: Loading layer [==================================================>]  43.04MB/43.04MB
    1a7d923392f7: Loading layer [==================================================>]   2.56kB/2.56kB
    2b9cc11f6cfc: Loading layer [==================================================>]  176.6kB/176.6kB
    Loaded image: greenplum-for-kubernetes:v0.0.1.dev.391.gaf0b6d5
    
  7. Verify that the Docker image tag greenplum-for-kubernetes:<version> is now available:

    $ docker images | grep greenplum-for-kubernetes
    
    greenplum-for-kubernetes                   v0.0.1.dev.391.gaf0b6d5   0a501efdde09        9 days ago          775MB
    
  8. Decide on the ${IMAGE_REPO} location to which you will push the greenplum-for-kubernetes Docker image.

    As an example, to push the image to a Google Cloud Registry under the current Google Cloud project:

    $ PROJECT=$(gcloud config list core/project --format='value(core.project)')
    $ IMAGE_REPO="gcr.io/${PROJECT}/greenplum-for-kubernetes"
    $ IMAGE_NAME="${IMAGE_REPO}:$(cat ./images/greenplum-for-kubernetes-tag)"
    $ docker tag $(cat ./images/greenplum-for-kubernetes-id) ${IMAGE_NAME}
    $ gcloud docker -- push ${IMAGE_NAME}
    

    If you used a custom image repo (for example, one that includes your user name), then create a greenplum/Values-overrides.yml file and add the key imageRepo with the correct value of ${IMAGE_REPO}.
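
    For example, assuming you are still in the top-level distribution directory and ${IMAGE_REPO} is set as shown above, you could record the override like this (a sketch; imageRepo is the only key required for this purpose):

    $ echo "imageRepo: ${IMAGE_REPO}" >> ./greenplum/Values-overrides.yml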

  9. Go to the project directory that contains the deploy.sh script. For example:

    $ cd ./greenplum
    
  10. (Optional.) If necessary, override any default deployment options by creating or editing the Values-overrides.yml file. See Deployment Configuration Options.

  11. Execute the deploy.sh script, specifying the Greenplum Database segment count and helm release name:

    $ ./deploy.sh <segment_count> <helm_release_name>
    

    The deploy.sh script fails if the number of nodes in the cluster’s default-pool is not greater than or equal to 2 * segment_count + 2.


    For example, the following command deploys a set of Greenplum pods, with helm release named “foo”, creating 1 segment:

    $ ./deploy.sh 1 foo
    

    The above deploy.sh command fails if the number of nodes in the PKS cluster’s default-pool is not greater than or equal to 4 (2 * segment_count + 2).
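
    You can check how many worker nodes are currently available before running the script (this counts all nodes reported by kubectl; node pool details depend on your PKS plan):

    $ kubectl get nodes --no-headers | wc -l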


    When you execute deploy.sh, the script performs several actions:

    • It creates the Greenplum Database master, standby, primary, and mirror segments each in a separate container in a separate pod.
    • It generates Persistent Volumes by default, and the volumes are destroyed when the cluster is deleted. You can optionally pre-create disks for usage across repeated cluster initialization operations. See (Optional) Preparing Pre-Created Disks for more information.
  12. Look at the bottom of the deploy.sh script output to find the IP address and port available for psql access. That output line has port 31000 and an IP address in it. For example:

    ################################
    The Kubernetes cluster is almost ready for Greenplum initialization.
    Run 'watch kubectl get pods' until all containers are running, then run:
    
    kubectl exec master-0 /home/gpadmin/tools/wrap_initialize_cluster.bash
    
    ################################
    When that is finished, you should be able to connect from your desktop via a command like:
    
    psql -U gpadmin -p 5432 -h 104.198.135.115
    

    Record this IP address. Later instructions refer to this IP as PSQL_SERVICE_IP.
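
    If you need to retrieve this address later, it is the external IP of the greenplum LoadBalancer service (assuming the service name greenplum, as shown in the Minikube example output earlier). For example:

    $ kubectl get service greenplum
    $ PSQL_SERVICE_IP=$(kubectl get service greenplum -o 'jsonpath={.status.loadBalancer.ingress[0].ip}')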

  13. Use kubectl to monitor the progress of the deployment, and wait until all of the containers reach the Running state, as shown below:

    $ kubectl get pods
    
    NAME           READY     STATUS    RESTARTS   AGE
    master-0       1/1       Running   0          2m
    master-1       1/1       Running   0          2m
    segment-a-0    1/1       Running   0          2m
    segment-b-0    1/1       Running   0          2m
    
  14. After all pods are running, initialize the Greenplum cluster using the command:

    $ kubectl exec -it master-0 /home/gpadmin/tools/wrap_initialize_cluster.bash
    

    The script scans host keys, creates a configuration file based on the number of segments specified, and initializes the Greenplum Database cluster.

  15. To validate the deployment, access the Greenplum Database deployment from your local development machine using the psql command.

    $ psql -U gpadmin -p 31000 -h ${PSQL_SERVICE_IP}
    

    In the above command, PSQL_SERVICE_IP should correspond to the IP address displayed in the deploy.sh script output in step 12.

  16. See Deleting the Greenplum Database Cluster for information about deleting the Greenplum cluster after you have finished using it.

(Optional) Deploying to a Non-Default Namespace

By default the deploy.sh script deploys to the current namespace in the kubectl context. To display the context, enter:

$ kubectl config get-contexts $(kubectl config current-context)
CURRENT   NAME       CLUSTER    AUTHINFO   NAMESPACE
*         minikube   minikube   minikube

When no NAMESPACE is specified, as in the above example, deploy.sh deploys the GPDB cluster to the default namespace.

To deploy to a namespace other than default, set the target namespace with the command:

$ kubectl config set-context $(kubectl config current-context) --namespace=<another-namespace>

When deploy.sh detects that the target namespace is something other than default, it displays the notice:

#############################################################
NOTICE: deploy to non-default namespace: <another-namespace>
#############################################################

If <another-namespace> does not exist, then deploy.sh aborts with the error:

namespace: <another-namespace> doesn't exist. Please create this namespace and try again.
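
If the target namespace does not yet exist, create it with kubectl and then re-run deploy.sh:

$ kubectl create namespace <another-namespace>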

Expanding the Greenplum Deployment on PKS

To expand the Greenplum cluster, you first add new nodes to the Kubernetes cluster, and then run the expand_cluster.bash script to put Greenplum containers on the new nodes. You can then use the standard Greenplum Database gpexpand command in the cluster to initialize the new segments and redistribute data.

Note: PKS cannot resize a cluster to a lower number of nodes; you must delete and re-create the cluster to reduce the number of nodes.

Procedure

Follow these steps to expand Greenplum on PKS:

  1. Use pks resize to resize the cluster to the new total number of nodes that are required. The cluster must have two nodes for each Greenplum Database segment (to accommodate a primary and mirror database for each segment) and an additional two nodes for the master and standby master. Run pks resize with the options:

    pks resize <cluster_name> --wait --num-nodes <new_total_number_of_nodes>
    

    In the above command, <cluster_name> is the name of the PKS cluster (as shown by pks list-clusters), and <new_total_number_of_nodes> is calculated as: (new_total_number_of_segments * 2) + 2

    Note: This command may take a considerable amount of time (20 minutes or longer) because procuring new nodes is a time-consuming process.
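
    For example, expanding a cluster to a new total of 4 segments requires 4 * 2 + 2 = 10 nodes:

    pks resize <cluster_name> --wait --num-nodes 10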

  2. After the pks resize command completes, use the expand_cluster.bash script to put Greenplum containers onto the nodes you provided. Enter:

    expand_cluster.bash <greenplum_cluster_name> <new_total_number_of_segments>
    

    where <greenplum_cluster_name> is the name of the existing Greenplum cluster and <new_total_number_of_segments> is the total number of segments the cluster will have after expansion is complete. If you do not remember the original <greenplum_cluster_name> you can display it by using the helm list command.

  3. After the Greenplum containers are placed on the new nodes, you can use the Greenplum Database gpexpand tool to initialize the new segments. The input file to gpexpand requires details about the new segments that you are adding to the Greenplum Database cluster. (See Initializing New Segments in the Greenplum Database documentation for additional information.) When using these instructions, keep in mind the following information and conventions that are used in the Greenplum for Kubernetes deployment environment:


    Segment Host Names and Port Numbers
    All Greenplum for Kubernetes segment host names follow the pattern:

    segment-[a|b]-<number>
    

    where <number> is the number of the segment, starting from 0. The letter “a” or “b” indicates a primary or mirror segment, respectively. For example, a cluster with 2 segments has the host names:

    • segment-a-0 (the primary for data segment 0)
    • segment-b-0 (the mirror for data segment 0)
    • segment-a-1 (the primary for data segment 1)
    • segment-b-1 (the mirror for data segment 1)

    Primary and mirror segment hosts use the port configuration:

    Port                        Number
    primary port                40000
    primary replication port    6000
    mirror port                 50000
    mirror replication port     6001


    Data Directories
    Each segment host uses the same data directory naming convention:

    Data Directory              Path
    primary data directory      /greenplum/data
    mirror data directory       /greenplum/mirror/data


    Database ID and Content ID
    Each segment database requires a unique Database ID. Greenplum for Kubernetes standardizes this database ID to be 1 for the master, and increments the value for each primary and mirror segment database. For example, the primary and mirror will each have a unique, incremented ID. In order to expand the cluster, you will need to provide the new Database ID values for the new segments using this convention.


    Each Greenplum segment has a single Content ID that is shared by both the primary and the mirror database for that segment. Greenplum for Kubernetes standardizes the Content ID to be -1 for the master; data segment Content IDs start at 0 and end at <new_total_number_of_segments> - 1, where <new_total_number_of_segments> is the total number of segments the cluster will have after expansion. In order to expand the cluster, you will need to provide the new Content ID values for the new segments using this convention.


    Example: Programmatically creating a gpexpand initialization file
    You can use kubectl commands to programmatically generate the contents of a gpexpand initialization file. This series of commands takes as input the new total number of segments in the expanded Greenplum cluster (as the environment variable new_total_number_of_segments), and uses that value to create entries in /tmp/expand_detail_file that use the gpexpand initialization file format:

    hostname:address:port:fselocation:dbid:content:preferred_role:replication_port
    

    Follow this procedure to use the example commands:

    1. On your client machine, set the environment variable, new_total_number_of_segments, to the new total number of segments in your expanded Greenplum cluster. For example:

      export new_total_number_of_segments=4
      
    2. Copy and paste the following commands into the terminal where you set the environment variable:

      set -u
      echo "expanding to a total of ${new_total_number_of_segments}"
      last_dbid=$(kubectl exec master-0 -- bash -l -c "psql --tuples-only -c 'select max(dbid) from gp_segment_configuration'")
      last_contentid=$(kubectl exec master-0 -- bash -l -c "psql --tuples-only -c 'select max(content) from gp_segment_configuration'")

      dbid=$((last_dbid+1))
      for i in $(seq $((last_contentid+1)) $((new_total_number_of_segments-1)))
      do
          echo "segment-a-${i}:segment-a-${i}:40000:/greenplum/data:${dbid}:${i}:p:6000" >> /tmp/expand_detail_file
          dbid=$((dbid+1))
          echo "segment-b-${i}:segment-b-${i}:50000:/greenplum/mirror/data:${dbid}:${i}:m:6001" >> /tmp/expand_detail_file
          dbid=$((dbid+1))
      done
      

      These commands use information obtained from the Greenplum cluster to populate a gpexpand initialization file in /tmp/expand_detail_file. Examine the contents of the file to verify that it describes the new segment(s) you are adding.
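
      For instance, if the cluster's current highest dbid is 5 and its highest content ID is 1, expanding to a new total of 3 segments appends two lines like these (the values in your file will reflect your own cluster's configuration):

      segment-a-2:segment-a-2:40000:/greenplum/data:6:2:p:6000
      segment-b-2:segment-b-2:50000:/greenplum/mirror/data:7:2:m:6001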

    3. Copy the generated file from your client into the Greenplum cluster on PKS:

      kubectl cp /tmp/expand_detail_file master-0:/tmp/expand_detail_file
      
    4. Use the initialization file with gpexpand in the PKS cluster to initialize the new segments:

      kubectl exec master -- bash -l -c "gpexpand -i /tmp/expand_detail_file -D gpadmin"
      
    5. After the gpexpand utility exits, execute a query to verify the new segment configuration of your cluster:

      kubectl exec master -- bash -l -c "psql -c 'select * from gp_segment_configuration'"
      

Deploying Greenplum to Google Kubernetes Engine

Follow this procedure to deploy Greenplum for Kubernetes to Google Kubernetes Engine (GKE) on Google Cloud Platform.

Prerequisites

When creating the Kubernetes cluster, ensure that you make the following selections on the Create a Kubernetes cluster screen of the Google Cloud Platform console:

  • For the Cluster Version option, select the most recent version of Kubernetes.
  • Scale the Machine Type option to at least 2 vCPUs / 7.5 GB memory.
  • For the Node Image option, you must select Ubuntu. You cannot deploy Greenplum with the Container-Optimized OS (cos) image.
  • Set the Size to 4 or more nodes.
  • Set Automatic node repair to Disabled (the default).
  • In the Advanced Options (click More to display Advanced Options), select Enable Kubernetes alpha features in this cluster. Also select I understand the consequences to confirm the choice.
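
As an alternative to the console, a comparable cluster can be created with the gcloud CLI. The following command is a sketch only (the cluster name, zone, and machine type are example values; verify the flags against your gcloud version):

$ gcloud container clusters create gpdb-gke \
      --zone us-central1-a \
      --cluster-version latest \
      --machine-type n1-standard-2 \
      --image-type UBUNTU \
      --num-nodes 4 \
      --no-enable-autorepair \
      --enable-kubernetes-alpha

The --enable-kubernetes-alpha flag prompts for confirmation, which corresponds to selecting I understand the consequences in the console.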

Use the gcloud utility to log in to GCP and to set your current project and cluster context:

  1. Log into GCP:

    $ gcloud auth login
    
  2. Set the current project to the project where you will deploy Greenplum:

    $ gcloud config set project <project-name>
    
  3. Set the context to the Kubernetes cluster that you created for Greenplum:

    1. Access GCP Console.
    2. Select Kubernetes Engine > Clusters.
    3. Click Connect next to the cluster that you configured for Greenplum, and copy the connection command.
    4. On your local client machine, paste the command to set the context to your cluster. For example:

      $ gcloud container clusters get-credentials <username> --zone us-central1-a --project <my-project>
      
      Fetching cluster endpoint and auth data.
      kubeconfig entry generated for <username>.
      

In addition to the above, the Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.

Obtain the Kubernetes service account key (key.json) file, and identify its location in the Values-common.yaml file as the value of dockerRegistryKeyJson. See Deployment Configuration Options.

Procedure

The procedure for deploying Greenplum using the deployment script is the same for both PKS and GKE clusters. Follow the numbered instructions in the PKS Procedure.