Expanding a Greenplum Deployment

To expand a Greenplum cluster, you first use the Greenplum Operator to apply an updated Greenplum cluster configuration that increases the number of segments. The Greenplum Operator automatically creates the new segment pods in Kubernetes and starts a job to run gpexpand and initialize the new segments. You can optionally run manual commands to redistribute data to the new segments, and to remove the gpexpand schema that is created during the expansion process.

Note: You cannot resize a cluster to use a lower number of segments; you must delete and re-create the cluster to reduce the number of segments.

Follow these steps to expand a Greenplum for Kubernetes cluster:

  1. Go to the workspace subdirectory where you unpacked the Greenplum for Kubernetes distribution, or to the directory where you created your Greenplum cluster deployment manifest. For example:

    $ cd ./greenplum-for-kubernetes-*/workspace
    
  2. Edit the manifest file that you used to deploy your Greenplum cluster, increasing the primarySegmentCount value. This example doubles the number of segments defined in the default Greenplum deployment manifest (my-gp-instance.yaml); a sketch of where this field appears in the manifest follows the snippet:

    primarySegmentCount: 6
    
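    For context, here is a minimal sketch of where this field sits in the cluster manifest. The apiVersion and surrounding structure shown here are based on the default my-gp-instance.yaml and are illustrative only; keep the other values from your original manifest and change only primarySegmentCount:

    apiVersion: "greenplum.pivotal.io/v1"
    kind: "GreenplumCluster"
    metadata:
      name: my-greenplum
    spec:
      segments:
        # Other segment settings (cpu, memory, storage, and so on) are omitted
        # here; leave them exactly as they were in your original manifest.
        primarySegmentCount: 6
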
  3. After modifying the file, use kubectl to apply the change. For example:

    $ kubectl apply -f my-gp-instance.yaml
    
    greenplumcluster.greenplum.pivotal.io/my-greenplum configured
    
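    To quickly confirm that the change was accepted before watching the full rollout, you can also query just the GreenplumCluster resource; its STATUS column is the same status that appears in the kubectl get all output in the next step:

    $ kubectl get greenplumcluster.greenplum.pivotal.io/my-greenplum
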
  4. Execute watch kubectl get all and wait until the new Greenplum pods reach the Running state. Also observe the progress of the expansion job (job.batch/my-greenplum-gpexpand-job) and wait for it to complete:

    $ watch kubectl get all
    
    NAME                                      READY   STATUS    RESTARTS   AGE
    pod/greenplum-operator-5cbd87fcb9-5nmfq   1/1     Running   0          22m
    pod/master-0                              1/1     Running   0          20m
    pod/master-1                              1/1     Running   0          20m
    pod/my-greenplum-gpexpand-job-5p7p4       1/1     Running   0          17s
    pod/segment-a-0                           1/1     Running   0          20m
    pod/segment-a-1                           1/1     Running   0          17s
    pod/segment-a-2                           1/1     Running   0          17s
    pod/segment-a-3                           0/1     Pending   0          17s
    pod/segment-a-4                           0/1     Pending   0          17s
    pod/segment-a-5                           0/1     Pending   0          17s
    pod/segment-b-0                           1/1     Running   0          20m
    pod/segment-b-1                           1/1     Running   0          17s
    pod/segment-b-2                           1/1     Running   0          17s
    pod/segment-b-3                           0/1     Pending   0          17s
    pod/segment-b-4                           0/1     Pending   0          17s
    pod/segment-b-5                           0/1     Pending   0          17s
    
    NAME                                                            TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    service/agent                                                   ClusterIP      None             <none>        22/TCP           20m
    service/greenplum                                               LoadBalancer   10.105.54.63     <pending>     5432:32036/TCP   20m
    service/greenplum-validating-webhook-service-5cbd87fcb9-5nmfq   ClusterIP      10.104.59.15     <none>        443/TCP          22m
    service/kubernetes                                              ClusterIP      10.96.0.1        <none>        443/TCP          48m
    
    NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/greenplum-operator   1/1     1            1           22m
    
    NAME                                            DESIRED   CURRENT   READY   AGE
    replicaset.apps/greenplum-operator-5cbd87fcb9   1         1         1       22m
    
    NAME                         READY   AGE
    statefulset.apps/master      2/2     20m
    statefulset.apps/segment-a   3/6     20m
    statefulset.apps/segment-b   3/6     20m
    
    NAME                                  COMPLETIONS   DURATION   AGE
    job.batch/my-greenplum-gpexpand-job   0/1           17s        17s
    
    NAME                                                 STATUS    AGE
    greenplumcluster.greenplum.pivotal.io/my-greenplum   Running   20m
    

    In the unlikely case that the update fails, the Greenplum cluster’s status will be UpdateFailed. Should that occur, investigate the logs to determine what happened (example commands are sketched below), address the underlying problem, and then use kubectl to re-apply the change.

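    An illustrative starting point for that investigation is the operator log, the expansion job log (if the job was created), and the events recorded on the GreenplumCluster resource:

    $ kubectl logs deployment/greenplum-operator
    $ kubectl logs job/my-greenplum-gpexpand-job
    $ kubectl describe greenplumcluster.greenplum.pivotal.io/my-greenplum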

    The expansion process is complete when all of the new segment pods reach the Running state and job.batch/my-greenplum-gpexpand-job shows 1/1 Completions. At that point, you can either use the cluster with the new segment resources as-is, or continue with the optional steps below to redistribute data to the new segment pods and/or remove the expansion schema.

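    For example, you can check the job directly rather than watching the full kubectl get all output; when the COMPLETIONS column reads 1/1, the expansion job has finished:

    $ kubectl get job my-greenplum-gpexpand-job
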
  5. (Optional.) If you want to redistribute existing data to use the new segment pods, perform these steps:

    1. Open a bash shell to the Greenplum master pod:

      $ kubectl exec -it master-0 bash
      
      gpadmin@master-0:~$ 
      
    2. Execute the gpexpand command with the -d option to specify a maximum duration, or the -e option to specify an end time, after which redistribution stops. Also include -D gpadmin to indicate that the expansion schema is stored in the gpadmin database. For example, to redistribute tables for a maximum of 10 hours, enter:

      $ gpexpand -d 10:00:00 -D gpadmin
      
      20191113:00:37:11:004168 gpexpand:master-0:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.23.0+dev.14.g7f0722e build dev'
      20191113:00:37:11:004168 gpexpand:master-0:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.23.0+dev.14.g7f0722e build dev) on x86_64-pc-linux-gnu, compiled by GCC gcc (Ubuntu 6.5.0-2ubuntu1~16.04) 6.5.0 20181026, 64-bit compiled on Nov  8 2019 02:52:02'
      20191113:00:37:11:004168 gpexpand:master-0:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
      ...
      

      If you do not specify the -d or -e option, redistribution continues until all tables in the expansion schema are redistributed. If you specify a duration or end time and redistribution stops before all tables are redistributed, you can continue redistributing tables at a later time (see the sketch below).

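      A hedged sketch of how you might check progress and resume later, assuming the expansion schema lives in the gpadmin database as above: query gpexpand.status_detail to see which tables remain, then re-run gpexpand with a new duration.

      $ psql gpadmin -c "SELECT * FROM gpexpand.status_detail;"

      $ gpexpand -d 02:00:00 -D gpadmin
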
  6. (Optional.) Remove the expansion schema if you have finished redistributing tables to the new segments, or if you do not intend to redistribute tables to them at all.


    Note: You must remove the expansion schema before you can expand the Greenplum cluster again.

    1. Open a bash shell to the Greenplum master pod:

      $ kubectl exec -it master-0 bash
      
      gpadmin@master-0:~$ 
      
    2. Execute the gpexpand command with the -c option, again including -D gpadmin to identify the database that contains the expansion schema. Enter y when prompted to dump the gpexpand.status_detail table to a file before the schema is removed:

      $ gpexpand -c -D gpadmin
      
      20191113:00:40:57:004208 gpexpand:master-0:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.23.0+dev.14.g7f0722e build dev'
      20191113:00:40:57:004208 gpexpand:master-0:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.23.0+dev.14.g7f0722e build dev) on x86_64-pc-linux-gnu, compiled by GCC gcc (Ubuntu 6.5.0-2ubuntu1~16.04) 6.5.0 20181026, 64-bit compiled on Nov  8 2019 02:52:02'
      20191113:00:40:57:004208 gpexpand:master-0:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
      
      Do you want to dump the gpexpand.status_detail table to file? Yy|Nn (default=Y):
      > y
      20191113:00:41:08:004208 gpexpand:master-0:gpadmin-[INFO]:-Dumping gpexpand.status_detail to /greenplum/data-1/gpexpand.status_detail
      20191113:00:41:08:004208 gpexpand:master-0:gpadmin-[INFO]:-Removing gpexpand schema
      20191113:00:41:08:004208 gpexpand:master-0:gpadmin-[INFO]:-Cleanup Finished.  exiting...
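
      As a final check (an illustrative example), you can confirm that the expansion schema was removed by listing the schemas in the database that held it; gpexpand should no longer appear:

      $ psql gpadmin -c "\dn"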