Failing Over to a Standby Master

Follow these steps to fail over to a standby master instance in a Greenplum for Kubernetes cluster if the active master instance fails or the active master pod is deleted.

  1. If the pod that runs the active Greenplum master instance fails or is deleted, the Greenplum StatefulSet restarts the pod. However, the Greenplum master instance remains offline, so you must fail over to the standby master instance. For example, if master-0 currently hosts the active master instance, then deleting that pod takes the cluster offline:

    $ kubectl delete pods master-0
    pod "master-0" deleted

    At this point, the StatefulSet ensures that the pod is recreated automatically:

    $ kubectl get pods
    NAME                                  READY     STATUS    RESTARTS   AGE
    greenplum-operator-6456f6cdcf-zvmhr   1/1       Running   0          4d
    master-0                              1/1       Running   0          18s
    master-1                              1/1       Running   0          4d
    segment-a-0                           1/1       Running   0          4d
    segment-b-0                           1/1       Running   0          4d

    However, the cluster remains unavailable:

    $ kubectl exec master-0 -- /bin/bash -c "source /opt/gpdb/greenplum_path.sh; gpstate"
    20181022:17:34:54:000073 gpstate:master-0:gpadmin-[INFO]:-Starting gpstate with args:
    20181022:17:34:55:000073 gpstate:master-0:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.11.3 build dev'
    20181022:17:34:56:000073 gpstate:master-0:gpadmin-[CRITICAL]:-gpstate failed. (Reason='could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
    could not connect to server: Cannot assign requested address
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?
    ') exiting...
    command terminated with exit code 2

    The remaining steps in this procedure use master-1 to indicate the standby master instance that is being promoted to operate as the active master instance.
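
Rather than re-running `kubectl get pods` by hand, you can wait for the recreated pod with a small script. This is an illustrative sketch only; `pod_ready` is a hypothetical helper that parses `kubectl get pods`-style output, not part of Greenplum for Kubernetes:

```shell
# pod_ready NAME — read `kubectl get pods`-style output on stdin and
# succeed only when NAME reports READY 1/1 and STATUS Running.
# (Hypothetical helper for illustration.)
pod_ready() {
  awk -v pod="$1" '$1 == pod && $2 == "1/1" && $3 == "Running" { found = 1 }
                   END { exit !found }'
}

# Poll until the StatefulSet has brought master-0 back:
# while ! kubectl get pods | pod_ready master-0; do sleep 2; done
```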

  2. Log in to the standby master host and execute gpactivatestandby to activate it as the active master:

    $ kubectl exec -it master-1 -- bash -c "source /opt/gpdb/greenplum_path.sh; gpactivatestandby -d /greenplum/data-1 -f"
    20181017:21:39:02:000721 gpactivatestandby:master-1:gpadmin-[INFO]:------------------------------------------------------
    20181017:21:39:02:000721 gpactivatestandby:master-1:gpadmin-[INFO]:-Standby data directory    = /greenplum/data-1
    20181017:21:39:02:000721 gpactivatestandby:master-1:gpadmin-[INFO]:-Standby port              = 5432
    20181017:21:39:02:000721 gpactivatestandby:master-1:gpadmin-[INFO]:-Standby running           = yes
    20181017:21:39:02:000721 gpactivatestandby:master-1:gpadmin-[INFO]:-Force standby activation  = no
    20181017:21:39:02:000721 gpactivatestandby:master-1:gpadmin-[INFO]:------------------------------------------------------
    Do you want to continue with standby master activation? Yy|Nn (default=N):

    Enter Y when prompted to activate the standby master.

  3. After the pod named “master-1” becomes the active master, update the greenplum service so that client connections are directed to the new master pod. Execute the command:

    $ kubectl patch service greenplum -p '{"spec":{"selector":{"statefulset.kubernetes.io/pod-name": "master-1"}}}'
    service "greenplum" patched

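
When scripting the failover, the patch document can be generated from the pod name. A minimal sketch, assuming the service selects master pods by the standard `statefulset.kubernetes.io/pod-name` label that Kubernetes applies to StatefulSet pods; `build_selector_patch` is a hypothetical helper:

```shell
# build_selector_patch POD — print a JSON merge patch that points the
# service selector at POD via the StatefulSet pod-name label.
# (Hypothetical helper for illustration.)
build_selector_patch() {
  printf '{"spec":{"selector":{"statefulset.kubernetes.io/pod-name":"%s"}}}' "$1"
}

# Usage:
# kubectl patch service greenplum -p "$(build_selector_patch master-1)"
```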
  4. At this point, executing gpstate shows that no standby master instance is currently configured:

    $ kubectl exec -it master-1 -- bash -c "source /opt/gpdb/greenplum_path.sh; gpstate"
    20181017:21:51:31:001142 gpstate:master-1:gpadmin-[INFO]:-Starting gpstate with args:
    20181017:21:51:32:001142 gpstate:master-1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.11.3 build dev'
    20181017:21:51:33:001142 gpstate:master-1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.11.3 build dev) on x86_64-pc-linux-gnu, compiled by    GCC gcc (Ubuntu 6.4.0-17ubuntu1~16.04) 6.4.0 20180424, 64-bit compiled on Oct 10 2018 22:25:23'
    20181017:21:51:33:001142 gpstate:master-1:gpadmin-[INFO]:-Obtaining Segment details from master...
    20181017:21:51:33:001142 gpstate:master-1:gpadmin-[INFO]:-Gathering data from segments...
    20181017:21:51:42:001142 gpstate:master-1:gpadmin-[INFO]:-Greenplum instance status summary
    20181017:21:51:43:001142 gpstate:master-1:gpadmin-[INFO]:-----------------------------------------------------
    20181017:21:51:43:001142 gpstate:master-1:gpadmin-[INFO]:-   Master instance                                           = Active
    20181017:21:51:43:001142 gpstate:master-1:gpadmin-[INFO]:-   Master standby                                            = No master standby configured

    Continue following the remaining steps to initialize the previous master instance as the standby master.
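
When scripting, the standby status can be extracted from the gpstate output instead of being read by eye. An illustrative sketch; `master_standby_status` is a hypothetical helper that parses the log line shown above:

```shell
# master_standby_status — print the value of the "Master standby" line
# from gpstate output on stdin (hypothetical parsing helper).
master_standby_status() {
  sed -n 's/.*Master standby  *= //p'
}

# Usage (gpstate output piped in):
# ... gpstate" | master_standby_status
```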

  5. Remove the master data directory from the inactive master instance:

    $ kubectl exec master-0 -- /bin/bash -c 'source /opt/gpdb/greenplum_path.sh; rm -rf ${MASTER_DATA_DIRECTORY}'

  6. On the active master host, execute the following two commands to prepare and initialize the new standby master host:

    $ kubectl exec master-1 -- /bin/bash -c "source /opt/gpdb/greenplum_path.sh; /home/gpadmin/tools/sshKeyScan"
    Key scanning started
    $ kubectl exec master-1 -- /bin/bash -c "source /opt/gpdb/greenplum_path.sh; gpinitstandby -a -s master-0.agent.default.service.cluster.local"
    20181022:17:44:11:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
    20181022:17:44:12:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Checking for filespace directory /greenplum/data-1 on master-0
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:------------------------------------------------------
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Greenplum standby master initialization parameters
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:------------------------------------------------------
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Greenplum master hostname               = master-1
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Greenplum master data directory         = /greenplum/data-1
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Greenplum master port                   = 5432
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Greenplum standby master hostname       = master-0
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Greenplum standby master port           = 5432
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Greenplum standby master data directory = /greenplum/data-1
    20181022:17:44:13:000595 gpinitstandby:master-1:gpadmin-[INFO]:-Greenplum update system catalog         = On
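
The standby hostname passed to gpinitstandby is the pod's cluster-internal DNS name. Assuming the `<pod>.agent.<namespace>.service.cluster.local` pattern shown in the command above, it can be composed with a small hypothetical helper:

```shell
# standby_fqdn POD [NAMESPACE] — compose the cluster-internal hostname
# used by gpinitstandby. Hypothetical helper; the pattern is taken from
# the example above, and the namespace defaults to "default".
standby_fqdn() {
  printf '%s.agent.%s.service.cluster.local' "$1" "${2:-default}"
}

# standby_fqdn master-0  →  master-0.agent.default.service.cluster.local
```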

  7. At this point, the active master runs on the pod named “master-1” and the standby master runs on the pod named “master-0.” Verify the role of each host:

    $ kubectl exec -it master-1 -- bash -c "source /opt/gpdb/greenplum_path.sh; psql -c 'select hostname, role from gp_segment_configuration;'"
      hostname   | role
    -------------+------
     segment-a-0 | p
     segment-b-0 | m
     master-1    | p
     master-0    | m
    (4 rows)
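
In gp_segment_configuration, role `p` marks a primary and `m` a mirror; for the master pair, these correspond to the active master and the standby. A tiny hypothetical helper to make the query output readable:

```shell
# role_name LETTER — translate a gp_segment_configuration role code
# ('p' = primary, 'm' = mirror) into a word. (Illustrative helper.)
role_name() {
  case "$1" in
    p) echo primary ;;
    m) echo mirror ;;
    *) echo unknown ;;
  esac
}

# role_name p  →  primary
```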