Using Greenplum for Kubernetes

Follow the instructions in this section to perform common tasks with a running Greenplum for Kubernetes system.

Accessing a Pod via SSH

Use the kubectl tool to access a pod running in the Greenplum for Kubernetes cluster:

$ kubectl exec -it <pod-name> -- /bin/bash

For example:

$ kubectl exec -it master -- /bin/bash
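
If you are unsure of the pod names in your deployment, list them first; names such as master and standby used in this section follow the provided templates and may differ in your cluster.

$ kubectl get pods        # shows the Greenplum pod names and their status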

Failing Over to a Standby Master

Follow these steps to fail over to a standby master with the Greenplum for Kubernetes service:

  1. Make sure the master has been stopped, either by deleting the master pod or by running gpstop on the master.

  2. Log in to the standby master with the command:

    $ kubectl exec -it standby -- /bin/bash
    
  3. Activate the standby master:

    $ gpactivatestandby -d /greenplum/data-1  -f
    
  4. After the pod named “standby” becomes the master, update the greenplum service so that external connections are routed to the new master pod. Execute the command:

    $ kubectl patch service greenplum -p '{"spec":{"selector":{"type":"standby"}}}'
    
  5. Re-use the provided master.yaml file to create a pod that will become the mirror of the new master database:

    $ kubectl create -f ./localbuild/templates/master.yaml
    $ kubectl wait --for=condition=ready pod/master
    $ kubectl exec master  -- /bin/bash -c 'rm -rf ${MASTER_DATA_DIRECTORY}' # remove any previous data
    
  6. Prepare the new master database (the pod called “standby”) to have a new mirror database (the pod called “master”):

    $ kubectl exec standby  -- /bin/bash -c "source /opt/gpdb/greenplum_path.sh; /home/gpadmin/tools/sshKeyScan" # exchange keys
    $ kubectl exec standby  -- /bin/bash -c "source /opt/gpdb/greenplum_path.sh; gpinitstandby -a -s master.agent.<namespace>.svc.cluster.local"
    
  7. At this point, the current master is the pod named “standby” and the current standby is the pod named “master.” You can verify the new roles as shown in the sketch below.
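
To confirm the new roles, you can run gpstate from the new master pod. This is a minimal sketch; it assumes the master data directory is /greenplum/data-1, as in the steps above.

$ kubectl exec standby -- /bin/bash -c "source /opt/gpdb/greenplum_path.sh; gpstate -f -d /greenplum/data-1"   # displays standby master details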

Deleting the Greenplum Database Cluster

When you are finished using the Greenplum Database cluster, use this command to remove it from Kubernetes:

helm del --purge <greenplum_cluster_name>

If you do not remember the original <greenplum_cluster_name>, you can display it by using the helm list command. For example:

$ helm list
NAME    REVISION    UPDATED                     STATUS      CHART       NAMESPACE
gpdb    1           Thu Jul 19 13:14:20 2018    DEPLOYED    greenplum-1 default
$ helm del --purge gpdb
release "gpdb" deleted

The above command removes the specified cluster, but leaves the nodes running. The --purge option makes the <greenplum_cluster_name> available for you to use in a later deployment.

Use the kubectl get pods command to check the progress of the operation; you cannot re-use the cluster name until helm finishes deleting the cluster.
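
One way to watch the tear-down is to list the release and the pods until they disappear:

$ helm list                # the release no longer appears once deletion completes
$ kubectl get pods -w      # watch the Greenplum pods terminate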

Troubleshooting Pivotal Greenplum for Kubernetes

Read-Only File System Error

Symptom:

The command kubectl logs <pod-name> shows the error:

install: cannot create directory '/sys/fs/cgroup/devices/kubepods': Read-only file system

Resolution:

The Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.
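
As a rough check (assuming you can open a shell on the Kubernetes node itself), you can see whether AppArmor is active and which profiles are loaded:

cat /sys/module/apparmor/parameters/enabled    # prints Y when AppArmor is enabled
sudo aa-status                                 # lists loaded profiles (requires apparmor-utils)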

Could Not Find Tiller

Symptom:

Error: could not find tiller

Resolution:

# remove any existing helm installation and re-install helm with the prerequisite privileges
helm reset
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}' 
helm init --service-account tiller --upgrade --wait
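
To verify the result, you can check that the Tiller pod is running and that helm can reach it; the label selector below assumes the standard Helm 2 Tiller label name=tiller.

kubectl get pods --namespace kube-system -l name=tiller    # the tiller-deploy pod should be Running
helm version                                               # should report both Client and Server versions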

Node Not Labeled

Symptom:

node "gke-gpdb-test-default-pool-20a900ca-3trh" not labeled

Resolution:

This message is common output on Google Cloud Platform (GCP). It indicates that the node is already labeled correctly, so no labeling action was necessary.
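
If you want to confirm the node labels yourself, kubectl can display them:

kubectl get nodes --show-labels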

Forbidden Namespace or Unknown User

Symptom:

namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"

Resolution:

This message indicates that the Kubernetes system is v1.8 or greater, which enables role-based access. (See this bug report and commit.)

Helm requires additional permissions (more than the default level) in v1.8 or greater. Execute these commands:

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'      
helm init --service-account tiller --upgrade --wait

Executable Not Found

Symptom:

executable not found in $PATH

Resolution:

This error appears on the events tab of a container to indicate that xfs is not supported on Container-Optimized OS (COS). To resolve this, use an Ubuntu image for the node instead.
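
You can check which image each node runs before changing it; the OS-IMAGE column distinguishes Container-Optimized OS from Ubuntu:

kubectl get nodes -o wide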

Sandbox Has Changed

Symptom:

Sandbox has changed

Resolution:

This error appears on the events tab of a container to indicate that the sysctl settings have become corrupted or failed to apply. To resolve this, remove the sysctl settings from the pod YAML file.
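
One way to locate the settings before editing the YAML (a sketch; it simply searches the pod specification for sysctl-related entries):

kubectl get pod <pod-name> -o yaml | grep -i -B 2 -A 4 sysctl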

Connection Timed

Symptom:

 getsocket() connection timed 

Resolution:

This error can occur when accessing http://localhost:8001/ui, a kubectl proxy address. Make sure that there is network connectivity between the master and the worker nodes where kubernetes-dashboard is running. Applying a network tag, such as gpcloud-internal, to the nodes can establish a route among them, as illustrated in the sketch below.
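
As an illustration only, a GCP firewall rule keyed on such a tag might look like the following; the rule name, network name, and allowed protocols are assumptions for your environment:

gcloud compute firewall-rules create gpcloud-internal-allow \
    --network <network-name> \
    --allow tcp,udp,icmp \
    --source-tags gpcloud-internal \
    --target-tags gpcloud-internal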

Unable to Connect to the Server

Symptom:

kubectl get nodes -w
Unable to connect to the server: x509: certificate is valid for 10.100.200.1, 35.199.191.209, not 35.197.83.225

Resolution:

This error indicates that you have chosen to update the wrong Load Balancer. Each cluster has its own load balancer for the Kubernetes master, with a certificate for access. Refer to the workspace/samples/scripts/create_pks_cluster_on_gcp.bash script for Bash commands that help to determine the master IP address for a given cluster name, and the commands used to attach to a Load Balancer.

Permission Denied Error when Stopping Greenplum

Symptom:

kubectl exec -it master -- bash
$ gpstop -u

#
# Exits with Permission denied error
#
.
20180828:14:34:53:002448 gpstop:master:gpadmin-[CRITICAL]:-Error occurred: Error Executing Command:
 Command was: 'ssh -o 'StrictHostKeyChecking no' master ". /opt/gpdb/greenplum_path.sh; $GPHOME/bin/pg_ctl reload -D /greenplum/data-1"'
rc=1, stdout='', stderr='pg_ctl: could not send reload signal (PID: 2137): Permission denied'

Resolution:

This error occurs because of the ssh context that Docker uses. Commands issued to a process must use the same context as the originator of the process. This issue is fixed in recent Docker versions, but the fixes have not yet reached the latest Kubernetes release. To avoid the issue, use the same ssh context that you used to initialize the Greenplum cluster. For example, if you used a kubectl session to initialize Greenplum, then use another kubectl session to run gpstop and stop Greenplum.
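
For example, a minimal sketch of reloading the Greenplum configuration from a kubectl session, assuming the same paths used earlier in this section:

kubectl exec -it master -- /bin/bash -c "source /opt/gpdb/greenplum_path.sh; gpstop -u"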