Using Greenplum for Kubernetes

Follow the instructions in this section to perform common tasks with a running Greenplum for Kubernetes system.

Accessing a Pod via SSH

Use the kubectl tool to access a pod running in the Greenplum for Kubernetes cluster:

$ kubectl exec -it <pod-name> -- /bin/bash

For example:

$ kubectl exec -it master -- /bin/bash
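
If you are not sure of the pod name, you can list the pods in the cluster first (a quick sketch, assuming the Greenplum pods run in the default namespace):

$ kubectl get pods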

Failing Over to a Standby Master

Follow these steps to fail over to the standby master with the Greenplum for Kubernetes service:

  1. Make sure the master has been stopped, either by deleting the master pod or by running gpstop on the master.

  2. Log in to the standby master pod with the command:

    $ kubectl exec -it standby -- /bin/bash
    
  3. Set the PGPORT environment variable, then activate the standby master:

    $ export PGPORT=5432
    $ gpactivatestandby -d /greenplum/data-1  -f
    
  4. After the container named "standby" becomes the master, you must use a new external IP address to access the Greenplum service. Update the service selector to point to the standby pod by executing the command (a sketch for finding the new external IP follows these steps):

    $ kubectl patch -f ./localbuild/templates/greenplum-service.yaml -p '{"spec":{"selector":{"type":"standby"}}}'
    
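After the selector is patched, the external IP address of the Greenplum service can change. The following is a minimal sketch for finding the new external IP and testing a connection; the gpadmin role is an assumption and may differ in your deployment:

$ kubectl get services
$ psql -h <external-ip> -p 5432 -U gpadmin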

Deleting the Greenplum Database Cluster

When you are finished using the Greenplum Database cluster, use this command to remove it from Kubernetes:

helm del --purge <greenplum_cluster_name>

If you do not remember the original <greenplum_cluster_name>, you can display it with the helm list command. For example:

$ helm list
NAME    REVISION    UPDATED                     STATUS      CHART       NAMESPACE
gpdb    1           Thu Jul 19 13:14:20 2018    DEPLOYED    greenplum-1 default
$ helm del --purge gpdb
release "gpdb" deleted

The above command removes the specified cluster, but leaves the nodes running. The --purge option makes the <greenplum_cluster_name> available for you to use in a later deployment.
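
To confirm that the release was removed, you can list the remaining Helm releases and watch the Greenplum pods terminate (a quick sanity check; pod names depend on your deployment):

$ helm list
$ kubectl get pods -w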

Keep in mind that PKS cannot resize a cluster to a lower number of nodes; you must delete and re-create the cluster to reduce the number of nodes.

Troubleshooting Pivotal Greenplum for Kubernetes

Could Not Find Tiller

Symptom:

Error: could not find tiller

Resolution:

# remove any existing helm installation and re-install helm with the prerequisite privileges
helm reset
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}' 
helm init --service-account tiller --upgrade --wait
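
To confirm that Tiller is running again with the new service account, a quick sanity check:

# both the client and server versions should be reported once Tiller is ready
helm version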

Node Not Labeled

Symptom:

node "gke-gpdb-test-default-pool-20a900ca-3trh" not labeled

Resolution:

This is a common output from GCP. It indicates that the node is already labeled correctly, so no label action was necessary.
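
To confirm which labels are already applied to a node, a quick check:

kubectl get nodes --show-labels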

Forbidden Namespace or Unknown User

Symptom:

namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"

Resolution:

This message indicates that the Kubernetes system is v1.8 or greater, which enables role-based access control (RBAC).

In v1.8 or greater, Helm requires additional permissions beyond what the default service account provides. Execute these commands:

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'      
helm init --service-account tiller --upgrade --wait
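
To verify that the tiller-deploy deployment picked up the tiller service account, a quick check:

# the Service Account field in the pod template should now show "tiller"
kubectl describe deploy tiller-deploy --namespace kube-system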

Executable Not Found

Symptom:

executable not found in $PATH

Resolution:

This error appears on the Events tab of a container and indicates that xfs is not supported on Container-Optimized OS (COS). To resolve this, use the Ubuntu OS image on the nodes.
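
To check which OS image each node is running, a quick check (the OS image appears in the OS-IMAGE column):

kubectl get nodes -o wide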

Sandbox Has Changed

Symptom:

Sandbox has changed

Resolution:

This error appears on the Events tab of a container and indicates that sysctl settings have become corrupted or failed. To resolve this, remove the sysctl settings from the pod YAML file.
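
To locate sysctl settings in a pod specification before removing them, a rough sketch (the grep pattern is only a coarse filter):

kubectl get pod <pod-name> -o yaml | grep -i -A 3 sysctl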

Connection Timed

Symptom:

getsocket() connection timed

Resolution:

This error can occur when accessing http://localhost:8001/ui, a kubectl proxy address. Make sure that there is network connectivity between the master and the worker nodes where kubernetes-dashboard is running. A network tag on the nodes, such as gpcloud-internal, can establish a route among the nodes.
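
To see which node the kubernetes-dashboard pod is running on before checking routes and network tags, a minimal sketch (the k8s-app=kubernetes-dashboard label is the dashboard's conventional label and may differ in your cluster):

kubectl get pods --namespace kube-system -l k8s-app=kubernetes-dashboard -o wide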

Unable to Connect to the Server

Symptom:

kubectl get nodes -w
Unable to connect to the server: x509: certificate is valid for 10.100.200.1, 35.199.191.209, not 35.197.83.225

Resolution:

This error indicates that you have updated the wrong load balancer. Each cluster has its own load balancer for the Kubernetes master, with its own certificate for access. Refer to the workspace/samples/scripts/create_pks_cluster_on_gcp.bash script for Bash commands that help determine the master IP address for a given cluster name, and the commands used to attach it to a load balancer.
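
If you use the PKS CLI, you can look up the master IP address for a given cluster before updating its load balancer, for example:

pks cluster <cluster-name>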