Troubleshooting Common Problems

Read-Only File System Error

Symptom:

The command kubectl logs <pod-name> shows the error:

install: cannot create directory '/sys/fs/cgroup/devices/kubepods': Read-only file system

Resolution:

The Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.
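
If AppArmor is in use, you can check which profiles are loaded on the affected node, assuming you have shell access to it (aa-status is part of the standard AppArmor tools), and review the pod's events for the failed mount:

# On the affected node, show AppArmor status and enforced profiles:
$ sudo aa-status

# From your workstation, review the pod events for mount failures:
$ kubectl describe pod <pod-name>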

Could Not Find Tiller

Symptom:

Error: could not find tiller

Resolution:

Remove any existing Helm installation, then re-install Helm with sufficient privileges. The initialize_helm_rbac.yaml file is available in the top-level directory of the Greenplum for Kubernetes software distribution:

$ helm reset
$ kubectl create -f ./initialize_helm_rbac.yaml 
$ helm init --service-account tiller --upgrade --wait
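
As a quick sanity check after re-initialization (the label selector below assumes the standard Tiller deployment labels):

# Confirm that the Tiller pod is running:
$ kubectl get pods --namespace kube-system -l app=helm

# Confirm that the helm client can reach the Tiller server:
$ helm version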

Node Not Labeled

Symptom:

node "gke-gpdb-test-default-pool-20a900ca-3trh" not labeled

Resolution:

This message is common output from GCP. It indicates that the node is already labeled correctly, so no labeling action was necessary.
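
To inspect or set node labels yourself, the standard kubectl commands apply; the label key and value here are placeholders:

# Show the labels currently applied to each node:
$ kubectl get nodes --show-labels

# Apply a label; --overwrite avoids an error if the label already exists:
$ kubectl label node <node-name> <key>=<value> --overwrite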

Forbidden Namespace or Unknown User

Symptom:

namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"

Resolution:

This message indicates that the Kubernetes system is v1.8 or greater, which enables role-based access control (RBAC) by default.

In v1.8 or greater, Helm requires permissions beyond the default level. Execute these commands to grant them:

$ kubectl create serviceaccount --namespace kube-system tiller
$ kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
$ kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
$ helm init --service-account tiller --upgrade --wait
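
You can then verify the service account and binding before retrying the deployment; for example:

# Confirm that the tiller service account exists:
$ kubectl get serviceaccount tiller --namespace kube-system

# Confirm that the binding grants cluster-admin to that service account:
$ kubectl describe clusterrolebinding tiller-cluster-rule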

Executable Not Found

Symptom:

executable not found in $PATH

Resolution:

This error appears on the events tab of a container and indicates that xfs is not supported on Container-Optimized OS (COS). To resolve the problem, use the Ubuntu OS image on the node.
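
On GKE, for example, you can create a node pool that uses the Ubuntu image type; the pool and cluster names below are placeholders:

# Create a node pool whose nodes run Ubuntu instead of COS:
$ gcloud container node-pools create <pool-name> \
    --cluster <cluster-name> \
    --image-type UBUNTU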

Sandbox Has Changed

Symptom:

Sandbox has changed

Resolution:

This error appears on the events tab of a container and indicates that the sysctl settings have become corrupted or failed to apply. To resolve the problem, remove the sysctl settings from the pod YAML file.
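
To locate such settings, you can dump the live pod definition and search for a sysctls block; the pod name is a placeholder:

# Look for a securityContext sysctls section in the pod definition:
$ kubectl get pod <pod-name> -o yaml | grep -B 2 -A 6 sysctls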

Connection Timed

Symptom:

getsocket() connection timed

Resolution:

This error can occur when accessing http://localhost:8001/ui, a kubectl proxy address. Make sure there is network connectivity between the master node and the worker nodes where kubernetes-dashboard is running. Applying a network tag such as gpcloud-internal to the nodes can establish a route among them.
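
On GCP, for example, a network tag can be added to an existing node instance with gcloud; the instance name and zone are placeholders, and the tag must match the one your firewall rules reference:

# Tag a node instance so that internal firewall and routing rules apply to it:
$ gcloud compute instances add-tags <instance-name> \
    --zone <zone> \
    --tags gpcloud-internal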

Unable to Connect to the Server

Symptom:

kubectl get nodes -w
Unable to connect to the server: x509: certificate is valid for 10.100.200.1, 35.199.191.209, not 35.197.83.225

Resolution:

This error indicates that you have updated the wrong load balancer. Each cluster has its own load balancer for the Kubernetes master, with its own certificate for access. Refer to the workspace/samples/scripts/create_pks_cluster_on_gcp.bash script for Bash commands that help determine the master IP address for a given cluster name, and for the commands used to attach it to a load balancer.
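
To see which address your kubectl context actually targets (and compare it with the IP addresses listed in the certificate error), you can inspect the current context; for example:

# Print the API server address for the current kubectl context:
$ kubectl config view --minify --output jsonpath='{.clusters[0].cluster.server}'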

Permission Denied Error when Stopping Greenplum

Symptom:

$ kubectl exec -it master bash
$ gpstop -u

# Exits with Permission denied error:
...
20180828:14:34:53:002448 gpstop:master:gpadmin-[CRITICAL]:-Error occurred: Error Executing Command:
 Command was: 'ssh -o 'StrictHostKeyChecking no' master ". /opt/gpdb/greenplum_path.sh; $GPHOME/bin/pg_ctl reload -D /greenplum/data-1"'
rc=1, stdout='', stderr='pg_ctl: could not send reload signal (PID: 2137): Permission denied'

Resolution:

This error occurs because of the ssh context that Docker uses. Commands issued to a process must use the same context as the originator of the process. This issue is fixed in recent Docker versions, but the fixes have not yet reached the latest Kubernetes release. To avoid the issue, use the same ssh context that you used to initialize the Greenplum cluster. For example, if you used a kubectl session to initialize Greenplum, then use another kubectl session to run gpstop and stop Greenplum.
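
For example, if the cluster was initialized from a kubectl exec session, stop it the same way:

# Open a new session with the same context used to initialize the cluster:
$ kubectl exec -it master bash

# Inside the container, run gpstop as before:
$ gpstop -u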