Troubleshooting Common Problems
Read-Only File System Error
Symptom:
The command kubectl logs <pod-name> shows the error:
install: cannot create directory '/sys/fs/cgroup/devices/kubepods': Read-only file system
Resolution:
The Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.
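For example, if you have shell access to the affected Kubernetes node, you can check whether AppArmor is enabled and which profiles are in enforce mode. These are standard Linux and AppArmor utilities; adjust them for your node's operating system image:
$ cat /sys/module/apparmor/parameters/enabled   # prints Y if AppArmor is enabled
$ sudo aa-status                                # lists loaded profiles and their modes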
Could Not Find Tiller
Symptom:
Error: could not find tiller
Resolution:
Remove any existing helm installation and re-install helm with sufficient privileges. The initialize_helm_rbac.yaml file is available in the top-level directory of the Greenplum for Kubernetes software distribution:
$ helm reset
$ kubectl create -f ./initialize_helm_rbac.yaml
$ helm init --service-account tiller --upgrade --wait
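After re-installing, a quick verification that Tiller is running can help; the label selector below assumes the default labels on the tiller-deploy pod:
$ helm version
$ kubectl get pods --namespace kube-system -l app=helm,name=tiller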
Node Not Labeled
Symptom:
node "gke-gpdb-test-default-pool-20a900ca-3trh" not labeled
Resolution:
This is common output from GCP. It indicates that the node is already labeled correctly, so no labeling action was necessary.
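If you want to confirm the labels yourself, you can list the nodes together with their labels (standard kubectl usage):
$ kubectl get nodes --show-labels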
Forbidden Namespace or Unknown User
Symptom:
namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"
Resolution:
This message indicates that the Kubernetes system is v1.8 or later, which enables role-based access control (RBAC) by default. (See this bug report and commit.)
Helm requires additional permissions (more than the default level) in v1.8 or later. Execute these commands:
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
helm init --service-account tiller --upgrade --wait
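As a quick check, you can verify that the tiller service account and the cluster role binding were created (standard kubectl commands):
$ kubectl get serviceaccount tiller --namespace kube-system
$ kubectl get clusterrolebinding tiller-cluster-rule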
Executable Not Found
Symptom:
executable not found in $PATH
Resolution:
This error appears on the Events tab of a container. It indicates that xfs is not supported on Container-Optimized OS (COS). To resolve this, use the Ubuntu OS image on the node.
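For example, on GKE you can create a node pool that uses the Ubuntu image type. The node pool and cluster names below are placeholders (a sketch, assuming the gcloud CLI is available):
$ gcloud container node-pools create ubuntu-pool --cluster=my-gpdb-cluster --image-type=UBUNTU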
Sandbox Has Changed
Symptom:
Sandbox has changed
Resolution:
This error appears on the Events tab of a container. It indicates that the sysctl settings have become corrupted or have failed. To resolve this, remove the sysctl settings from the pod YAML file.
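To locate the settings before removing them, you can search the pod specification for a sysctls stanza; the pod name below is a placeholder (standard kubectl usage):
$ kubectl get pod <pod-name> -o yaml | grep -B 2 -A 4 sysctls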
Connection Timed
Symptom:
getsocket() connection timed
Resolution:
This error can occur when accessing http://localhost:8001/ui, a kubectl proxy address. Make sure that there is a connection between the master and worker nodes where kubernetes-dashboard is running. A network tag on the nodes like gpcloud-internal can establish a route among the nodes.
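For example, on GCP you can add a network tag to a node's underlying VM instance; the instance name below reuses the example node name from earlier in this topic, and the zone is a placeholder (a sketch, assuming the gcloud CLI). A firewall rule that allows traffic between instances carrying this tag may also be needed:
$ gcloud compute instances add-tags gke-gpdb-test-default-pool-20a900ca-3trh --tags=gpcloud-internal --zone=us-central1-a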
Unable to Connect to the Server
Symptom:
kubectl get nodes -w
Unable to connect to the server: x509: certificate is valid for 10.100.200.1, 35.199.191.209, not 35.197.83.225
Resolution:
This error indicates that you have chosen to update the wrong Load Balancer. Each cluster has its own load balancer for the Kubernetes master, with a certificate for access. Refer to the workspace/samples/scripts/create_pks_cluster_on_gcp.bash script for Bash commands that help to determine the master IP address for a given cluster name, and the commands used to attach to a Load Balancer.
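For example, with the PKS CLI you can confirm the master IP address for a cluster before updating its load balancer; the cluster name below is a placeholder (a sketch, assuming you are logged in to the PKS API):
$ pks cluster my-cluster    # the output includes the Kubernetes Master IP(s)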
Permission Denied Error when Stopping Greenplum
Symptom:
kubectl exec -it master bash
$ gpstop -u
#
# Exits with Permission denied error
#
20180828:14:34:53:002448 gpstop:master:gpadmin-[CRITICAL]:-Error occurred: Error Executing Command:
Command was: 'ssh -o 'StrictHostKeyChecking no' master ". /opt/gpdb/greenplum_path.sh; $GPHOME/bin/pg_ctl reload -D /greenplum/data-1"'
rc=1, stdout='', stderr='pg_ctl: could not send reload signal (PID: 2137): Permission denied'
Resolution:
This error occurs because of the ssh context that Docker uses. Commands that are issued to a process have to use the same context as the originator of the process. This issue is fixed in recent Docker versions, but the fixes have not reached the latest Kubernetes release. To avoid this issue, use the same ssh context that you used to initialize the Greenplum cluster. For example, if you used a kubectl session to initialize Greenplum, then use another kubectl session to run gpstop and stop Greenplum.
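A minimal sketch of the matching-context workflow, assuming the cluster was initialized from a kubectl exec session and using the pod name master from the symptom above:
$ kubectl exec -it master bash       # same kind of session that initialized Greenplum
$ gpstop -u                          # the reload signal succeeds when the contexts match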