Pivotal Greenplum® for Kubernetes v0.8

Troubleshooting Common Problems

Read-Only File System Error

Symptom:

The command kubectl logs <pod-name> shows the error:

install: cannot create directory '/sys/fs/cgroup/devices/kubepods': Read-only file system

Resolution:

The Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.
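
As a quick check, you can verify on the worker node whether an AppArmor profile is active. This is a diagnostic sketch only; the aa-status command requires the apparmor-utils package and is not part of the deployment process:

$ cat /sys/module/apparmor/parameters/enabled
$ sudo aa-status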

Could Not Find Tiller

Symptom:

Error: could not find tiller

Resolution:

Remove any existing Helm installation and re-install Helm with sufficient privileges. The initialize_helm_rbac.yaml file is available in the top-level directory of the Greenplum for Kubernetes software distribution:

$ helm reset
$ kubectl create -f ./initialize_helm_rbac.yaml 
$ helm init --service-account tiller --upgrade --wait
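
To confirm that Tiller is running with the new service account after re-installation, commands similar to the following can help (pod names vary):

$ kubectl get pods --namespace kube-system | grep tiller
$ helm version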

Node Not Labeled

Symptom:

node "gke-gpdb-test-default-pool-20a900ca-3trh" not labeled

Resolution:

This is common output from GCP. It indicates that the node is already labeled correctly, so no labeling action was necessary.

Forbidden Namespace or Unknown User

Symptom:

namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"

Resolution:

This message indicates that the Kubernetes system is v1.8 or greater, which enables role-based access control (RBAC). (See the related bug report and commit.)

Helm requires additional permissions (more than the default level) in v1.8 or greater. Execute these commands:

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'      
helm init --service-account tiller --upgrade --wait
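
You can confirm that the binding and the patched deployment are in place with checks similar to the following (illustrative only):

$ kubectl get clusterrolebinding tiller-cluster-rule
$ kubectl get deploy tiller-deploy --namespace kube-system -o jsonpath='{.spec.template.spec.serviceAccount}'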

Executable Not Found

Symptom:

executable not found in $PATH

Resolution:

This error appears on the Events tab of a container and indicates that XFS is not supported on Container-Optimized OS (COS). To resolve this, use the Ubuntu OS image on the node.
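
For example, on GKE a node pool that uses the Ubuntu image can be created with a command similar to the following (the pool and cluster names are placeholders, and you may also need to specify the zone):

$ gcloud container node-pools create ubuntu-pool --cluster=<my-cluster> --image-type=UBUNTU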

Sandbox Has Changed

Symptom:

Sandbox has changed

Resolution:

This error appears on the Events tab of a container and indicates that sysctl settings have become corrupted or failed. To resolve this, remove the sysctl settings from the pod YAML file.

Connection Timed Out

Symptom:

 getsocket() connection timed 

Resolution:

This error can occur when accessing http://localhost:8001/ui, a kubectl proxy address. Make sure that there is network connectivity between the master and the worker nodes where kubernetes-dashboard is running. A network tag on the nodes, such as gpcloud-internal, can establish a route among the nodes.
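
To see which node is running kubernetes-dashboard (and therefore which nodes need a route between them), a command like the following can help; the k8s-app label shown is the dashboard's conventional label and is an assumption here:

$ kubectl get pods --namespace kube-system -l k8s-app=kubernetes-dashboard -o wide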

Unable to Connect to the Server

Symptom:

kubectl get nodes -w
Unable to connect to the server: x509: certificate is valid for 10.100.200.1, 35.199.191.209, not 35.197.83.225

Resolution:

This error indicates that you updated the wrong load balancer. Each cluster has its own load balancer for the Kubernetes master, with a certificate for access. Refer to the workspace/samples/scripts/create_pks_cluster_on_gcp.bash script for Bash commands that help to determine the master IP address for a given cluster name, and for the commands used to attach it to a load balancer.
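
To confirm which API server address your current kubectl context targets (and compare it against the IP addresses listed in the certificate error), you can run:

$ kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'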

Permission Denied Error when Stopping Greenplum

Symptom:

kubectl exec -it master bash
$ gpstop -u

#
# Exits with Permission denied error
#
.
20180828:14:34:53:002448 gpstop:master:gpadmin-[CRITICAL]:-Error occurred: Error Executing Command:
 Command was: 'ssh -o 'StrictHostKeyChecking no' master ". /opt/gpdb/greenplum_path.sh; $GPHOME/bin/pg_ctl reload -D /greenplum/data-1"'
rc=1, stdout='', stderr='pg_ctl: could not send reload signal (PID: 2137): Permission denied'

Resolution:

This error occurs because of the SSH context that Docker uses. Commands issued to a process must use the same context as the originator of the process. This issue is fixed in recent Docker versions, but the fixes have not yet reached the latest Kubernetes release. To avoid this issue, use the same SSH context that you used to initialize the Greenplum cluster. For example, if you used a kubectl session to initialize Greenplum, then use another kubectl session to run gpstop and stop Greenplum.
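
For example, a sketch of reloading the Greenplum configuration from a kubectl session (it assumes the container's default user is gpadmin, and uses the pod name and paths shown in the symptom above):

$ kubectl exec -it master -- bash -c "source /opt/gpdb/greenplum_path.sh && gpstop -u"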

Socket: too many open files

Symptom:

Executing any kubectl command yields an error similar to:

dial udp 1.2.3.4:53: socket: too many open files

Resolution:

Configure the underlying node to support a larger number of files. See Files in the Node Requirements documentation.
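
As an illustration only, the current limits on a node can be inspected with the following commands; see the Node Requirements documentation for the recommended values:

$ ulimit -n                  # per-process open file limit for the current shell
$ cat /proc/sys/fs/file-max  # system-wide open file limit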

PKS Deployment Errors

Greenplum Query Fails to Write an Outgoing Packet

Symptom:

The Greenplum cluster is initialized and running, but a query returns an error similar to:

ERROR: Interconnect error writing an outgoing packet: Invalid argument (seg0 slice1 <ip>:<port> pid=1193)

Resolution:

This error occurs when ports are not garbage collected quickly enough. The problem is common in systems that run many containers on a single Kubernetes node, where the containers make heavy use of different ports to communicate with one another (as is the case with Greenplum segments).

To work around this problem, set the following sysctl attribute on the worker nodes:

net.ipv4.neigh.default.gc_thresh1 = 30000
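
For example, to apply the setting immediately and persist it across reboots on a worker node (assuming shell access to the node):

$ sudo sysctl -w net.ipv4.neigh.default.gc_thresh1=30000
$ echo "net.ipv4.neigh.default.gc_thresh1 = 30000" | sudo tee -a /etc/sysctl.conf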

Authorization Errors

Symptom

After a certificate change (or any URL change for the UAA domain name), you may see a 401 error from BOSH similar to:

bosh -e pks vms
Using environment '192.168.101.10' as anonymous user

Finding deployments:
  Director responded with non-successful status code '401' response 'Not authorized: '/deployments'
'

Exit code 1

Resolution

Go to the credentials web page (similar to https://<ops manager>/infrastructure/director/credentials) and look for Bosh command line credentials. The credentials look similar to:

{"credential":"BOSH_CLIENT=<Some User> BOSH_CLIENT_SECRET=<Some Secret> BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=192.168.101.10 bosh "}

In the command for uaac token owner get, use:

* BOSH_CLIENT as the "Client ID"
* BOSH_CLIENT_SECRET as the "Client secret"
* "Admin" for the User name
* the associated password from Uaa Admin User Credentials

For example:

$ uaac token owner get
Client ID:  ops_manager
Client secret:  ********************************
User name:  Admin
Password:  ********************************

Cannot Access UAA

Symptom

You can access Ops Manager, but you have problems accessing UAA. For example:

pks login -a https://pks-0.gpcloud.gpdb.pivotal.io:9021 -u dummy -p <some password> -k

Error: Post https://pks-0.gpcloud.gpdb.pivotal.io:8443/oauth/token: dial tcp 35.197.67.138:8443: getsockopt: connection refused

Resolution

This problem can be a symptom of having recycled the VM that runs the PKS API, such that the external IP address defined in the domain name is stale. Use gcloud or the Google Cloud Console to determine the current VM for the PKS API. You can identify it because it has two labels, "job" and "instance-group", both of which have the value "pivotal container service". Get the external IP address of that VM and change the DNS definition to that external IP address. Use a command like:

$ watch dig '<my domain name>'

Wait to see that the DNS entry is updated for your local workstation. When it updates, try the pks login command again.
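
A gcloud command along these lines can help locate the PKS API VM; the label filter shown is an assumption, so adjust it to match the labels described above:

$ gcloud compute instances list --filter="labels.job~pivotal"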

Unexpected End of JSON Input

Symptom

You see the following error when you try to deploy the Greenplum Operator using helm:

$ helm install --name greenplum-operator -f workspace/operator-values-overrides.yaml operator/
Error: release greenplum-operator failed: Secret "regsecret" is invalid: data[.dockerconfigjson]: Invalid value: "<secret contents redacted>": unexpected end of JSON input

Resolution

This error indicates that the value specified for dockerRegistryKeyJson in ./workspace/operator-values-overrides.yaml is invalid or missing.

To download Greenplum images from a container image registry such as gcr.io, a key.json file is required to provide the authentication secrets. Make sure that the key.json file is in the correct location under the /operator directory, as described in the installation procedure.
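
A quick way to confirm that the key file referenced by dockerRegistryKeyJson contains valid JSON (the path shown is an assumption; use your actual key file location):

$ python -m json.tool < operator/key.json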

ImagePullBackOff Error While Deploying Operator

Symptom

When you try to deploy the Greenplum Operator using helm you see an error similar to:

$ helm install --name greenplum-operator -f workspace/operator-values-overrides.yaml operator/
$ kubectl describe pod -l name=greenplum-operator
...
Events:
  Type     Reason          Age              From                                 Message
  ----     ------          ----             ----                                 -------
  Normal   Scheduled       1m               default-scheduler                    Successfully assigned default/greenplum-operator-79bd8ccbc4-4lbxx to gke-oz-acceptance-default-pool-c7870f59-6h3f
  Normal   Pulling         1m (x2 over 1m)  kubelet, default-pool-c7870f59-6h3f  pulling image "greenplum-operator:v0.6.0.dev.103.gadfb9a1"
  Warning  Failed          1m (x2 over 1m)  kubelet, default-pool-c7870f59-6h3f  Failed to pull image "greenplum-operator:v0.6.0.dev.103.gadfb9a1": rpc error: code = Unknown desc = Error response from daemon: repository greenplum-operator not found: does not exist or no pull access
  Warning  Failed          1m (x2 over 1m)  kubelet, default-pool-c7870f59-6h3f  Error: ErrImagePull
  Normal   SandboxChanged  1m (x7 over 1m)  kubelet, default-pool-c7870f59-6h3f  Pod sandbox changed, it will be killed and re-created.
  Normal   BackOff         1m (x6 over 1m)  kubelet, default-pool-c7870f59-6h3f  Back-off pulling image "greenplum-operator:v0.6.0.dev.103.gadfb9a1"
  Warning  Failed          1m (x6 over 1m)  kubelet, default-pool-c7870f59-6h3f  Error: ImagePullBackOff

Resolution

This error indicates that the value you specified for dockerRegistryKeyJson in ./workspace/operator-values-overrides.yaml points to a service account key that does not have permission to pull images from the specified container registry.

To download Greenplum images from a container image registry such as gcr.io, a key.json file is required to provide the authentication secrets. Make sure that the key.json file contains a valid service account key with permission to pull images from the container registry, as described in Cluster Requirements (for GKE on GCP) or Cluster Requirements (for PKS on GCP).
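
To verify that the key can pull from the registry, a manual check similar to the following can be useful. The _json_key user name is the documented convention for gcr.io key-file logins; the image name and tag are placeholders:

$ docker login -u _json_key --password-stdin https://gcr.io < key.json
$ docker pull gcr.io/<your-project>/greenplum-operator:<tag>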