Troubleshooting Common Problems

Enabling Debug Logging

By default, Greenplum for Kubernetes logs info-level messages. To obtain more detailed log messages when diagnosing a problem, change the log level to debug. Changes to the log level must be applied before the Greenplum Operator is installed.

To change the log level:

  1. Go to the operator subdirectory of your Greenplum for Kubernetes software directory. For example:

    $ cd ~/greenplum-for-kubernetes-*/operator

  2. Open the values.yaml file in a text editor.

  3. To change the default log level to debug, add the following line to the end of the file:

    logLevel: debug

    To revert to default logging, either remove this line or change its value to info.

  4. Install the Greenplum Operator to use the new logging level.
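
Steps 2 and 3 can also be scripted. The following is a minimal sketch, not something shipped with Greenplum for Kubernetes: the set_log_level function name is hypothetical, and the sketch assumes that logLevel is a top-level key that starts at the beginning of a line in values.yaml.

```shell
# Hypothetical helper (not part of the Greenplum for Kubernetes distribution):
# set the top-level logLevel key in a Helm values file.
# Assumes logLevel, if present, starts at the beginning of a line.
set_log_level() {
  gp_values_file="$1"
  gp_level="$2"
  if grep -q '^logLevel:' "$gp_values_file"; then
    # Rewrite the existing setting rather than appending a duplicate key.
    gp_tmp="$(mktemp)"
    sed "s/^logLevel:.*/logLevel: ${gp_level}/" "$gp_values_file" > "$gp_tmp"
    cat "$gp_tmp" > "$gp_values_file"
    rm -f "$gp_tmp"
  else
    # Make sure the file ends with a newline before appending the new key.
    if [ -s "$gp_values_file" ] && [ -n "$(tail -c 1 "$gp_values_file")" ]; then
      printf '\n' >> "$gp_values_file"
    fi
    printf 'logLevel: %s\n' "$gp_level" >> "$gp_values_file"
  fi
}
```

For example, running set_log_level values.yaml debug from the operator directory before installing the Greenplum Operator should have the same effect as the manual edit, and set_log_level values.yaml info reverts it.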

Read-Only File System Error


The command kubectl logs <pod-name> shows the error:

install: cannot create directory '/sys/fs/cgroup/devices/kubepods': Read-only file system


The Greenplum for Kubernetes deployment process requires the ability to map the host system’s /sys/fs/cgroup directory onto each container’s /sys/fs/cgroup. Ensure that no kernel security module (for example, AppArmor) uses a profile that disallows mounting /sys/fs/cgroup.

Could Not Find Tiller


Error: could not find tiller


Remove any existing Helm installation and re-install Helm with sufficient privileges. The initialize_helm_rbac.yaml file is available in the top-level directory of the Greenplum for Kubernetes software distribution:

$ helm reset
$ kubectl create -f ./initialize_helm_rbac.yaml
$ helm init --service-account tiller --upgrade --wait

Node Not Labeled


node "gke-gpdb-test-default-pool-20a900ca-3trh" not labeled


This output is common on GCP. It indicates that the node already has the correct label, so no labeling action was needed.

Forbidden Namespace or Unknown User


namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"


This message indicates that the Kubernetes system is v1.8 or greater, which enables role-based access control (RBAC). (See this bug report and commit.)

In Kubernetes v1.8 or greater, Helm requires permissions beyond the default level. Run these commands:

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
helm init --service-account tiller --upgrade --wait
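
After running these commands, it can help to confirm that the tiller service account actually has the permission named in the error. The following sketch uses kubectl auth can-i with impersonation. The function name is hypothetical, and the check assumes that your own kubectl user is allowed to impersonate service accounts.

```shell
# Sketch: ask the API server whether the tiller service account
# is allowed to read namespaces. kubectl prints "yes" or "no".
# Assumes the calling user may impersonate service accounts.
tiller_can_get_namespaces() {
  kubectl auth can-i get namespaces \
    --as="system:serviceaccount:kube-system:tiller"
}
```

If tiller_can_get_namespaces prints no, the cluster role binding was probably not created or refers to a different service account.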

Executable Not Found


executable not found in $PATH


This error appears on the events tab of a container and indicates that xfs is not supported on Container-Optimized OS (COS). To resolve this, use Ubuntu as the node OS.

Sandbox Has Changed


Sandbox has changed


This error appears on the events tab of a container and indicates that sysctl settings are corrupted or failed to apply. To resolve this, remove the sysctl settings from the pod YAML file.

Connection Timed


getsocket() connection timed


This error can occur when accessing http://localhost:8001/ui, a kubectl proxy address. Make sure that there is a connection between the master and worker nodes where kubernetes-dashboard is running. A network tag on the nodes like gpcloud-internal can establish a route among the nodes.

Unable to Connect to the Server


kubectl get nodes -w
Unable to connect to the server: x509: certificate is valid for,, not


This error indicates that you have chosen to update the wrong load balancer. Each cluster has its own load balancer for the Kubernetes master, with a certificate for access. Refer to the workspace/samples/scripts/create_pks_cluster_on_gcp.bash script for Bash commands that help to determine the master IP address for a given cluster name, and for the commands used to attach to a load balancer.

Permission Denied Error when Stopping Greenplum


kubectl exec -it master bash
$ gpstop -u

# Exits with Permission denied error
20180828:14:34:53:002448 gpstop:master:gpadmin-[CRITICAL]:-Error occurred: Error Executing Command:
 Command was: 'ssh -o 'StrictHostKeyChecking no' master ". /opt/gpdb/; $GPHOME/bin/pg_ctl reload -D /greenplum/data-1"'
rc=1, stdout='', stderr='pg_ctl: could not send reload signal (PID: 2137): Permission denied'


This error occurs because of the ssh context that Docker uses. Commands that are issued to a process must use the same context as the originator of the process. This issue is fixed in recent Docker versions, but the fixes have not reached the latest Kubernetes release. To avoid this issue, use the same ssh context that you used to initialize the Greenplum cluster. For example, if you used a kubectl session to initialize Greenplum, then use another kubectl session to run gpstop and stop Greenplum.

Socket: Too Many Open Files

Symptom: Executing any kubectl command yields an error similar to: dial udp socket: too many open files


Configure the underlying node to support a larger number of files. See Files in the Node Requirements documentation.
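
Before changing the node configuration, you may want to see what the current limit is. The following is a minimal sketch that compares the open-file limit of the current shell with a required minimum. The function name is hypothetical, and the required value is deliberately left as an argument because the actual number comes from the Node Requirements documentation.

```shell
# Sketch: compare the per-process open-file limit with a required minimum.
# $1 is the required minimum (take it from the Node Requirements documentation).
# $2 is optional and overrides the detected limit, which is useful for testing.
check_open_file_limit() {
  gp_required="$1"
  gp_current="${2:-$(ulimit -n)}"
  if [ "$gp_current" = "unlimited" ] || [ "$gp_current" -ge "$gp_required" ]; then
    echo "ok: open-file limit is $gp_current"
  else
    echo "too low: open-file limit is $gp_current, need at least $gp_required"
    return 1
  fi
}
```

Run the function in a shell on the affected node. It only reports the limit that applies to that shell session, so treat the result as a first indication and not as a full audit of the node.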

PKS Deployment Errors

Greenplum Query Fails to Write an Outgoing Packet


The Greenplum cluster is initialized and running, but a query returns an error similar to:

ERROR: Interconnect error writing an outgoing packet: Invalid argument (seg0 slice1 <ip>:<port> pid=1193)


This error occurs when ports are not garbage collected quickly enough. The problem is common in systems that have many containers on a single Kubernetes node, where the containers make heavy use of different ports to communicate with one another (as is the case with Greenplum segments).

To work around this problem, set the following sysctl attribute on the worker nodes:

net.ipv4.neigh.default.gc_thresh1 = 30000
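
One way to apply this setting is sketched below. Running sysctl -w net.ipv4.neigh.default.gc_thresh1=30000 as root on a worker node changes the value immediately but does not survive a reboot, so the sketch writes a drop-in file instead. The function name and the file name 90-greenplum-neigh.conf are hypothetical, and the sketch assumes that the node reads /etc/sysctl.d at boot and that nothing else (for example, BOSH recreating the VM) resets the node configuration.

```shell
# Sketch: persist the neighbor-table threshold in a sysctl drop-in file.
# $1 is the target directory; it defaults to /etc/sysctl.d.
# The file name 90-greenplum-neigh.conf is a hypothetical choice.
write_gc_thresh_dropin() {
  gp_dir="${1:-/etc/sysctl.d}"
  printf 'net.ipv4.neigh.default.gc_thresh1 = 30000\n' \
    > "$gp_dir/90-greenplum-neigh.conf"
}
```

After calling write_gc_thresh_dropin as root, running sysctl --system reloads the drop-in files without a reboot.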

Authorization Errors


After a certificate change (after any URL change for the UAA domain name), you may see a 401 error from BOSH similar to:

bosh -e pks vms
Using environment '' as anonymous user

Finding deployments:
  Director responded with non-successful status code '401' response 'Not authorized: '/deployments'

Exit code 1


Go to the credentials web page (similar to https://<ops manager>/infrastructure/director/credentials) and look for the BOSH command line credentials. The credentials look similar to:

{"credential":"BOSH_CLIENT=<Some User> BOSH_CLIENT_SECRET=<Some Secret> BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT= bosh "}

In the command for uaac token owner get, use:

  * BOSH_CLIENT as the “Client ID”
  * BOSH_CLIENT_SECRET as the “Client secret”
  * “Admin” as the “User name”
  * The associated password from Uaa Admin User Credentials as the “Password”

For example:

$ uaac token owner get
Client ID:  ops_manager
Client secret:  ********************************
User name:  Admin
Password:  ********************************

Cannot Access UAA


You can access Ops Manager, but you have problems accessing UAA. For example:

pks login -a -u dummy -p <some password> -k

Error: Post dial tcp getsockopt: connection refused


This problem can occur when the VM running the PKS API has been recycled, so that the external IP address defined for the domain name is stale. Use gcloud or the Google Cloud Console to identify the current VM for the PKS API. You can distinguish it because it has two labels, “job” and “instance-group”, which both have the value “pivotal container service”. Get the external IP address of this VM and change the DNS definition to that address. To watch the DNS entry from your workstation, use a command like:

$ watch dig '<my domain name>'

Wait to see that the DNS entry is updated for your local workstation. When it updates, try the pks login command again.
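
If you prefer a check that ends on its own instead of watching the output, the following sketch compares the address returned by dig with the external IP address you expect. The function name is hypothetical, and both arguments are placeholders that you supply.

```shell
# Sketch: succeed when the name resolves to the expected address.
# $1 is the domain name and $2 is the expected external IP address.
dns_points_to() {
  gp_name="$1"
  gp_expected_ip="$2"
  # -x matches whole lines only and -F treats the address as a literal string.
  dig +short "$gp_name" | grep -qxF "$gp_expected_ip"
}
```

A loop such as until dns_points_to '<my domain name>' '<external IP>'; do sleep 10; done returns once your workstation sees the new address.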

Cannot Pull Images


When you try to deploy the Greenplum Operator using helm, you see an error similar to:

$ helm install --name greenplum-operator -f workspace/operator-values-overrides.yaml operator/
$ kubectl describe pod -l app=greenplum-operator
  Type     Reason          Age              From                                 Message
  ----     ------          ----             ----                                 -------
  Normal   Scheduled       1m               default-scheduler                    Successfully assigned default/greenplum-operator-79bd8ccbc4-4lbxx to gke-oz-acceptance-default-pool-c7870f59-6h3f
  Normal   Pulling         1m (x2 over 1m)  kubelet, default-pool-c7870f59-6h3f  pulling image ""
  Warning  Failed          1m (x2 over 1m)  kubelet, default-pool-c7870f59-6h3f  Failed to pull image "": rpc error: code = Unknown desc = Error response from daemon: repository greenplum-operator not found: does not exist or no pull access
  Warning  Failed          1m (x2 over 1m)  kubelet, default-pool-c7870f59-6h3f  Error: ErrImagePull
  Normal   SandboxChanged  1m (x7 over 1m)  kubelet, default-pool-c7870f59-6h3f  Pod sandbox changed, it will be killed and re-created.
  Normal   BackOff         1m (x6 over 1m)  kubelet, default-pool-c7870f59-6h3f  Back-off pulling image ""
  Warning  Failed          1m (x6 over 1m)  kubelet, default-pool-c7870f59-6h3f  Error: ImagePullBackOff


This error indicates either that the regsecret secret was never created, or that it does not have permission to pull images from the specified container registry.

Make sure that the regsecret docker-registry secret exists and that its credentials have the privileges required to pull images.
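
The check can be sketched as follows, using kubectl get secret and kubectl create secret docker-registry. The function name is hypothetical, and the registry server, user name, and password are placeholders that depend on your container registry.

```shell
# Sketch: create the regsecret docker-registry secret if it does not exist.
# $1 = registry server, $2 = user name, $3 = password (all caller-supplied).
ensure_regsecret() {
  if kubectl get secret regsecret >/dev/null 2>&1; then
    echo "regsecret already exists"
  else
    kubectl create secret docker-registry regsecret \
      --docker-server="$1" \
      --docker-username="$2" \
      --docker-password="$3"
  fi
}
```

The sketch only covers the case where the secret is missing. If regsecret exists but holds credentials without pull access, delete it with kubectl delete secret regsecret and create it again with working credentials.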