Deploying a New Greenplum Cluster
This section describes how to use the Greenplum Operator to deploy a Greenplum cluster to your Kubernetes system. You can use these instructions either to deploy a brand new cluster (provisioning new, empty Persistent Volume Claims in Kubernetes) or to re-deploy an earlier cluster, re-using existing Persistent Volumes if available.
Prerequisites
This procedure requires that you first install the Greenplum for Kubernetes Docker images and create the Greenplum Operator in your Kubernetes system. See Installing Greenplum for Kubernetes for more information.
Verify that the Greenplum Operator is installed and running in your system before you continue:
$ helm list
NAME                 REVISION  UPDATED                   STATUS    CHART           APP VERSION  NAMESPACE
greenplum-operator   1         Thu Oct 11 15:38:54 2018  DEPLOYED  operator-0.1.0  1.0          default
To deploy multiple Greenplum cluster instances, you need a separate namespace in your Kubernetes environment for each cluster. If you need to create a new Kubernetes namespace, use the kubectl create namespace command. For example:

$ kubectl create namespace gpinstance-1
namespace/gpinstance-1 created
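Optionally, you can make a namespace the default for subsequent kubectl commands so that you do not have to pass -n each time. This is a convenience only, not a requirement of this procedure, and it assumes your kubectl version supports the --current flag:

$ kubectl config set-context --current --namespace=gpinstance-1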
Verify that you have a namespace for each new Greenplum cluster instance that you want to deploy. For example:
$ kubectl get namespaces
NAME STATUS AGE
default Active 50d
gpinstance-1 Active 50d
gpinstance-2 Active 50d
kube-public Active 50d
kube-system Active 50d
In the above output, gpinstance-1 and gpinstance-2 can be used as namespaces for deploying two different Greenplum clusters.
Procedure
Go to the workspace subdirectory where you unpacked the Greenplum for Kubernetes distribution:

$ cd ./greenplum-for-kubernetes-*/workspace
If necessary, create a Kubernetes manifest file to specify the configuration of your Greenplum cluster. A sample file is provided in workspace/my-gp-instance.yaml. my-gp-instance.yaml contains the minimal set of instructions necessary to create a demonstration cluster named "my-greenplum" with a single segment (a single primary and mirror segment) and default storage, memory, and CPU settings:

apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    hostBasedAuthentication: |
      # host   all   gpadmin   1.2.3.4/32   trust
      # host   all   gpuser    0.0.0.0/0    md5
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 1G
    antiAffinity: yes
  segments:
    primarySegmentCount: 1
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 2G
    antiAffinity: yes
    mirrors: yes
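Whether you use the sample as-is or edit it as described below, you can sanity-check the manifest before deploying by using kubectl's client-side dry run. This checks that the file parses and the request is well-formed; it does not validate Greenplum-specific property values:

$ kubectl apply --dry-run -f ./my-gp-instance.yaml
greenplumcluster.greenplum.pivotal.io/my-greenplum created (dry run)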
Most non-trivial clusters will require configuration changes to specify additional segments, CPU, memory, pg_hba.conf entries, and Storage Class resources. See Greenplum Database Properties for information about these configuration parameters, and change them as necessary before you continue.

Note: If you are deploying to Minikube, you must edit the file to set antiAffinity: "no" for both the Greenplum master/standby and segments.
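For example, on Minikube the relevant lines of my-gp-instance.yaml would look like this (an excerpt showing only the changed settings):

spec:
  masterAndStandby:
    antiAffinity: "no"   # changed from yes for Minikube
    # ...other masterAndStandby settings unchanged...
  segments:
    antiAffinity: "no"   # changed from yes for Minikube
    # ...other segments settings unchanged...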
If you want to re-deploy a Greenplum cluster that you previously deployed, locate and use the existing configuration file.
If you want to deploy another Greenplum cluster (in a separate Kubernetes namespace), copy the workspace/my-gp-instance.yaml file or another deployment manifest file, and edit it as necessary to meet your cluster configuration requirements.

Pivotal Greenplum for Kubernetes provides two additional sample manifest files that you can use as templates (copy them to the workspace directory before modifying, as shown after this list):

samples/my-gp-with-pxf-instance.yaml contains the minimal configuration for a cluster that includes the Pivotal Platform Extension Framework (PXF). PXF provides connectors that enable you to access data stored in sources external to your Greenplum Database deployment. These external sources include Hadoop (HDFS, Hive, HBase), object stores (Azure, Google Cloud Storage, Minio, S3), and SQL databases (via JDBC). See Deploying PXF with Greenplum (Beta) for more information about deploying PXF with Greenplum for Kubernetes.

samples/my-gp-with-gptext-instance.yaml contains the minimal configuration for a cluster that includes Pivotal GPText. Pivotal GPText joins the Greenplum Database massively parallel-processing database server with Apache SolrCloud enterprise search and the Apache MADlib Analytics Library to provide large-scale analytics processing and business decision support. GPText includes free text search as well as support for text analysis. See Deploying GPText with Greenplum (Beta) for more information about deploying GPText with Greenplum for Kubernetes.
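For example, to start from the PXF sample, copy it into your working directory (this sketch assumes the samples directory sits alongside workspace in the unpacked distribution, and uses my-gp-with-pxf-instance.yaml as the working file name):

$ cp ../samples/my-gp-with-pxf-instance.yaml ./my-gp-with-pxf-instance.yaml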
(Optional) If you have specified workerSelector in your manifest file, apply the specified labels to the nodes that belong in the masterAndStandby and segments pools by using the following command:

$ kubectl label node <node name> <key>=<value>
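For example, if your manifest sets workerSelector to worker: my-gp-masters for the master pool and worker: my-gp-segments for the segment pool (the label values that appear in the node output later in this procedure; the node names here are placeholders), you would label the nodes like this:

$ kubectl label node node-1 worker=my-gp-masters
$ kubectl label node node-2 worker=my-gp-segments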
Use the kubectl apply command with your manifest file to send the deployment request to the Greenplum Operator. For example, to use the sample my-gp-instance.yaml file:

$ kubectl apply -f ./my-gp-instance.yaml
greenplumcluster.greenplum.pivotal.io/my-greenplum created
If you are deploying another instance of a Greenplum cluster, specify the Kubernetes namespace where you want to deploy the new cluster. For example, if you previously deployed a cluster in the namespace gpinstance-1, you could deploy a second Greenplum cluster in the gpinstance-2 namespace using the command:
$ kubectl apply -f ./my-gp-instance.yaml -n gpinstance-2
greenplumcluster.greenplum.pivotal.io/my-greenplum created
The Greenplum Operator deploys the necessary Greenplum resources according to your specification, and also initializes the Greenplum cluster. If there are no existing Persistent Volume Claims for the cluster, new PVCs are created and used for the deployment. If PVCs for the cluster already exist, they are used as-is with the available data.
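You can list the Persistent Volume Claims that the Operator created (or re-used) in the target namespace at any time. (Sample output is omitted here because the claim names depend on your cluster name and configuration.)

$ kubectl get pvc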
Use kubectl get nodes --show-labels to verify that nodes have been labeled with greenplum-affinity-<namespace>-segment=a, greenplum-affinity-<namespace>-segment=b, and/or greenplum-affinity-<namespace>-master=true, as shown below. Note that these labels should not be manually modified, as they are used by the Operator for anti-affinity.

$ kubectl get nodes --show-labels
NAME                                      STATUS   ROLES    AGE   VERSION   LABELS
vm-4b50d90e-5e00-411f-5516-588711f0a618   Ready    <none>   11h   v1.12.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=custom-1-2048,beta.kubernetes.io/os=linux,bosh.id=3b3a6b47-8a1d-4a82-a06b-5349a241397e,bosh.zone=us-central1-f,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,greenplum-affinity-default-master=true,greenplum-affinity-default-segment=a,kubernetes.io/hostname=vm-4b50d90e-5e00-411f-5516-588711f0a618,spec.ip=10.0.11.11,worker=my-gp-masters
vm-50da037c-0c00-46f8-5968-2a51cf17e426   Ready    <none>   11h   v1.12.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=custom-1-2048,beta.kubernetes.io/os=linux,bosh.id=e6440a8d-8b75-4a0e-acc9-b210e81d59dc,bosh.zone=us-central1-f,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,greenplum-affinity-default-master=true,greenplum-affinity-default-segment=b,kubernetes.io/hostname=vm-50da037c-0c00-46f8-5968-2a51cf17e426,spec.ip=10.0.11.16,worker=my-gp-masters
vm-73e119aa-da79-4686-58df-1e9d7a9eff18   Ready    <none>   11h   v1.12.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=custom-1-2048,beta.kubernetes.io/os=linux,bosh.id=7e68ad80-6401-431b-8187-0ffc9c45dd69,bosh.zone=us-central1-f,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,greenplum-affinity-default-master=true,greenplum-affinity-default-segment=a,kubernetes.io/hostname=vm-73e119aa-da79-4686-58df-1e9d7a9eff18,spec.ip=10.0.11.15,worker=my-gp-segments
vm-8e43e0c6-6fd5-4bff-5c3a-150cbca76781   Ready    <none>   11h   v1.12.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=custom-1-2048,beta.kubernetes.io/os=linux,bosh.id=2bfd5222-96c5-47d7-98c2-52af11ea3854,bosh.zone=us-central1-f,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,greenplum-affinity-default-master=true,greenplum-affinity-default-segment=b,kubernetes.io/hostname=vm-8e43e0c6-6fd5-4bff-5c3a-150cbca76781,spec.ip=10.0.11.13,worker=my-gp-segments
vm-cf9fcef9-2557-43ca-43fa-01b21618e9ba   Ready    <none>   11h   v1.12.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=custom-1-2048,beta.kubernetes.io/os=linux,bosh.id=5a757d0f-d312-4fee-9c3f-52bd82c225f7,bosh.zone=us-central1-f,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,greenplum-affinity-default-master=true,greenplum-affinity-default-segment=a,kubernetes.io/hostname=vm-cf9fcef9-2557-43ca-43fa-01b21618e9ba,spec.ip=10.0.11.14,worker=my-gp-segments
vm-fb806a3c-8198-4608-671e-4659c940d2a4   Ready    <none>   11h   v1.12.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=custom-1-2048,beta.kubernetes.io/os=linux,bosh.id=18f8435d-be48-4445-b822-e0733ac7eced,bosh.zone=us-central1-f,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,greenplum-affinity-default-master=true,greenplum-affinity-default-segment=b,kubernetes.io/hostname=vm-fb806a3c-8198-4608-671e-4659c940d2a4,spec.ip=10.0.11.12,worker=my-gp-segments
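To see only the Greenplum affinity labels instead of the full label list, you can request them as columns with kubectl's --label-columns (-L) option. This is a sketch; replace default in the label keys with your cluster's namespace:

$ kubectl get nodes -L greenplum-affinity-default-master -L greenplum-affinity-default-segment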
While the cluster is initializing, the status will be Pending:

$ watch kubectl get all
NAME                                      READY   STATUS    RESTARTS   AGE
pod/greenplum-operator-58dd68b9c5-frrbz   1/1     Running   3          3h
pod/master-0                              1/1     Running   0          1m
pod/master-1                              1/1     Running   0          1m
pod/segment-a-0                           1/1     Running   0          1m
pod/segment-b-0                           1/1     Running   0          1m

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/agent        ClusterIP      None            <none>        22/TCP           1m
service/greenplum    LoadBalancer   10.110.26.184   <pending>     5432:32686/TCP   1m
service/kubernetes   ClusterIP      10.96.0.1       <none>        443/TCP          22h

NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/greenplum-operator   1         1         1            1           3h

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/greenplum-operator-58dd68b9c5   1         1         1       3h

NAME                         DESIRED   CURRENT   AGE
statefulset.apps/master      2         2         1m
statefulset.apps/segment-a   1         1         1m
statefulset.apps/segment-b   1         1         1m

NAME                                                 STATUS    AGE
greenplumcluster.greenplum.pivotal.io/my-greenplum   Pending   1m
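If you only want to track the cluster's status rather than every resource, you can watch just the custom resource; its STATUS column matches the Phase reported by kubectl describe in the next step:

$ watch kubectl get greenplumclusters my-greenplum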
Describe your Greenplum cluster to verify that it was created successfully. The Phase should eventually transition to Running:

$ kubectl describe greenplumClusters/my-greenplum
Name:         my-greenplum
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"greenplum.pivotal.io/v1","kind":"GreenplumCluster",
                "metadata":{"annotations":{},"name":"my-greenplum",
                "namespace":"default"...
API Version:  greenplum.pivotal.io/v1
Kind:         GreenplumCluster
Metadata:
  Creation Timestamp:  2019-04-01T15:19:17Z
  Generation:          1
  Resource Version:    1469567
  Self Link:           /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
  UID:                 83e0bdfd-5491-11e9-a268-c28bb5ff3d1c
Spec:
  Master And Standby:
    Anti Affinity:              yes
    Cpu:                        0.5
    Host Based Authentication:  # host   all   gpadmin   1.2.3.4/32   trust
                                # host   all   gpuser    0.0.0.0/0    md5
    Memory:                     800Mi
    Storage:                    1G
    Storage Class Name:         standard
    Worker Selector:
  Segments:
    Anti Affinity:          yes
    Cpu:                    0.5
    Memory:                 800Mi
    Primary Segment Count:  1
    Storage:                2G
    Storage Class Name:     standard
    Worker Selector:
Status:
  Instance Image:    greenplum-for-kubernetes:latest
  Operator Version:  greenplum-operator:latest
  Phase:             Running
Events:
  Type    Reason                    Age  From               Message
  ----    ------                    ---- ----               -------
  Normal  CreatingGreenplumCluster  2m   greenplumOperator  Creating Greenplum cluster my-greenplum in default
  Normal  CreatedGreenplumCluster   8s   greenplumOperator  Successfully created Greenplum cluster my-greenplum in default
If you are deploying a brand new cluster, the Greenplum Operator automatically initializes the Greenplum cluster. The Phase should eventually transition from Pending to Running, and the Events should match the output above.

Note: If you redeployed a previously-deployed Greenplum cluster, the phase begins at Pending. The Operator re-uses the previous Persistent Volume Claims if they are available, so the master and segment data directories already exist in their former state. The master-0 pod then starts the Greenplum cluster automatically, and the phase should transition to Running.
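To extract just the phase programmatically, for example in a deployment script, a jsonpath query works (a sketch; the .status.phase field corresponds to the Phase line in the describe output above):

$ kubectl get greenplumclusters my-greenplum -o jsonpath='{.status.phase}'
Running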
At this point, you can work with the deployed Greenplum cluster by executing Greenplum utilities from within Kubernetes, or by using a locally-installed tool, such as psql, to access the Greenplum instance running in Kubernetes. For example, to run the psql utility on the master-0 pod:

$ kubectl exec -it master-0 bash -- -c "source /opt/gpdb/greenplum_path.sh; psql"
psql (8.3.23)
Type "help" for help.
gpadmin=# select * from gp_segment_configuration;
 dbid | content | role | preferred_role | mode | status | port  |                 hostname                  |                   address                    | replication_port
------+---------+------+----------------+------+--------+-------+-------------------------------------------+----------------------------------------------+------------------
    1 |      -1 | p    | p              | s    | u      |  5432 | master-0                                  | master-0.agent.default.svc.cluster.local     |
    2 |       0 | p    | p              | s    | u      | 40000 | segment-a-0                               | segment-a-0.agent.default.svc.cluster.local  |             6000
    3 |       0 | m    | m              | s    | u      | 50000 | segment-b-0                               | segment-b-0.agent.default.svc.cluster.local  |             6001
    4 |      -1 | m    | m              | s    | u      |  5432 | master-1.agent.default.svc.cluster.local  | master-1.agent.default.svc.cluster.local     |
(4 rows)
(Enter \q to exit the psql utility.)
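To connect from a locally-installed psql instead, use the external IP of the greenplum LoadBalancer service shown in the kubectl get all output earlier. This is a sketch: the address shows as <pending> until your environment assigns one, and the client host must be permitted by your hostBasedAuthentication entries:

$ psql -h <external-ip> -p 5432 -U gpadmin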