Deploying PL/Container for Kubernetes

This section describes procedures for deploying a Pivotal Greenplum cluster with PL/Container for Kubernetes, or for adding PL/Container to an existing Greenplum for Kubernetes deployment.

About PL/Container for Kubernetes

The Greenplum Database PL/Container language extension enables users to create and run PL/R user-defined functions (UDFs) securely inside a Kubernetes pod. Running UDFs in a separate pod ensures that:

  • Function execution takes place in a separate environment, decoupling it from Greenplum data processing.
  • SQL operators such as “scan,” “filter,” and “project” are executed on the query executor (QE) side, while advanced data analysis is executed on the container side.
  • User code cannot access the OS or the file system of the Greenplum Database pod, to minimize security risks.

Note: PL/Container for Kubernetes uses the Greenplum Database PL/Container version 3.x implementation. However, only R language support is currently available.

Deploying a New Greenplum Cluster with PL/Container

Follow these steps to deploy a new Pivotal Greenplum cluster on Kubernetes with PL/Container, or to add the PL/Container service to an existing Greenplum cluster.

  1. Use the procedure described in Deploying or Redeploying a Greenplum Cluster to deploy the cluster, but use the samples/my-gp-with-pl-instance.yaml file as the basis for your deployment. Copy the file into your /workspace directory. For example:

    $ cd ./greenplum-for-kubernetes-*/workspace
    $ cp ./samples/my-gp-with-pl-instance.yaml .
    
  2. Edit the file as necessary for your deployment. my-gp-with-pl-instance.yaml includes properties to configure PL/Container in the basic Greenplum cluster:

    apiVersion: "greenplum.pivotal.io/v1"
    kind: "GreenplumCluster"
    metadata:
      name: my-greenplum
    spec:
      masterAndStandby:
        hostBasedAuthentication: |
          # host   all   gpadmin   0.0.0.0/0   trust
        memory: "800Mi"
        cpu: "0.5"
        storageClassName: standard
        storage: 1G
        workerSelector: {}
      segments:
        primarySegmentCount: 1
        memory: "800Mi"
        cpu: "0.5"
        storageClassName: standard
        storage: 2G
        workerSelector: {}
      pl:
        serviceName: "my-greenplum-pl"
    
    ---
    apiVersion: "greenplum.pivotal.io/v1beta1"
    kind: "GreenplumPLService"
    metadata:
      name: my-greenplum-pl
    spec:
      replicas: 3
      cpu: "0.8"
      memory: "2Gi"
      workerSelector: {}
    

    The entry:

      pl:
        serviceName: "my-greenplum-pl"
    

    indicates that the cluster uses the PL/Container service configuration named my-greenplum-pl, which is defined at the end of the YAML file. The sample configuration creates three PL/Container replica pods for redundancy, with minimal CPU and memory settings. You can customize these values as needed, as well as the workerSelector value if you want to constrain the replica pods to labeled nodes in your cluster (see the sketch below). See Greenplum PL/Container Service Properties for information about each available property.
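
    For example, the following sketch constrains the PL/Container replica pods to nodes that carry a gpdb-pl=true label. The label key and node names are illustrative, not required by the operator:

    $ kubectl label nodes node-1 node-2 gpdb-pl=true

    Then reference the label in the GreenplumPLService specification:

      workerSelector:
        gpdb-pl: "true"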

  3. Use the kubectl apply command with your modified Greenplum manifest file to send the deployment request to the Greenplum Operator. For example:

    $ kubectl apply -f ./my-gp-with-pl-instance.yaml 
    
    greenplumcluster.greenplum.pivotal.io/my-greenplum created
    greenplumplservice.greenplum.pivotal.io/my-greenplum-pl created
    

    If you are deploying another instance of a Greenplum cluster, specify the Kubernetes namespace where you want to deploy the new cluster. For example, if you previously deployed a cluster in the namespace gpinstance-1, you could deploy a second Greenplum cluster in the gpinstance-2 namespace using the command:

    $ kubectl apply -f ./my-gp-with-pl-instance.yaml -n gpinstance-2
    

    The Greenplum Operator deploys the necessary Greenplum and PL/Container resources according to your specification, and also initializes the Greenplum cluster.
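
    To confirm what was created in a given namespace, you can list the Greenplum custom resources there. For example, for the hypothetical gpinstance-2 namespace above:

    $ kubectl get greenplumclusters,greenplumplservices -n gpinstance-2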

  4. Execute the following command to monitor the deployment of the Greenplum cluster and the PL/Container service. While the Greenplum cluster is initializing, its status is Pending. The PL/Container service status is Pending while the service has zero ready pods, and Degraded when it has at least one ready pod but has not yet reached its desired state:

    $ watch kubectl get all
    
    NAME                                      READY   STATUS    RESTARTS   AGE
    pod/greenplum-operator-5774c8b877-kzvsr   1/1     Running   0          6m2s
    pod/master-0                              1/1     Running   0          29s
    pod/my-greenplum-pl-67f687fbd9-4748l      1/1     Running   0          33s
    pod/my-greenplum-pl-67f687fbd9-rprpx      1/1     Running   0          33s
    pod/my-greenplum-pl-67f687fbd9-tdbsd      1/1     Running   0          33s
    pod/segment-a-0                           1/1     Running   0          29s
    
    NAME                                                            TYPE           CLUSTER-IP           EXTERNAL-IP   PORT(S)          AGE
    service/agent                                                   ClusterIP      None                 <none>        22/TCP           29s
    service/greenplum                                               LoadBalancer   10.107.165.71        <pending>     5432:30202/TCP   29s
    service/greenplum-validating-webhook-service-5774c8b877-kzvsr   ClusterIP      10.98.169.188        <none>        443/TCP          6m
    service/kubernetes                                              ClusterIP      10.96.0.1            <none>        443/TCP          6h9m
    service/my-greenplum-pl                                         ClusterIP      10.104.162.116       <none>        6666/TCP         34s
    
    NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/greenplum-operator   1/1     1            1           6m2s
    deployment.apps/my-greenplum-pl      3/3     3            3           33s
    
    NAME                                            DESIRED   CURRENT   READY   AGE
    replicaset.apps/greenplum-operator-5774c8b877   1         1         1       6m2s
    replicaset.apps/my-greenplum-pl-67f687fbd9      3         3         3       33s
    
    NAME                         READY   AGE
    statefulset.apps/master      1/1     29s
    statefulset.apps/segment-a   1/1     29s
    
    NAME                                                 STATUS    AGE
    greenplumcluster.greenplum.pivotal.io/my-greenplum   Pending   34s
    
    NAME                                                      STATUS
    greenplumplservice.greenplum.pivotal.io/my-greenplum-pl   Running
    

    Note that the Greenplum PL/Container service, deployment, and replicas are created in addition to the Greenplum cluster.
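
    If you want to poll only the PL/Container service status rather than watching all resources, you can query the phase field directly, assuming it is exposed at .status.phase as the describe output in step 6 suggests:

    $ kubectl get greenplumplservice my-greenplum-pl -o jsonpath='{.status.phase}'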

  5. Describe your Greenplum cluster to verify that it was created successfully. The Phase should eventually transition to Running:

    $ kubectl describe greenplumClusters/my-greenplum
    
    Name:         my-greenplum
    Namespace:    default
    Labels:       <none>
    Annotations:  <none>
    API Version:  greenplum.pivotal.io/v1
    Kind:         GreenplumCluster
    Metadata:
      Creation Timestamp:  2020-07-10T22:32:36Z
      Finalizers:
        stopcluster.greenplumcluster.pivotal.io
      Generation:        3
      Resource Version:  27584
      Self Link:         /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
      UID:               d8c51050-b9a4-48f6-8c5c-0ebe5cc6ec7c
    Spec:
      Master And Standby:
        Anti Affinity:              no
        Cpu:                        0.5
        Host Based Authentication:  # host   all   gpadmin   0.0.0.0/0   trust
    
        Memory:              800Mi
        Standby:             no
        Storage:             1G
        Storage Class Name:  standard
        Worker Selector:
      Pl:
        Service Name:  my-greenplum-pl
      Segments:
        Anti Affinity:          no
        Cpu:                    0.5
        Memory:                 800Mi
        Mirrors:                no
        Primary Segment Count:  1
        Storage:                2G
        Storage Class Name:     standard
        Worker Selector:
    Status:
      Instance Image:    greenplum-for-kubernetes:v2.0.1.dev.41.g008f2635
      Operator Version:  greenplum-operator:v2.0.1.dev.41.g008f2635
      Phase:             Running
    Events:              <none>
    

    If you are deploying a brand new cluster, the Greenplum Operator automatically initializes the Greenplum cluster. The Phase should eventually transition from Pending to Running and the Events should match the output above.


    Note: If you redeployed a previously-deployed Greenplum cluster, the phase will begin at Pending. The cluster uses its existing Persistent Volume Claims if they are available. In this case, the master and segment data directories will already exist in their former state. The master-0 pod automatically starts the Greenplum Cluster, after which the phase transitions to Running.
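
    As a lighter-weight check than the full describe output, you can query the cluster's phase field directly, assuming it is exposed at .status.phase as the Status section above suggests:

    $ kubectl get greenplumcluster my-greenplum -o jsonpath='{.status.phase}'
    Running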

  6. Describe your PL/Container service to verify that it was created successfully. The Phase should eventually transition to Running:

    $ kubectl describe greenplumplservice.greenplum.pivotal.io/my-greenplum-pl
    
    Name:         my-greenplum-pl
    Namespace:    default
    Labels:       <none>
    Annotations:  <none>
    API Version:  greenplum.pivotal.io/v1beta1
    Kind:         GreenplumPLService
    Metadata:
      Creation Timestamp:  2020-07-10T22:32:36Z
      Generation:          4
      Resource Version:    27381
      Self Link:           /apis/greenplum.pivotal.io/v1beta1/namespaces/default/greenplumplservices/my-greenplum-pl
      UID:                 5ae3e26b-cf71-44f0-b286-06995f992069
    Spec:
      Cpu:       0.8
      Memory:    2Gi
      Replicas:  3
      Worker Selector:
    Status:
      Phase:  Running
    Events:   <none>
    

    The PL/Container service initializes itself automatically; the Phase transitions to Running once the desired number of replica pods is ready.

  7. Manually set the plcontainer.service_address configuration parameter to the address of the configured PL/Container service. The address entry uses the format <service-name>.<namespace>.svc.domain. With the sample deployment used in this procedure, the address corresponds to my-greenplum-pl.default.svc.domain. Set the parameter using the command:

    $ kubectl exec -it master-0 -- bash -c 'source /usr/local/greenplum-db/greenplum_path.sh; gpconfig -c plcontainer.service_address -v "my-greenplum-pl.default.svc.domain"; gpstop -u'
    
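    To verify that the parameter was applied, you can display its current value with gpconfig -s:

    $ kubectl exec -it master-0 -- bash -c 'source /usr/local/greenplum-db/greenplum_path.sh; gpconfig -s plcontainer.service_address'
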
  8. Enable the PL/Container extension in each database where you want to use it. For example:

    $ kubectl exec -it master-0 -- bash -c "source /usr/local/greenplum-db/greenplum_path.sh; psql -d gpadmin -c 'create extension plcontainer;'"
    
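    To confirm that the extension is registered in the database, you can list it with the psql \dx meta-command:

    $ kubectl exec -it master-0 -- bash -c "source /usr/local/greenplum-db/greenplum_path.sh; psql -d gpadmin -c '\dx plcontainer'"
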
  9. At this point, you can work with the deployed PL/Container service by executing commands from within Kubernetes, or by using a locally installed tool, such as psql, to access the Greenplum instance running in Kubernetes. See Using PL/Container in the Greenplum Database documentation for more information, and the sketch below for a quick smoke test.
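
    As a quick smoke test, the following sketch creates and runs a trivial PL/R function from psql. The runtime name after the # container: directive is an assumption; it must match a PL/Container runtime configured in your deployment:

    $ kubectl exec -it master-0 -- bash -c "source /usr/local/greenplum-db/greenplum_path.sh; psql -d gpadmin"

    -- minimal PL/R sketch; "plc_r" is an assumed runtime name
    CREATE OR REPLACE FUNCTION plr_double(x float8) RETURNS float8 AS $$
    # container: plc_r
    return (x * 2)
    $$ LANGUAGE plcontainer;

    -- expect 42 if the service is reachable
    SELECT plr_double(21);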