Deploying GPText with Greenplum (Beta)

This section describes how to deploy a Greenplum for Kubernetes cluster that includes Pivotal GPText.

Note: Pivotal GPText in Greenplum for Kubernetes is a Beta feature.

About GPText on Greenplum for Kubernetes

When you deploy GPText to Greenplum for Kubernetes, the Greenplum Operator creates multiple dedicated pods to host the Apache SolrCloud and ZooKeeper instances that GPText requires. ZooKeeper can be deployed to multiple replica pods as needed for redundancy. Currently, SolrCloud can be deployed only to a single pod.

Note that it is not possible to place the ZooKeeper instances on the Greenplum segment hosts (a "binding" ZooKeeper cluster), as described in the Pivotal Greenplum Text documentation.

Deploying GPText with Greenplum for Kubernetes

Follow these steps to deploy GPText with a new Greenplum for Kubernetes cluster.

  1. Use the procedure described in Deploying a New Greenplum Cluster to deploy the cluster, but use samples/my-gp-with-gptext-instance.yaml as the basis for your deployment. Copy the file into your /workspace directory. For example:

    $ cd ./greenplum-for-kubernetes-*/workspace
    $ cp ./samples/my-gp-with-gptext-instance.yaml .
  2. Edit the file as necessary for your deployment. samples/my-gp-with-gptext-instance.yaml includes additional properties to configure Greenplum Text in the new cluster:

    apiVersion: ""
    kind: "GreenplumCluster"
    metadata:
      name: my-greenplum
    spec:
      masterAndStandby:
        hostBasedAuthentication: |
          # host   all   gpadmin   trust
          # host   all   gpuser   md5
        memory: "800Mi"
        cpu: "0.5"
        storageClassName: standard
        storage: 1G
        antiAffinity: "yes"
        workerSelector: {}
      segments:
        primarySegmentCount: 1
        memory: "800Mi"
        cpu: "0.5"
        storageClassName: standard
        storage: 1G
        antiAffinity: "yes"
        workerSelector: {}
      gptext:
        serviceName: "my-greenplum-gptext"
    ---
    apiVersion: ""
    kind: "GreenplumTextService"
    metadata:
      name: my-greenplum-gptext
    spec:
      solr:
        replicas: 1
        cpu: "0.5"
        memory: "1Gi"
        workerSelector: {}
        storageClassName: standard
        storage: 100M
      zookeeper:
        replicas: 3
        cpu: "0.5"
        memory: "1Gi"
        workerSelector: {}
        storageClassName: standard
        storage: 100M

    The entry:

        serviceName: "my-greenplum-gptext"    

    indicates that the cluster uses the GPText service configuration named my-greenplum-gptext, which follows at the end of the YAML file. The sample configuration creates a single Solr pod (required) and three ZooKeeper replica pods (the minimum required for Apache SolrCloud). Minimal CPU and memory settings are defined for each pod. You can customize these values as needed, as well as the workerSelector value if you want to constrain the replica pods to labeled nodes in your cluster. You can also customize storageClassName to provide dedicated storage for GPText indexes, separate from Greenplum Database storage.
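For example, to schedule the ZooKeeper replicas only on labeled worker nodes and use a dedicated storage class, you could edit the ZooKeeper settings along these lines. This is an illustrative sketch, assuming the ZooKeeper settings live in their own section of the GreenplumTextService manifest; the node label gptext-worker: "true" and the storage class name gptext-fast are placeholder values, not defaults shipped with the product:

```yaml
# Hypothetical customization of the ZooKeeper settings in the
# GreenplumTextService manifest; the label and storage class names
# below are example values, not product defaults.
zookeeper:
  replicas: 3
  cpu: "0.5"
  memory: "1Gi"
  workerSelector:
    gptext-worker: "true"      # run only on nodes carrying this label
  storageClassName: gptext-fast
  storage: 1G
```

For a workerSelector to take effect, the nodes must be labeled beforehand, for example with kubectl label node <node-name> gptext-worker=true.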

  3. Use the kubectl apply command with your modified GPText manifest file to send the deployment request to the Greenplum Operator. For example:

    $ kubectl apply -f ./my-gp-with-gptext-instance.yaml

    If you are deploying another instance of a Greenplum cluster, specify the Kubernetes namespace where you want to deploy the new cluster. For example, if you previously deployed a cluster in the namespace gpinstance-1, you could deploy a second Greenplum cluster in the gpinstance-2 namespace using the command:

    $ kubectl apply -f ./my-gp-with-gptext-instance.yaml -n gpinstance-2

    The Greenplum Operator deploys the necessary Greenplum and GPText resources according to your specification, and also initializes the Greenplum cluster.

  4. Execute the following command to monitor the deployment of the cluster. While the cluster is initializing, its status is Pending:

    $ watch kubectl get all
    NAME                                      READY     STATUS    RESTARTS   AGE
    pod/greenplum-operator-79cddcf586-ctftb   1/1       Running   0          11m
    pod/master-0                              1/1       Running   0          15s
    pod/master-1                              1/1       Running   0          15s
    pod/my-greenplum-gptext-solr-0            0/1       Running   0          17s
    pod/my-greenplum-gptext-zookeeper-0       1/1       Running   0          17s
    pod/my-greenplum-gptext-zookeeper-1       1/1       Running   0          12s
    pod/my-greenplum-gptext-zookeeper-2       0/1       Pending   0          0s
    pod/segment-a-0                           1/1       Running   0          15s
    pod/segment-b-0                           1/1       Running   0          15s
    NAME                                                            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)
    service/agent                                                   ClusterIP      None           <none>        22/TCP
    service/greenplum                                               LoadBalancer   <pending>     5432:32275/TCP
    service/greenplum-validating-webhook-service-79cddcf586-ctftb   ClusterIP   <none>        443/TCP
    service/kubernetes                                              ClusterIP      <none>        443/TCP
    service/my-greenplum-gptext-solr                                ClusterIP      None           <none>        8983/TCP
    service/my-greenplum-gptext-zookeeper                           ClusterIP      None           <none>        2888/TCP,3888/TCP,2181/TCP
    NAME                                 READY     UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/greenplum-operator   1/1       1            1           11m
    NAME                                            DESIRED   CURRENT   READY     AGE
    replicaset.apps/greenplum-operator-79cddcf586   1         1         1         11m
    NAME                                             READY     AGE
    statefulset.apps/master                          2/2       15s
    statefulset.apps/my-greenplum-gptext-solr        0/1       17s
    statefulset.apps/my-greenplum-gptext-zookeeper   2/3       17s
    statefulset.apps/segment-a                       1/1       15s
    statefulset.apps/segment-b                       1/1       15s
    NAME                                                 STATUS    AGE
    greenplumcluster.greenplum.pivotal.io/my-greenplum   Pending   17s
    NAME                                                            AGE
    greenplumtextservice.greenplum.pivotal.io/my-greenplum-gptext   17s

    Note that the Solr and ZooKeeper services are created along with the Greenplum Database cluster.
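While any pod still reports an incomplete READY count or a status other than Running, the deployment is still converging. As a quick sanity check, you can filter the pod listing for pods that are not yet fully ready. This is a sketch, not part of the product tooling: it parses captured `kubectl get pods` output, shown inline here as a stand-in for a live cluster.

```shell
# Sketch: list pods that are not Running or not fully ready.
# The here-string below is captured sample output; on a live cluster
# you would pipe `kubectl get pods` into the same awk filter.
sample='NAME                              READY   STATUS    RESTARTS   AGE
my-greenplum-gptext-solr-0        0/1     Running   0          17s
my-greenplum-gptext-zookeeper-0   1/1     Running   0          17s
my-greenplum-gptext-zookeeper-2   0/1     Pending   0          0s'

# Column 2 is READY (x/y), column 3 is STATUS; NR > 1 skips the header.
not_ready=$(echo "$sample" | awk 'NR > 1 && ($3 != "Running" || $2 ~ /^0\//) { print $1 }')
echo "$not_ready"
```

When the filter prints nothing, all pods in the listing are Running and fully ready.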

  5. Describe your Greenplum cluster to verify that it was created successfully. The Phase should eventually transition to Running:

    $ kubectl describe greenplumClusters/my-greenplum
    Name:         my-greenplum
    Namespace:    default
    Labels:       <none>
    API Version:
    Kind:         GreenplumCluster
    Metadata:
      Creation Timestamp:  2019-10-02T23:43:05Z
      Generation:          2
      Resource Version:    7399
      Self Link:           /apis/
      UID:                 b25e90e5-3ac2-40d6-94cb-a8b159b8134a
    Spec:
      Gptext:
        Service Name:  my-greenplum-gptext
      Master And Standby:
        Anti Affinity:              no
        Cpu:                        0.5
        Host Based Authentication:  # host   all   gpadmin   trust
    # host   all   gpuser   md5
        Memory:              800Mi
        Storage:             1G
        Storage Class Name:  standard
        Worker Selector:
      Segments:
        Anti Affinity:          no
        Cpu:                    0.5
        Memory:                 800Mi
        Primary Segment Count:  1
        Storage:                1G
        Storage Class Name:     standard
        Worker Selector:
    Status:
      Instance Image:
      Operator Version:
      Phase:  Pending
    Events:
      Type    Reason                    Age   From               Message
      ----    ------                    ----  ----               -------
      Normal  CreatingGreenplumCluster  4m    greenplumOperator  Creating Greenplum cluster my-greenplum in default
      Normal  CreatedGreenplumCluster   8s    greenplumOperator  Successfully created Greenplum cluster my-greenplum in default

    If you are deploying a brand new cluster, the Greenplum Operator automatically initializes the Greenplum cluster. The Phase should eventually transition from Pending to Running and the Events should match the output above.

    Note: If you redeploy a previously-deployed Greenplum cluster, the phase stays at Pending while the Operator reuses the previous Persistent Volume Claims, if they are still available. In that case, the master and segment data directories already exist in their former state, and the master-0 pod automatically starts the Greenplum cluster. The phase should then transition to Running.

  6. At this point, you can work with the deployed Greenplum cluster by executing Greenplum utilities from within Kubernetes, or by using a locally-installed tool, such as psql, to access the Greenplum instance running in Kubernetes. To validate the initial GPText service deployment configuration, execute:

    $ kubectl exec -it master-0 bash
    $ source /opt/gptext/
    $ gptext-state configs
    20191008:23:52:39:002812 gptext-state:master-0:gpadmin-[INFO]:-Execute GPText state ...
    20191008:23:52:40:002812 gptext-state:master-0:gpadmin-[INFO]:-Check zookeeper cluster state ...
    20191008:23:52:40:002812 gptext-state:master-0:gpadmin-[WARNING]:-object of type 'NoneType' has no len()
    20191008:23:52:40:002812 gptext-state:master-0:gpadmin-[INFO]:-Cluster Configurations.

    Ensure that the ZooKeeper and Solr nodes are available for use.
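For scripted validation, you may want to surface WARNING entries from the gptext-state output mechanically. The following is a small sketch, using the captured log above as stand-in data; it is not a product utility:

```shell
# Sketch: count WARNING entries in captured gptext-state output so a
# deployment script can flag them. In a live session you would capture
# the real output instead, e.g.: log=$(gptext-state configs 2>&1)
log='20191008:23:52:39:002812 gptext-state:master-0:gpadmin-[INFO]:-Execute GPText state ...
20191008:23:52:40:002812 gptext-state:master-0:gpadmin-[WARNING]:-object of type NoneType has no len()
20191008:23:52:40:002812 gptext-state:master-0:gpadmin-[INFO]:-Cluster Configurations.'

# grep -c counts the lines that contain a [WARNING] tag.
warning_count=$(echo "$log" | grep -c '\[WARNING\]')
echo "warnings found: $warning_count"
```

A nonzero count does not necessarily mean the deployment failed, but it gives automation a simple signal that the output deserves review.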

  7. To begin working with GPText, see Working With GPText Indexes in the Pivotal GPText documentation.