Deploying GPText with Greenplum (Beta)
This section describes procedures for deploying a VMware Tanzu Greenplum cluster with GPText on Kubernetes.
Note: GPText is a Beta feature with VMware Tanzu Greenplum for Kubernetes.
About GPText with VMware Tanzu Greenplum for Kubernetes
When you deploy GPText with VMware Tanzu Greenplum for Kubernetes, the Greenplum Operator creates resources to run the Apache Solr Cloud and ZooKeeper instances necessary for using GPText. ZooKeeper can be deployed to multiple replica pods as needed for redundancy. Currently, Apache Solr Cloud can only be deployed to a single pod.
Note that the ZooKeeper instances are not deployed on the Greenplum segment hosts (a "binding" ZooKeeper cluster, as described in the VMware Tanzu Greenplum Text documentation).
Deploying GPText with VMware Tanzu Greenplum for Kubernetes
Follow these steps to deploy GPText with a new VMware Tanzu Greenplum cluster on Kubernetes.
Use the procedure described in Deploying or Redeploying a Greenplum Cluster to deploy the cluster, but use samples/my-gp-with-gptext-instance.yaml as the basis for your deployment. Copy the file into your workspace directory. For example:

$ cd ./greenplum-for-kubernetes-*/workspace
$ cp ./samples/my-gp-with-gptext-instance.yaml .

Edit the file as necessary for your deployment. samples/my-gp-with-gptext-instance.yaml includes additional properties to configure GPText in the new cluster:

apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    hostBasedAuthentication: |
      # host   all   gpadmin   0.0.0.0/0   trust
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 1G
    workerSelector: {}
  segments:
    primarySegmentCount: 1
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 1G
    workerSelector: {}
  gptext:
    serviceName: "my-greenplum-gptext"
---
apiVersion: "greenplum.pivotal.io/v1beta1"
kind: "GreenplumTextService"
metadata:
  name: my-greenplum-gptext
spec:
  solr:
    replicas: 1
    cpu: "0.5"
    memory: "1Gi"
    workerSelector: {}
    storageClassName: standard
    storage: 100M
  zookeeper:
    replicas: 3
    cpu: "0.5"
    memory: "1Gi"
    workerSelector: {}
    storageClassName: standard
    storage: 100M
The entry:

gptext:
  serviceName: "my-greenplum-gptext"

indicates that the cluster uses the GPText service configuration named my-greenplum-gptext, which follows at the end of the YAML file. The sample configuration creates a single Solr pod (required) and three ZooKeeper replica pods (the minimum required for Apache Solr Cloud). Minimal settings for CPU and memory are defined for each pod. You can customize these values as needed, as well as the workerSelector value if you want to constrain the replica pods to labeled nodes in your cluster. You can also customize the storageClassName if necessary to store GPText indexes on different storage than Greenplum Database uses.

Use the kubectl apply command with your modified Greenplum manifest file to send the deployment request to the Greenplum Operator. For example:

$ kubectl apply -f ./my-gp-with-gptext-instance.yaml
greenplumcluster.greenplum.pivotal.io/my-greenplum created
greenplumtextservice.greenplum.pivotal.io/my-greenplum-gptext created
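As an example of the customizations mentioned above, a GreenplumTextService spec could pin the GPText pods to labeled worker nodes and use a dedicated storage class for index data. The worker: gptext label and the fast storage class below are hypothetical examples; substitute labels and storage classes that actually exist in your cluster.

```yaml
# Hypothetical customization sketch: schedule the Solr pod only on nodes
# labeled worker=gptext and keep GPText indexes on a storage class named
# "fast" (both names are placeholders for values defined in your cluster).
spec:
  solr:
    replicas: 1
    cpu: "0.5"
    memory: "1Gi"
    workerSelector:
      worker: gptext
    storageClassName: fast
    storage: 100M
```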
If you are deploying another instance of a Greenplum cluster, specify the Kubernetes namespace where you want to deploy the new cluster. For example, if you previously deployed a cluster in the namespace gpinstance-1, you could deploy a second Greenplum cluster in the gpinstance-2 namespace using the command:
$ kubectl apply -f ./my-gp-with-gptext-instance.yaml -n gpinstance-2
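To confirm that the second cluster's resources landed in the intended namespace, you can list the pods there; this sketch continues the gpinstance-2 example (the pod names shown elsewhere in this page would appear, scoped to that namespace):

```shell
# List the Greenplum and GPText pods created in the gpinstance-2 namespace.
$ kubectl get pods -n gpinstance-2
```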
The Greenplum Operator deploys the necessary Greenplum and GPText resources according to your specification, and also initializes the Greenplum cluster.
Execute the following command to monitor the deployment of the cluster. While the cluster is initializing, the status will be Pending:

$ watch kubectl get all

NAME                                      READY   STATUS    RESTARTS   AGE
pod/greenplum-operator-6ff95b6b79-nw77p   1/1     Running   0          5m32s
pod/master-0                              1/1     Running   0          2m26s
pod/my-greenplum-gptext-solr-0            1/1     Running   0          2m33s
pod/my-greenplum-gptext-zookeeper-0       1/1     Running   0          2m33s
pod/my-greenplum-gptext-zookeeper-1       1/1     Running   0          2m30s

NAME                                                            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/agent                                                   ClusterIP      None            <none>        22/TCP                       2m26s
service/greenplum                                               LoadBalancer   10.109.56.155   <pending>     5432:31387/TCP               2m26s
service/greenplum-validating-webhook-service-6ff95b6b79-nw77p   ClusterIP      10.109.191.15   <none>        443/TCP                      5m30s
service/kubernetes                                              ClusterIP      10.96.0.1       <none>        443/TCP                      30m
service/my-greenplum-gptext-solr                                ClusterIP      None            <none>        8983/TCP                     2m33s
service/my-greenplum-gptext-zookeeper                           ClusterIP      None            <none>        2888/TCP,3888/TCP,2181/TCP   2m33s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/greenplum-operator   1/1     1            1           5m32s

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/greenplum-operator-6ff95b6b79   1         1         1       5m32s

NAME                                             READY   AGE
statefulset.apps/master                          1/1     2m26s
statefulset.apps/my-greenplum-gptext-solr        1/1     2m33s
statefulset.apps/my-greenplum-gptext-zookeeper   3/3     2m33s
statefulset.apps/segment-a                       1/1     2m26s

NAME                                                 STATUS    AGE
greenplumcluster.greenplum.pivotal.io/my-greenplum   Running   2m33s

NAME                                                            AGE
greenplumtextservice.greenplum.pivotal.io/my-greenplum-gptext   2m33s
Note that the Solr and ZooKeeper services are created along with the Greenplum Database cluster.
Describe your Greenplum cluster to verify that it was created successfully. The Phase should eventually transition to Running:

$ kubectl describe greenplumClusters/my-greenplum

Name:         my-greenplum
Namespace:    default
Labels:       <none>
Annotations:
API Version:  greenplum.pivotal.io/v1
Kind:         GreenplumCluster
Metadata:
  Creation Timestamp:  2020-05-13T22:12:50Z
  Finalizers:
    stopcluster.greenplumcluster.pivotal.io
  Generation:        3
  Resource Version:  2814
  Self Link:         /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
  UID:               697b412b-719d-446f-bc5a-af51c5d0ae00
Spec:
  Gptext:
    Service Name:  my-greenplum-gptext
  Master And Standby:
    Cpu:                        0.5
    Host Based Authentication:  # host all gpadmin 0.0.0.0/0 trust
    Memory:                     800Mi
    Storage:                    1G
    Storage Class Name:         standard
    Worker Selector:
  Segments:
    Cpu:                    0.5
    Memory:                 800Mi
    Primary Segment Count:  1
    Storage:                1G
    Storage Class Name:     standard
    Worker Selector:
Status:
  Instance Image:    greenplum-for-kubernetes:v2.0.0
  Operator Version:  greenplum-operator:v2.0.0
  Phase:             Running
Events:              <none>
If you are deploying a brand new cluster, the Greenplum Operator automatically initializes the Greenplum cluster. The Phase should eventually transition from Pending to Running, and the Events should match the output above.
Note: If you redeployed a previously-deployed Greenplum cluster, the phase begins at Pending. The cluster uses its existing Persistent Volume Claims if they are available. In this case, the master and segment data directories already exist in their former state. The master-0 pod automatically starts the Greenplum cluster, after which the phase transitions to Running.

At this point, you can work with the deployed Greenplum cluster by executing Greenplum utilities from within Kubernetes, or by using a locally-installed tool, such as psql, to access the Greenplum instance running in Kubernetes. To validate the initial GPText service deployment configuration, follow the instructions in Verifying GPText. Or, to begin working with GPText, see Working With GPText Indexes in the GPText documentation.
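As a sketch of the locally-installed-tool option, you could forward the greenplum service port shown in the kubectl get all output to your workstation and connect with a local psql client. The port-forward approach and connecting as the gpadmin user are assumptions for illustration; once the LoadBalancer external IP is assigned, connecting to that address directly works as well.

```shell
# Forward the Greenplum master port (5432) from the cluster to localhost.
$ kubectl port-forward service/greenplum 5432:5432

# In another terminal, connect with a locally installed psql client.
$ psql -h localhost -p 5432 -U gpadmin -d postgres
```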
Verifying GPText
Follow these steps to quickly verify GPText operation in your new cluster, using downloaded sample data.
Open a bash shell on the master-0 pod:

$ kubectl exec -it master-0 -- bash

Set the environment for accessing Greenplum Database and GPText tools:

$ source /usr/local/greenplum-db/greenplum_path.sh
$ source /opt/gptext/greenplum-text_path.sh
Start the psql subsystem:

$ psql -d postgres
psql (9.4.24)
Type "help" for help.

postgres=#
Query the version of GPText that is installed:
gpadmin=# select * from gptext.version();
            version
--------------------------------
 Greenplum Text Analytics 3.4.2
(1 row)
Execute the following series of commands to create an external index and add several PDF documents to the index:
gpadmin=# SELECT * FROM gptext.create_index_external('gptext-docs');
INFO:  Created index gptext-docs
 create_index_external
-----------------------
 t
(1 row)
gpadmin=# SELECT * FROM gptext.index_external(
'{http://gptext.docs.pivotal.io/archives/GPText-docs-213.pdf,
http://gptext.docs.pivotal.io/latest/topics/administering.html,
http://gptext.docs.pivotal.io/latest/topics/ext-indexes.html,
http://gptext.docs.pivotal.io/latest/topics/function_ref.html,
http://gptext.docs.pivotal.io/latest/topics/guc_ref.html,
http://gptext.docs.pivotal.io/latest/topics/ha.html,
http://gptext.docs.pivotal.io/latest/topics/index.html,
http://gptext.docs.pivotal.io/latest/topics/indexes.html,
http://gptext.docs.pivotal.io/latest/topics/intro.html,
http://gptext.docs.pivotal.io/latest/topics/managed-schema.html,
http://gptext.docs.pivotal.io/latest/topics/performance.html,
http://gptext.docs.pivotal.io/latest/topics/queries.html,
http://gptext.docs.pivotal.io/latest/topics/type_ref.html,
http://gptext.docs.pivotal.io/latest/topics/upgrading.html,
http://gptext.docs.pivotal.io/latest/topics/utility_ref.html,
http://gptext.docs.pivotal.io/latest/topics/installing.html}',
'gptext-docs');
 dbid | num_docs
------+----------
    2 |       16
(1 row)
gpadmin=# SELECT * FROM gptext.commit_index('gptext-docs');
 commit_index
--------------
 t
(1 row)
Perform a simple search to find the text “Solr” in the title field of the example external index:
gpadmin=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'title:Solr', null, null);
                            id                             |  score   | hs | rf
-----------------------------------------------------------+----------+----+----
 http://gptext.docs.pivotal.io/latest/topics/type_ref.html | 2.103843 |    |
(1 row)
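As a further hypothetical example using the same gptext.search() arguments as the title search above, a query term without a field prefix typically searches the index's default search field, so you could look for a term anywhere in the indexed document content:

```sql
-- Hypothetical follow-up query: search the default field of the external
-- index for the term "zookeeper" (same gptext.search signature as above).
gpadmin=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1),
              'gptext-docs', 'zookeeper', null, null);
```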
Optionally, complete additional example tasks described in Using GPText in the Greenplum GPText documentation to learn more about GPText functionality. For example, perform the tutorials in Working With GPText Indexes or Querying GPText Indexes.
Note: In VMware Tanzu Greenplum for Kubernetes, the scripts used to set the environment for Greenplum Database and GPText are /usr/local/greenplum-db/greenplum_path.sh and /opt/gptext/greenplum-text_path.sh, respectively. These paths differ from the paths used with GPText deployed to non-Kubernetes environments.