Deploying GPText with Greenplum (Beta)
This section describes procedures for deploying a Greenplum for Kubernetes cluster that includes Pivotal GPText.
Note: Pivotal GPText in Greenplum for Kubernetes is a Beta feature.
About GPText on Greenplum for Kubernetes
When you deploy GPText to Greenplum for Kubernetes, the Greenplum Operator creates resources to run the Apache Solr Cloud and ZooKeeper instances necessary for using GPText. ZooKeeper can be deployed to multiple replica pods as needed for redundancy. Currently, Apache Solr Cloud can only be deployed to a single pod.
Note that ZooKeeper instances are not deployed on the Greenplum segment hosts (a ‘binding’ ZooKeeper cluster), as described in the Pivotal Greenplum Text Documentation.
Deploying GPText with Greenplum for Kubernetes
Follow these steps to deploy GPText with a new Greenplum for Kubernetes cluster.
Use the procedure described in Deploying a New Greenplum Cluster to deploy the cluster, but use the samples/my-gp-with-gptext-instance.yaml file as the basis for your deployment. Copy the file into your /workspace directory. For example:
$ cd ./greenplum-for-kubernetes-*/workspace
$ cp ./samples/my-gp-with-gptext-instance.yaml .
Edit the file as necessary for your deployment.
samples/my-gp-with-gptext-instance.yaml includes additional properties to configure Greenplum Text in the new cluster:
apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    hostBasedAuthentication: |
      # host all gpadmin 1.2.3.4/32 trust
      # host all gpuser 0.0.0.0/0 md5
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 1G
    antiAffinity: "yes"
    workerSelector: {}
  segments:
    primarySegmentCount: 1
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 1G
    antiAffinity: "yes"
    workerSelector: {}
  gptext:
    serviceName: "my-greenplum-gptext"
---
apiVersion: "greenplum.pivotal.io/v1beta1"
kind: "GreenplumTextService"
metadata:
  name: my-greenplum-gptext
spec:
  solr:
    replicas: 1
    cpu: "0.5"
    memory: "1Gi"
    workerSelector: {}
    storageClassName: standard
    storage: 100M
  zookeeper:
    replicas: 3
    cpu: "0.5"
    memory: "1Gi"
    workerSelector: {}
    storageClassName: standard
    storage: 100M
The entry:
gptext:
  serviceName: "my-greenplum-gptext"
indicates that the cluster will use the GPText service configuration named my-greenplum-gptext, which follows at the end of the YAML file. The sample configuration creates a single Solr pod (required) and three ZooKeeper replica pods (the minimum required for Apache Solr Cloud). Minimal settings for CPU and memory are defined for each pod. You can customize these values as needed, as well as the workerSelector value if you want to constrain the replica pods to labeled nodes in your cluster. You can also customize the storageClassName if you need to store GPText indexes on different storage than Greenplum Database.
Use the kubectl apply command with your modified manifest file to send the deployment request to the Greenplum Operator. For example:
$ kubectl apply -f ./my-gp-with-gptext-instance.yaml
greenplumcluster.greenplum.pivotal.io/my-greenplum created
greenplumtextservice.greenplum.pivotal.io/my-greenplum-gptext created
If you are deploying another instance of a Greenplum cluster, specify the Kubernetes namespace where you want to deploy the new cluster. For example, if you previously deployed a cluster in the namespace gpinstance-1, you could deploy a second Greenplum cluster in the gpinstance-2 namespace using the command:
$ kubectl apply -f ./my-gp-with-gptext-instance.yaml -n gpinstance-2
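The target namespace must already exist before kubectl apply can create resources in it. The following is a minimal sketch of one way to create it only when it is missing; ensure_namespace is a hypothetical helper name used for illustration, and gpinstance-2 is the example namespace from the text above.

```shell
# Create a namespace only if it does not already exist.
# ensure_namespace is a hypothetical helper, not part of Greenplum for Kubernetes.
ensure_namespace() {
  ns="$1"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}

# Example usage (requires access to a Kubernetes cluster):
#   ensure_namespace gpinstance-2
#   kubectl apply -f ./my-gp-with-gptext-instance.yaml -n gpinstance-2
```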
The Greenplum Operator deploys the necessary Greenplum and GPText resources according to your specification, and also initializes the Greenplum cluster.
Execute the following command to monitor the deployment of the cluster. While the cluster is initializing, the status will be Pending:
$ watch kubectl get all
NAME                                      READY   STATUS    RESTARTS   AGE
pod/greenplum-operator-79cddcf586-ctftb   1/1     Running   0          11m
pod/master-0                              1/1     Running   0          15s
pod/master-1                              1/1     Running   0          15s
pod/my-greenplum-gptext-solr-0            0/1     Running   0          17s
pod/my-greenplum-gptext-zookeeper-0       1/1     Running   0          17s
pod/my-greenplum-gptext-zookeeper-1       1/1     Running   0          12s
pod/my-greenplum-gptext-zookeeper-2       0/1     Pending   0          0s
pod/segment-a-0                           1/1     Running   0          15s
pod/segment-b-0                           1/1     Running   0          15s

NAME                                                            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/agent                                                   ClusterIP      None           <none>        22/TCP                       15s
service/greenplum                                               LoadBalancer   10.100.229.5   <pending>     5432:32275/TCP               15s
service/greenplum-validating-webhook-service-79cddcf586-ctftb   ClusterIP      10.105.7.189   <none>        443/TCP                      11m
service/kubernetes                                              ClusterIP      10.96.0.1      <none>        443/TCP                      28m
service/my-greenplum-gptext-solr                                ClusterIP      None           <none>        8983/TCP                     17s
service/my-greenplum-gptext-zookeeper                           ClusterIP      None           <none>        2888/TCP,3888/TCP,2181/TCP   17s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/greenplum-operator   1/1     1            1           11m

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/greenplum-operator-79cddcf586   1         1         1       11m

NAME                                             READY   AGE
statefulset.apps/master                          2/2     15s
statefulset.apps/my-greenplum-gptext-solr        0/1     17s
statefulset.apps/my-greenplum-gptext-zookeeper   2/3     17s
statefulset.apps/segment-a                       1/1     15s
statefulset.apps/segment-b                       1/1     15s

NAME                                                 STATUS    AGE
greenplumcluster.greenplum.pivotal.io/my-greenplum   Pending   17s

NAME                                                            AGE
greenplumtextservice.greenplum.pivotal.io/my-greenplum-gptext   17s
Note that the Solr and ZooKeeper services are created along with the Greenplum Database cluster.
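As an alternative to watching the output, kubectl wait can block until the GPText pods report Ready. The following is a minimal sketch, assuming the pod names follow the <service>-solr-0 and <service>-zookeeper-N pattern shown in the sample output above; wait_for_gptext_pods is a hypothetical helper name. Note that kubectl wait may return an error if a pod has not been created yet, so you may need to rerun it early in the deployment.

```shell
# Wait until the Solr pod and each ZooKeeper replica pod are Ready.
# Assumption: pod names match the pattern in the sample `kubectl get all` output.
wait_for_gptext_pods() {
  service="$1"            # GreenplumTextService name, e.g. my-greenplum-gptext
  zk_replicas="${2:-3}"   # number of ZooKeeper replicas in the manifest
  timeout="${3:-300s}"    # passed to kubectl wait --timeout

  # Start with the single Solr pod, then append each ZooKeeper replica.
  set -- "pod/${service}-solr-0"
  i=0
  while [ "$i" -lt "$zk_replicas" ]; do
    set -- "$@" "pod/${service}-zookeeper-${i}"
    i=$((i + 1))
  done

  kubectl wait --for=condition=Ready --timeout="$timeout" "$@"
}

# Example usage (requires access to a Kubernetes cluster):
#   wait_for_gptext_pods my-greenplum-gptext 3 300s
```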
Describe your Greenplum cluster to verify that it was created successfully. The Phase should eventually transition to Running:
$ kubectl describe greenplumClusters/my-greenplum
Name:         my-greenplum
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"greenplum.pivotal.io/v1","kind":"GreenplumCluster","metadata":{"annotations":{},"name":"my-greenplum","namespace":"default"},"spec":{"gp...
API Version:  greenplum.pivotal.io/v1
Kind:         GreenplumCluster
Metadata:
  Creation Timestamp:  2019-10-02T23:43:05Z
  Finalizers:
    stopcluster.greenplumcluster.pivotal.io
  Generation:        2
  Resource Version:  7399
  Self Link:         /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
  UID:               b25e90e5-3ac2-40d6-94cb-a8b159b8134a
Spec:
  Gptext:
    Service Name:  my-greenplum-gptext
  Master And Standby:
    Anti Affinity:  no
    Cpu:            0.5
    Host Based Authentication:
      # host all gpadmin 1.2.3.4/32 trust
      # host all gpuser 0.0.0.0/0 md5
    Memory:              800Mi
    Storage:             1G
    Storage Class Name:  standard
    Worker Selector:
  Segments:
    Anti Affinity:          no
    Cpu:                    0.5
    Memory:                 800Mi
    Primary Segment Count:  1
    Storage:                1G
    Storage Class Name:     standard
    Worker Selector:
Status:
  Instance Image:    greenplum-for-kubernetes:v1.7.2.dev.51.g4530ad36
  Operator Version:  greenplum-operator:v1.7.2.dev.51.g4530ad36
  Phase:             Pending
Events:
  Type    Reason                    Age  From               Message
  ----    ------                    ---- ----               -------
  Normal  CreatingGreenplumCluster  4m   greenplumOperator  Creating Greenplum cluster my-greenplum in default
  Normal  CreatedGreenplumCluster   8s   greenplumOperator  Successfully created Greenplum cluster my-greenplum in default
If you are deploying a brand new cluster, the Greenplum Operator automatically initializes the Greenplum cluster. The Phase should eventually transition from Pending to Running and the Events should match the output above.
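Instead of rerunning kubectl describe by hand, the Phase can be polled from a script. The following is a minimal sketch, assuming the Phase value that kubectl describe displays under Status is stored at the .status.phase path of the GreenplumCluster resource; wait_for_phase is a hypothetical helper name, and the interval and attempt defaults are arbitrary.

```shell
# Poll a GreenplumCluster until it reports the wanted phase, or give up.
# Assumption: the Phase shown by `kubectl describe` lives at .status.phase.
wait_for_phase() {
  cluster="$1"              # e.g. my-greenplum
  wanted="${2:-Running}"    # phase to wait for
  interval="${3:-5}"        # seconds between polls
  attempts="${4:-60}"       # maximum number of polls
  phase=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Treat a failed lookup as "no phase yet" and keep polling.
    phase="$(kubectl get greenplumclusters "$cluster" \
      -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "$wanted" ]; then
      echo "$cluster is $phase"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for $cluster (last phase: ${phase:-unknown})" >&2
  return 1
}

# Example usage (requires access to a Kubernetes cluster):
#   wait_for_phase my-greenplum Running
```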
Note: If you redeployed a previously-deployed Greenplum cluster, the phase will begin at Pending. The cluster uses its existing Persistent Volume Claims if they are available. In this case, the master and segment data directories will already exist in their former state. The master-0 pod automatically starts the Greenplum Cluster, after which the phase transitions to Running.
At this point, you can work with the deployed Greenplum cluster by executing Greenplum utilities from within Kubernetes, or by using a locally-installed tool, such as psql, to access the Greenplum instance running in Kubernetes. To validate the initial GPText service deployment configuration, follow the instructions in Verifying GPText. Or, to begin working with GPText, see Working With GPText Indexes in the Pivotal GPText documentation.
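To connect with a locally-installed psql, you first need the external address of the greenplum LoadBalancer service shown in the sample output. The following is a minimal sketch of one way to look it up; greenplum_address is a hypothetical helper name. It assumes your Kubernetes environment actually assigns an external IP or hostname to the service (the sample output shows it still pending), and that the hostBasedAuthentication entries in your manifest permit the connection.

```shell
# Print the external address of the Greenplum LoadBalancer service.
# greenplum_address is a hypothetical helper, not part of Greenplum for Kubernetes.
greenplum_address() {
  svc="${1:-greenplum}"
  # A load balancer reports either an IP or a hostname, depending on the provider.
  addr="$(kubectl get service "$svc" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)"
  if [ -z "$addr" ]; then
    addr="$(kubectl get service "$svc" \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null || true)"
  fi
  if [ -z "$addr" ]; then
    echo "service $svc has no external address yet" >&2
    return 1
  fi
  echo "$addr"
}

# Example usage (requires access to a Kubernetes cluster and a local psql):
#   psql -h "$(greenplum_address)" -p 5432 -U gpadmin -d postgres
```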
Verifying GPText
Follow these steps to quickly verify GPText operation in your new cluster, using downloaded sample data.
Open a bash shell on the master-0 pod:
$ kubectl exec -it master-0 -- bash
Set the environment for accessing Greenplum Database and GPText tools:
$ source /opt/gpdb/greenplum_path.sh
$ source /opt/gptext/greenplum-text_path.sh
Start the psql subsystem:
$ psql -d postgres
psql (8.3.23)
Type "help" for help.

postgres=#
Query the version of GPText that is installed:
gpadmin=# select * from gptext.version();
            version
--------------------------------
 Greenplum Text Analytics 3.3.0
(1 row)
Execute the following series of commands to create an external index and add several documents (one PDF file and a set of HTML pages) to the index:
gpadmin=# SELECT * FROM gptext.create_index_external('gptext-docs');
INFO:  Created index gptext-docs
 create_index_external
-----------------------
 t
(1 row)
gpadmin=# SELECT * FROM gptext.index_external(
    '{http://gptext.docs.pivotal.io/archives/GPText-docs-213.pdf,
      http://gptext.docs.pivotal.io/latest/topics/administering.html,
      http://gptext.docs.pivotal.io/latest/topics/ext-indexes.html,
      http://gptext.docs.pivotal.io/latest/topics/function_ref.html,
      http://gptext.docs.pivotal.io/latest/topics/guc_ref.html,
      http://gptext.docs.pivotal.io/latest/topics/ha.html,
      http://gptext.docs.pivotal.io/latest/topics/index.html,
      http://gptext.docs.pivotal.io/latest/topics/indexes.html,
      http://gptext.docs.pivotal.io/latest/topics/intro.html,
      http://gptext.docs.pivotal.io/latest/topics/managed-schema.html,
      http://gptext.docs.pivotal.io/latest/topics/performance.html,
      http://gptext.docs.pivotal.io/latest/topics/queries.html,
      http://gptext.docs.pivotal.io/latest/topics/type_ref.html,
      http://gptext.docs.pivotal.io/latest/topics/upgrading.html,
      http://gptext.docs.pivotal.io/latest/topics/utility_ref.html,
      http://gptext.docs.pivotal.io/latest/topics/installing.html}',
    'gptext-docs');
 dbid | num_docs
------+----------
    2 |       16
(1 row)
gpadmin=# SELECT * FROM gptext.commit_index('gptext-docs');
 commit_index
--------------
 t
(1 row)
Perform a simple search to find the text “Solr” in the title field of the example external index:
gpadmin=# SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gptext-docs', 'title:Solr', null, null);
                            id                             |  score   | hs | rf
-----------------------------------------------------------+----------+----+----
 http://gptext.docs.pivotal.io/latest/topics/type_ref.html | 2.103843 |    |
(1 row)
Optionally, complete additional example tasks described in Using GPText in the Greenplum GPText documentation to learn more about GPText functionality. For example, perform the tutorials in Working With GPText Indexes or Querying GPText Indexes.
Note: In Greenplum for Kubernetes, the scripts used to set the environment for Greenplum Database and GPText are /opt/gpdb/greenplum_path.sh and /opt/gptext/greenplum-text_path.sh, respectively. These paths differ from the paths described in the Pivotal Greenplum or GPText documentation.
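The version check from the verification steps can also be run without an interactive shell. The following is a minimal sketch that combines the kubectl exec, environment setup, and psql steps into one call, using the script paths noted above; gptext_version is a hypothetical helper name, and the sketch assumes the master pod is named master-0 as in the sample output.

```shell
# Run the GPText version query on the master pod in a single, non-interactive call.
# gptext_version is a hypothetical helper, not part of Greenplum for Kubernetes.
gptext_version() {
  pod="${1:-master-0}"
  # The quoted script runs inside the pod: set up the environment, then query.
  # psql -A -t prints unaligned, tuples-only output (just the version string).
  kubectl exec "$pod" -- bash -c '
    source /opt/gpdb/greenplum_path.sh
    source /opt/gptext/greenplum-text_path.sh
    psql -d postgres -At -c "SELECT * FROM gptext.version();"
  '
}

# Example usage (requires access to a Kubernetes cluster):
#   gptext_version
```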