Deploying PXF with Greenplum
This section describes procedures for deploying a Greenplum for Kubernetes cluster that includes the Pivotal Extension Framework (PXF).
About PXF on Greenplum for Kubernetes
When you deploy PXF to Greenplum for Kubernetes, the Greenplum Operator creates one or more dedicated pods, or replicas, to host the PXF server instances. This differs from Pivotal Greenplum deployed to other platforms, where a PXF server instance is deployed to each Greenplum segment host. With Greenplum for Kubernetes, you can choose to deploy as many PXF server replicas as needed to provide redundancy should a PXF pod fail and to distribute load.
You store all PXF configuration files for a Greenplum for Kubernetes cluster externally, on an S3 data source. The Greenplum for Kubernetes manifest file then specifies the S3 bucket-path to use for downloading the PXF configuration to all configured PXF servers.
When you install a new Greenplum cluster using the template PXF manifest file, workspace/samples/my-gp-with-pxf-instance.yaml, PXF is installed and initialized with a default (empty) PXF configuration directory. After deploying the cluster, you can customize the configuration by creating PXF server configurations for multiple data sources, and then redeploy with an updated manifest file to use the PXF configuration in your cluster.
Note: By default, Greenplum for Kubernetes configures PXF server JVMs with the -XX:MaxRAMPercentage=75.0 setting in PXF_JVM_OPTS. This enables PXF to use most of the memory available in its container. The setting differs from PXF on Pivotal Greenplum, where PXF servers are deployed alongside Greenplum segments and use a fixed JVM memory size (-Xmx2g -Xms1g by default) to avoid competing for memory resources. See Starting, Stopping, and Restarting PXF in the Pivotal Greenplum documentation for information about other runtime configuration options.
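If you need different JVM settings, one possible approach is to stage an edited conf/pxf-env.sh file in the S3-hosted PXF configuration described later in this topic, so that it is copied to each PXF server along with the rest of the configuration. The following is a minimal sketch, not the full file; the value shown simply restates the Greenplum for Kubernetes default:

# conf/pxf-env.sh -- staged in your S3 PXF configuration location (sketch only)
# PXF_JVM_OPTS controls the JVM options passed to each PXF server process.
# The percentage-based setting lets the JVM size itself relative to the
# container memory limit instead of using a fixed heap size.
export PXF_JVM_OPTS="-XX:MaxRAMPercentage=75.0"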
Deploying a New Greenplum Cluster with PXF Enabled
Follow these steps to deploy a new Greenplum for Kubernetes cluster with PXF enabled. (To add the PXF service to an existing Greenplum for Kubernetes cluster, see Adding PXF to an Existing Greenplum Cluster.)
You can deploy PXF servers in their default, initialized state, or you can supply an existing PXF configuration, stored in an S3 bucket location, to use as the PXF configuration for your cluster.
See also Configuring PXF Servers for information about how to create and apply PXF server configurations to a Greenplum for Kubernetes cluster.
Use the procedure described in Deploying or Redeploying a Greenplum Cluster to deploy the cluster, but use samples/my-gp-with-pxf-instance.yaml as the basis for your deployment. Copy the file into your /workspace directory. For example:

$ cd ./greenplum-for-kubernetes-*/workspace
$ cp ./samples/my-gp-with-pxf-instance.yaml .
Edit the file as necessary for your deployment. my-gp-with-pxf-instance.yaml includes properties to configure PXF in the basic Greenplum cluster:

apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    hostBasedAuthentication: |
      # host   all   gpadmin   0.0.0.0/0   trust
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 1G
    antiAffinity: "yes"
    workerSelector: {}
  segments:
    primarySegmentCount: 1
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 2G
    antiAffinity: "yes"
    workerSelector: {}
  pxf:
    serviceName: "my-greenplum-pxf"
---
apiVersion: "greenplum.pivotal.io/v1beta1"
kind: "GreenplumPXFService"
metadata:
  name: my-greenplum-pxf
spec:
  replicas: 2
  cpu: "0.5"
  memory: "1Gi"
  workerSelector: {}
#  pxfConf:
#    s3Source:
#      secret: "my-greenplum-pxf-configs"
#      endpoint: "s3.amazonaws.com"
#      bucket: "YOUR_S3_BUCKET_NAME"
#      folder: "YOUR_S3_BUCKET_FOLDER-Optional"
#
# Note: If using pxfConf.s3Source, in addition to applying the above yaml be sure to create a secret using a command similar to:
# kubectl create secret generic my-greenplum-pxf-configs --from-literal='access_key_id=XXX' --from-literal='secret_access_key=XXX'
The entry:

pxf:
  serviceName: "my-greenplum-pxf"

indicates that the cluster uses the PXF service configuration named my-greenplum-pxf that follows at the end of the yaml file. The sample configuration creates two PXF replica pods for redundancy, with minimal settings for CPU and memory. You can customize these values as needed, as well as the workerSelector value if you want to constrain the replica pods to labeled nodes in your cluster. See Greenplum PXF Service Properties for information about each available property.

If you have an existing PXF configuration that you want to apply to the Greenplum for Kubernetes cluster, follow these additional steps to edit your manifest file and provide access to the configuration:
Uncomment the pxfConf configuration properties at the end of the template file:

pxfConf:
  s3Source:
    secret: "my-greenplum-pxf-configs"
    endpoint: "s3.amazonaws.com"
    bucket: "YOUR_S3_BUCKET_NAME"
    folder: "YOUR_S3_BUCKET_FOLDER-Optional"
Set the endpoint:, bucket:, and folder: properties to specify the full S3 location that contains your PXF configuration files. All directories and files located in the specified S3 bucket-folder are copied into the PXF_CONF directory on each PXF server in the cluster. See Configuring PXF Servers for an example configuration that uses MinIO.

Create a secret that can be used to authenticate access to the S3 bucket and folder that contains the PXF configuration directory. The name of the secret must match the name specified in the manifest file (secret: "my-greenplum-pxf-configs" by default). For example:

$ kubectl create secret generic my-greenplum-pxf-configs --from-literal='access_key_id=AKIAIOSFODNN7EXAMPLE' --from-literal='secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'

secret/my-greenplum-pxf-configs created
The above command creates a secret named my-greenplum-pxf-configs using the S3 access and secret keys that you provide. Replace the access and secret key values with the actual values for your system. If necessary, use your S3 implementation documentation to generate a secret access key.
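Optionally, confirm that the secret exists and contains both keys before you deploy; kubectl describe shows only the key names and sizes, not the values:

$ kubectl describe secret my-greenplum-pxf-configs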
Use the kubectl apply command with your modified Greenplum manifest file to send the deployment request to the Greenplum Operator. For example:

$ kubectl apply -f ./my-gp-with-pxf-instance.yaml

greenplumcluster.greenplum.pivotal.io/my-greenplum created
greenplumpxfservice.greenplum.pivotal.io/my-greenplum-pxf created
If you are deploying another instance of a Greenplum cluster, specify the Kubernetes namespace where you want to deploy the new cluster. For example, if you previously deployed a cluster in the namespace gpinstance-1, you could deploy a second Greenplum cluster in the gpinstance-2 namespace using the command:
$ kubectl apply -f ./my-gp-with-pxf-instance.yaml -n gpinstance-2
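If the target namespace does not yet exist, create it before applying the manifest. For example, for the gpinstance-2 example namespace:

$ kubectl create namespace gpinstance-2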
The Greenplum Operator deploys the necessary Greenplum and PXF resources according to your specification, and also initializes the Greenplum cluster.
Execute the following command to monitor the deployment of the Greenplum cluster and PXF service. While the Greenplum cluster is initializing, its status is Pending. While the PXF service is initializing, its status is Pending or Degraded: Pending when the service has zero ready pods, and Degraded when it has at least one ready pod but has not yet reached the desired state:

$ watch kubectl get all

NAME                                      READY   STATUS    RESTARTS   AGE
pod/greenplum-operator-745b7464b4-mzdpq   1/1     Running   0          18s
pod/master-0                              1/1     Running   0          10s
pod/master-1                              0/1     Running   0          10s
pod/my-greenplum-pxf-69c9bfd857-8zqbm     0/1     Running   0          11s
pod/my-greenplum-pxf-69c9bfd857-tp4w7     0/1     Running   0          11s
pod/segment-a-0                           0/1     Running   0          10s
pod/segment-b-0                           0/1     Running   0          10s

NAME                                                            TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/agent                                                   ClusterIP      None             <none>        22/TCP           10s
service/greenplum                                               LoadBalancer   10.101.215.24    <pending>     5432:30644/TCP   10s
service/greenplum-validating-webhook-service-745b7464b4-mzdpq   ClusterIP      10.102.70.89     <none>        443/TCP          17s
service/kubernetes                                              ClusterIP      10.96.0.1        <none>        443/TCP          29m
service/my-greenplum-pxf                                        ClusterIP      10.106.121.239   <none>        5888/TCP         11s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/greenplum-operator   1/1     1            1           18s
deployment.apps/my-greenplum-pxf     0/2     2            0           11s

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/greenplum-operator-745b7464b4   1         1         1       18s
replicaset.apps/my-greenplum-pxf-69c9bfd857     2         2         0       11s

NAME                         READY   AGE
statefulset.apps/master      1/2     10s
statefulset.apps/segment-a   0/1     10s
statefulset.apps/segment-b   0/1     10s

NAME                                                 STATUS    AGE
greenplumcluster.greenplum.pivotal.io/my-greenplum   Pending   12s

NAME                                                        STATUS
greenplumpxfservice.greenplum.pivotal.io/my-greenplum-pxf   Pending
Note that the Greenplum PXF service, deployment, and replicas are created in addition to the Greenplum cluster.
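If you want to watch only the PXF service resource rather than all cluster resources, you can query it directly; its STATUS column reports the same Pending, Degraded, and Running phases described above:

$ kubectl get greenplumpxfservices/my-greenplum-pxf -w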
Describe your Greenplum cluster to verify that it was created successfully. The Phase should eventually transition to Running:

$ kubectl describe greenplumClusters/my-greenplum

Name:         my-greenplum
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"greenplum.pivotal.io/v1","kind":"GreenplumCluster","metadata":{"annotations":{},"name":"my-greenplum","namespace":"default"...
API Version:  greenplum.pivotal.io/v1
Kind:         GreenplumCluster
Metadata:
  Creation Timestamp:  2019-04-01T15:19:17Z
  Generation:          1
  Resource Version:    1469567
  Self Link:           /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
  UID:                 83e0bdfd-5491-11e9-a268-c28bb5ff3d1c
Spec:
  Master And Standby:
    Anti Affinity:              yes
    Cpu:                        0.5
    Host Based Authentication:  # host   all   gpadmin   0.0.0.0/0   trust
    Memory:                     800Mi
    Storage:                    1G
    Storage Class Name:         standard
    Worker Selector:
  Segments:
    Anti Affinity:          yes
    Cpu:                    0.5
    Memory:                 800Mi
    Primary Segment Count:  1
    Storage:                2G
    Storage Class Name:     standard
    Worker Selector:
Status:
  Instance Image:    greenplum-for-kubernetes:latest
  Operator Version:  greenplum-operator:latest
  Phase:             Running
Events:
  Type    Reason                    Age  From               Message
  ----    ------                    ---- ----               -------
  Normal  CreatingGreenplumCluster  2m   greenplumOperator  Creating Greenplum cluster my-greenplum in default
  Normal  CreatedGreenplumCluster   8s   greenplumOperator  Successfully created Greenplum cluster my-greenplum in default
If you are deploying a brand new cluster, the Greenplum Operator automatically initializes the Greenplum cluster. The Phase should eventually transition from Pending to Running, and the Events should match the output above.

Note: If you redeployed a previously-deployed Greenplum cluster, the phase begins at Pending. The cluster uses its existing Persistent Volume Claims if they are available. In this case, the master and segment data directories already exist in their former state. The master-0 pod automatically starts the Greenplum cluster, after which the phase transitions to Running.
Describe your PXF service to verify that it was created successfully. The Phase should eventually transition to Running:

$ kubectl describe greenplumpxfservice.greenplum.pivotal.io/my-greenplum-pxf

Name:         my-greenplum-pxf
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  greenplum.pivotal.io/v1beta1
Kind:         GreenplumPXFService
Metadata:
  Creation Timestamp:  2020-01-28T16:17:19Z
  Generation:          4
  Resource Version:    2878
  Self Link:           /apis/greenplum.pivotal.io/v1beta1/namespaces/default/greenplumpxfservices/my-greenplum-pxf
  UID:                 aba33b0c-6c28-41f7-8d33-2c4b22bfcc63
Spec:
  Cpu:       0.5
  Memory:    1Gi
  Replicas:  2
  Worker Selector:
Status:
  Phase:  Running
Events:   <none>
The PXF service should automatically initialize itself. The Phase should eventually transition to Running.

At this point, you can work with the deployed Greenplum cluster by executing Greenplum utilities from within Kubernetes, or by using a locally-installed tool, such as psql, to access the Greenplum instance running in Kubernetes. Examine the PXF_CONF directory on master:

$ kubectl exec -it master-0 -- bash -c "ls -R /etc/pxf"

/etc/pxf:
conf  keytabs  lib  logs  servers  templates

/etc/pxf/conf:
pxf-env.sh  pxf-log4j.properties  pxf-profiles.xml

/etc/pxf/keytabs:

/etc/pxf/lib:

/etc/pxf/logs:

/etc/pxf/servers:
default

/etc/pxf/servers/default:

/etc/pxf/templates:
adl-site.xml   hbase-site.xml  jdbc-site.xml    s3-site.xml
core-site.xml  hdfs-site.xml   mapred-site.xml  wasbs-site.xml
gs-site.xml    hive-site.xml   minio-site.xml   yarn-site.xml
The above output shows that a default PXF service has just been initialized, where the PXF_CONF directory (/etc/pxf) contains only the default subdirectories and template configuration files. If you applied an existing PXF configuration, verify that your custom PXF server configuration files are present. If you did not apply an existing PXF configuration, continue with the instructions in Configuring PXF Servers to verify basic PXF functionality in the new cluster.
Adding PXF to an Existing Greenplum Cluster
Follow these steps to deploy a PXF Service and associate it with an existing Greenplum for Kubernetes cluster. You can deploy the PXF servers in their default, initialized state, or you can supply an existing PXF configuration, stored in an S3 bucket location, to use as the PXF configuration for your cluster.
See also Configuring PXF Servers for information about how to create and apply PXF server configurations to a Greenplum for Kubernetes cluster.
Delete the existing Greenplum resources. Note that the data in your existing Greenplum cluster is preserved:
$ kubectl delete -f workspace/my-gp-instance.yaml
Ensure that the GreenplumCluster and its associated resources are completely gone before continuing:

$ watch kubectl get all
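A more targeted check is to query the Greenplum custom resource directly; once the deletion completes, the command reports that no resources are found:

$ kubectl get greenplumclusters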
Edit your manifest file to add a GreenplumPXFService. Associate the new GreenplumPXFService with the previously-existing GreenplumCluster resource by setting the spec.pxf.serviceName value. See the PXF Reference and GreenplumCluster Reference pages for more information and descriptions of the various configuration options. Below is an example manifest file:

$ cat workspace/my-gp-instance.yaml

---
apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    hostBasedAuthentication: |
      # host   all   gpadmin   0.0.0.0/0   trust
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 1G
    antiAffinity: "yes"
    workerSelector: {}
  segments:
    primarySegmentCount: 1
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 2G
    antiAffinity: "yes"
    workerSelector: {}
  pxf:
    serviceName: "my-greenplum-pxf"
---
apiVersion: "greenplum.pivotal.io/v1beta1"
kind: "GreenplumPXFService"
metadata:
  name: my-greenplum-pxf
spec:
  replicas: 2
  cpu: "0.5"
  memory: "1Gi"
  workerSelector: {}
Apply your updated manifest file to create the GreenplumCluster and GreenplumPXFService resources:

$ kubectl apply -f workspace/my-gp-instance.yaml
Describe your Greenplum cluster to verify that it was created successfully.

$ kubectl describe greenplumClusters/my-greenplum

Name:         my-greenplum
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"greenplum.pivotal.io/v1","kind":"GreenplumCluster","metadata":{"annotations":{},"name":"my-greenplum","namespace":"default"...
API Version:  greenplum.pivotal.io/v1
Kind:         GreenplumCluster
Metadata:
  Creation Timestamp:  2019-04-01T15:19:17Z
  Generation:          1
  Resource Version:    1469567
  Self Link:           /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
  UID:                 83e0bdfd-5491-11e9-a268-c28bb5ff3d1c
Spec:
  Master And Standby:
    Anti Affinity:              yes
    Cpu:                        0.5
    Host Based Authentication:  # host   all   gpadmin   0.0.0.0/0   trust
    Memory:                     800Mi
    Storage:                    1G
    Storage Class Name:         standard
    Worker Selector:
  Segments:
    Anti Affinity:          yes
    Cpu:                    0.5
    Memory:                 800Mi
    Primary Segment Count:  1
    Storage:                2G
    Storage Class Name:     standard
    Worker Selector:
Status:
  Instance Image:    greenplum-for-kubernetes:latest
  Operator Version:  greenplum-operator:latest
  Phase:             Running
Events:
  Type    Reason                    Age  From               Message
  ----    ------                    ---- ----               -------
  Normal  CreatingGreenplumCluster  2m   greenplumOperator  Creating Greenplum cluster my-greenplum in default
  Normal  CreatedGreenplumCluster   8s   greenplumOperator  Successfully created Greenplum cluster my-greenplum in default
Note: If your Greenplum cluster is configured to use a standby master, the Phase stays at Pending until after you perform the next step. Then, the Phase transitions to Running.

If your cluster is configured to use a standby master, connect to the master-0 pod and execute the gpstart command manually. For example:

$ kubectl exec -it master-0 -- bash -c "source /opt/gpdb/greenplum_path.sh; gpstart"

20200212:19:45:55:000517 gpstart:master-0:gpadmin-[INFO]:-Starting gpstart with args:
20200212:19:45:55:000517 gpstart:master-0:gpadmin-[INFO]:-Gathering information and validating the environment...
20200212:19:45:55:000517 gpstart:master-0:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 5.24.2 build dev'
20200212:19:45:55:000517 gpstart:master-0:gpadmin-[INFO]:-Greenplum Catalog Version: '301705051'
20200212:19:45:55:000517 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance in admin mode
20200212:19:45:56:000517 gpstart:master-0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20200212:19:45:56:000517 gpstart:master-0:gpadmin-[INFO]:-Obtaining Segment details from master...
20200212:19:45:56:000517 gpstart:master-0:gpadmin-[INFO]:-Setting new master era
20200212:19:45:56:000517 gpstart:master-0:gpadmin-[INFO]:-Master Started...
20200212:19:45:56:000517 gpstart:master-0:gpadmin-[INFO]:-Shutting down master
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:---------------------------
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-Master instance parameters
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:---------------------------
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-Database                 = template1
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-Master Port              = 5432
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-Master directory         = /greenplum/data-1
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-Timeout                  = 600 seconds
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-Master standby start     = On
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:---------------------------------------
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-Segment instances that will be started
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:---------------------------------------
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-   Host          Datadir                  Port    Role
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-   segment-a-0   /greenplum/data          40000   Primary
20200212:19:45:58:000517 gpstart:master-0:gpadmin-[INFO]:-   segment-b-0   /greenplum/mirror/data   50000   Mirror

Continue with Greenplum instance startup Yy|Nn (default=N):
Press Y to continue the startup.

Describe your Greenplum PXF Service to verify that it was created successfully.

$ kubectl describe greenplumpxfservices/my-greenplum-pxf

Name:         my-greenplum-pxf
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"greenplum.pivotal.io/v1beta1","kind":"GreenplumPXFService","metadata":{"annotations":{},"name":"my-greenplum-pxf","namespac...
API Version:  greenplum.pivotal.io/v1beta1
Kind:         GreenplumPXFService
Metadata:
  Creation Timestamp:  2020-03-12T21:44:49Z
  Generation:          4
  Resource Version:    47534
  Self Link:           /apis/greenplum.pivotal.io/v1beta1/namespaces/default/greenplumpxfservices/my-greenplum-pxf
  UID:                 3aafe6f1-5e6f-4bca-bb53-f97987debf9e
Spec:
  Cpu:       0.5
  Memory:    1Gi
  Replicas:  4
  Worker Selector:
Status:
  Phase:  Running
Events:   <none>
Eventually, the phase changes to Running, indicating that the PXF service is ready for use.

Enable the PXF extension for your Greenplum cluster by issuing the following commands:

$ kubectl exec -it master-0 bash
$ psql -c 'CREATE EXTENSION IF NOT EXISTS pxf;'
Test the PXF service to ensure that it works:

$ kubectl exec -it master-0 bash
$ psql -d postgres

postgres=# CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT)
             LOCATION ('pxf://tmp/dummy1'
             '?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
             '&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
             '&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
           FORMAT 'TEXT' (DELIMITER ',');
postgres=# SELECT * FROM pxf_read_test;
Configuring PXF Servers
With Greenplum for Kubernetes, all PXF configuration files for a cluster are stored externally, on an S3 data source. The Greenplum for Kubernetes manifest file then specifies the S3 bucket-path to use for downloading the PXF configuration to all configured PXF servers. Any directories and files at the specified bucket-path are copied as-is to all PXF Servers configured for the cluster.
This procedure describes how to add or modify the PXF configuration for a Greenplum for Kubernetes cluster.
Prerequisites
This procedure uses MinIO as an example data source, both for storing the PXF server configuration and for accessing remote data via PXF. If you want to follow along using the MinIO example, install the MinIO client, mc, on your local system. See the MinIO Client Quickstart Guide for installation instructions.
You should also have access to a Greenplum for Kubernetes deployment that includes PXF. See Deploying a New Greenplum Cluster with PXF Enabled.
Procedure
To use MinIO as a sample data source, first install a standalone MinIO server to your cluster using helm. For example:

$ helm install stable/minio

NAME:   voting-coral
LAST DEPLOYED: Wed Oct 16 07:44:38 2019
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/ConfigMap
NAME                DATA  AGE
voting-coral-minio  1     0s

==> v1/Deployment
NAME                READY  UP-TO-DATE  AVAILABLE  AGE
voting-coral-minio  0/1    1           0          0s

==> v1/PersistentVolumeClaim
NAME                STATUS  VOLUME                                    CAPACITY  ACCESS MODES  STORAGECLASS  AGE
voting-coral-minio  Bound   pvc-a7d03ab0-4fda-4328-a1ed-2f603888ed13  10Gi      RWO           standard      0s

==> v1/Pod(related)
NAME                                 READY  STATUS             RESTARTS  AGE
voting-coral-minio-7fd9b4c78b-ddp97  0/1    ContainerCreating  0         0s

==> v1/Secret
NAME                TYPE    DATA  AGE
voting-coral-minio  Opaque  2     0s

==> v1/Service
NAME                TYPE       CLUSTER-IP    EXTERNAL-IP  PORT(S)   AGE
voting-coral-minio  ClusterIP  10.97.90.127  <none>       9000/TCP  0s

==> v1/ServiceAccount
NAME                SECRETS  AGE
voting-coral-minio  1        0s

NOTES:
voting-coral-minio.default.svc.cluster.local

To access Minio from localhost, run the below commands:

  1. export POD_NAME=$(kubectl get pods --namespace default -l "release=voting-coral" -o jsonpath="{.items[0].metadata.name}")
  2. kubectl port-forward $POD_NAME 9000 --namespace default

You can now access Minio server on http://localhost:9000.

Follow the below steps to connect to Minio server with mc client:

  3. mc ls voting-coral-minio-local

Alternately, you can use your browser or the Minio SDK to access the server - https://docs.minio.io/categories/17
Execute the commands shown at the end of the MinIO deployment output to make the MinIO server accessible from the local host. Using the above output as an example:

$ export POD_NAME=$(kubectl get pods --namespace default -l "release=voting-coral" -o jsonpath="{.items[0].metadata.name}")
$ kubectl port-forward $POD_NAME 9000 --namespace default

Forwarding from 127.0.0.1:9000 -> 9000
Forwarding from [::1]:9000 -> 9000
Also make note of the MinIO service name used within the cluster (“voting-coral-minio” in the above example). You will use this name when defining the MinIO endpoint in the PXF configuration.
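If you did not note the service name from the helm output, you can look it up afterwards; this sketch assumes the example release deployed above in the default namespace:

$ kubectl get services --namespace default | grep minio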
Follow these steps to create the sample data file and copy it to MinIO:
Configure the mc client to use the MinIO server you just deployed:

$ mc config host add minio http://localhost:9000 AKIAIOSFODNN7EXAMPLE wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Added `minio` successfully.
The accessKey and secretKey in the above command are used in the default helm chart deployment.

Make two new buckets to store the sample data and the PXF configuration:

$ mc mb minio/pxf-config
$ mc mb minio/pxf-data

Bucket created successfully `minio/pxf-config`.
Bucket created successfully `minio/pxf-data`.
Create a delimited plain text data file named pxf_s3_simple.txt to provide the sample data:

$ echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' > ./pxf_s3_simple.txt
Copy the sample data file to the MinIO bucket you created:

$ mc cp ./pxf_s3_simple.txt minio/pxf-data

./pxf_s3_simple.txt: 192 B / 192 B [===============] 100.00% 6.46 KiB/s 0s
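Optionally verify the upload; mc can list the bucket contents and print the object back:

$ mc ls minio/pxf-data
$ mc cat minio/pxf-data/pxf_s3_simple.txt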
Follow these steps to create the PXF server configuration file to access MinIO, and store it on the MinIO server:
Copy the template PXF MinIO configuration file from Greenplum for Kubernetes to your local host:

$ kubectl cp master-0:/etc/pxf/templates/minio-site.xml ./minio-site.xml
Open the copied template file in a text editor, and edit the file entries to access the MinIO server that you deployed with helm. The file contents should be similar to:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>http://voting-coral-minio:9000</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>AKIAIOSFODNN7EXAMPLE</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</value>
    </property>
    <property>
        <name>fs.s3a.fast.upload</name>
        <value>true</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
</configuration>
Be sure to change the first property value, fs.s3a.endpoint, to the URL of the MinIO service that was deployed in your cluster. The access.key and secret.key values in the above output are used in the default helm chart deployment. All other property values are the defaults provided in the template file.

Save the file and exit your text editor.
Copy your modified minio-site.xml file for use as the default PXF server configuration on all PXF pods deployed in your cluster. To do this, place it in the example pxf-config bucket under the /servers/default directory. For example:

$ mc cp ./minio-site.xml minio/pxf-config/servers/default/minio-site.xml

./minio-site.xml: 643 B / 643 B [===============] 100.00% 6.46 KiB/s 0s
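You can confirm that the file landed under the expected servers/default/ path; whatever directory tree exists at this bucket location is copied as-is into PXF_CONF on each PXF pod:

$ mc ls --recursive minio/pxf-config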
At this point, you have deployed a MinIO server with sample data, and placed a sample PXF server configuration for the MinIO server in a location where it will be copied and used as the default PXF server configuration.
Follow these steps to update your Greenplum cluster to use the new PXF server configuration file that you created and staged in MinIO:
- Move to the Greenplum for Kubernetes workspace directory you used to deploy the Greenplum cluster.
- Edit the manifest file for your cluster (for example, my-gp-with-pxf-instance.yaml) in a text editor. Uncomment and edit the pxfConf configuration properties at the end of the template file to describe the MinIO location where you copied the PXF configuration file. For example:

  pxfConf:
    s3Source:
      secret: "my-greenplum-pxf-configs"
      endpoint: "voting-coral-minio:9000"
      protocol: "http"
      bucket: "pxf-config"

- Create a secret that can be used to authenticate access to the S3 bucket and folder that contains the PXF configuration directory. The name of the secret must match the name specified in the manifest file (secret: "my-greenplum-pxf-configs" by default). For example:

  $ kubectl create secret generic my-greenplum-pxf-configs --from-literal='access_key_id=AKIAIOSFODNN7EXAMPLE' --from-literal='secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'

  secret/my-greenplum-pxf-configs created

- Apply the modified configuration (an optional rollout check appears after this list):

  $ kubectl apply -f ./my-gp-with-pxf-instance.yaml

  greenplumcluster.greenplum.pivotal.io/my-greenplum unchanged
  greenplumpxfservice.greenplum.pivotal.io/my-greenplum-pxf configured
If you encounter any errors while setting up pxfConf, refer to Troubleshooting GreenplumPXFService pod startup.
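As an optional check after applying the change (assuming the Greenplum Operator recreates the PXF pods so that they download the new configuration), you can wait for the PXF deployment to converge; the deployment name matches the GreenplumPXFService name:

$ kubectl rollout status deployment/my-greenplum-pxf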
Perform the remaining steps on the Greenplum master pod to create and query an external table that references the sample MinIO data:
Open a bash shell on the master-0 pod:

$ kubectl exec -it master-0 bash
Start the psql subsystem:

$ psql -d postgres

psql (8.3.23)
Type "help" for help.

postgres=#
Create the PXF extension in the database:

postgres=# create extension pxf;

CREATE EXTENSION
Use the PXF s3:text profile to create a Greenplum Database external table that references the pxf_s3_simple.txt file that you just created and added to MinIO. This command omits the typical &SERVER=<server_name> option in the PXF location URL, because the procedure created only the default server configuration:

postgres=# CREATE EXTERNAL TABLE pxf_s3_textsimple(location text, month text, num_orders int, total_sales float8)
             LOCATION ('pxf://pxf-data/pxf_s3_simple.txt?PROFILE=s3:text')
             FORMAT 'TEXT' (delimiter=E',');

CREATE EXTERNAL TABLE
Query the external table to access the sample data stored on MinIO:

postgres=# SELECT * FROM pxf_s3_textsimple;

 location  | month | num_orders | total_sales
-----------+-------+------------+-------------
 Prague    | Jan   |        101 |     4875.33
 Rome      | Mar   |         87 |     1557.39
 Bangalore | May   |        317 |     8936.99
 Beijing   | Jul   |        411 |    11600.67
(4 rows)
If you receive any errors when querying the external table, verify the contents of the /etc/pxf/servers/default/minio-site.xml file on each PXF server in the cluster. Also use the mc client to verify the contents and location of the sample data file on MinIO.

Further PXF troubleshooting information is available in the Greenplum Database documentation at Troubleshooting PXF.
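The logs of the PXF replica pods can also help when troubleshooting; for example, to tail the logs of the PXF deployment created in this topic:

$ kubectl logs deployment/my-greenplum-pxf --tail=100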