Greenplum PXF Service Properties

This section describes each of the properties that you can define for a GreenplumPXFService configuration in the VMware Tanzu Greenplum manifest file.

Synopsis

apiVersion: "greenplum.pivotal.io/v1beta1"
kind: "GreenplumPXFService"
metadata:
  name: <string>
  namespace: <string>
spec:
  replicas: <integer>
  cpu: <cpu-limit>
  memory: <memory-limit>
  workerSelector: {
        <label>: "<value>"
        [ ... ]
  }
  pxfConf:
    s3Source:
      secret: <Secrets name string>
      endpoint: <valid URL string>
      protocol: <http|https>
      bucket: <string>
      folder: <string> [Optional]

Description

You specify Greenplum PXF configuration properties to the Greenplum Operator via the YAML-formatted Greenplum manifest file. A sample manifest file is provided in workspace/samples/my-gp-with-pxf-instance.yaml. The current version of the manifest supports configuring the cluster name, the number of PXF replicas, the memory and CPU limits, and a remote PXF_CONF configuration source. See also Deploying PXF with Greenplum for information about deploying a new Greenplum cluster with PXF using a manifest file.

Note: As a best practice, keep the PXF configuration properties in the same manifest file as Greenplum Database, to simplify upgrades or changes to the related service objects.

Keywords and Values

Cluster Metadata

name: <string>
(Required.) Sets the name of the Greenplum PXF instance resources. You can filter the output of kubectl commands using this name.

This value cannot be dynamically changed for an existing cluster. If you change this value and re-apply it to an existing cluster, the Operator creates a new deployment rather than updating the existing one.

namespace: <string>
(Optional.) Specifies the namespace in which the Greenplum PXF resources are deployed. If this property is not specified, the current kubectl context’s namespace is used for deployment. To set kubectl’s current context to a specific namespace, use the command:

$ kubectl config set-context $(kubectl config current-context) --namespace=<NAMESPACE>

This value cannot be dynamically changed for an existing cluster. To deploy the PXF resources under a different namespace, first delete the cluster instance and then re-deploy it using the new namespace value.
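For example, assuming the PXF service is defined in the sample manifest file workspace/samples/my-gp-with-pxf-instance.yaml, you could first remove the existing resources:

$ kubectl delete -f workspace/samples/my-gp-with-pxf-instance.yaml

Then edit the namespace value in the manifest and re-deploy it with kubectl apply -f.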

Greenplum PXF Configuration

replicas: <integer>
(Optional.) The number of PXF replica pods to create in the Greenplum PXF cluster. The default is 2.

You can increase this value and re-apply it to an existing cluster as necessary.
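For example, to scale an existing PXF deployment to four replicas, you could change the value in the manifest and re-apply the file with kubectl apply -f. The relevant manifest excerpt would be similar to:

    ...
    replicas: 4
    ...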

memory: <memory-limit>
(Optional.) The amount of memory allocated to each Greenplum PXF pod. This value defines a memory limit; if a pod tries to exceed the limit, it is removed and replaced by a new pod. You can specify a suffix to define the memory units (for example, 4.5Gi). If omitted, the pod either has no upper bound on the memory it can use, or it inherits the default limit if one is specified in the namespace where it is deployed. See Assign Memory Resources to Containers and Pods in the Kubernetes documentation for more information.

If you change this value and re-apply it to an existing cluster, the Operator immediately re-creates the existing pods, causing a service interruption.

Note: If you do not want to specify a memory limit, comment out or remove the memory: keyword from the YAML file.
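For example, the following manifest excerpt limits each PXF pod to 4.5 gibibytes of memory (the value shown is illustrative):

    ...
    memory: "4.5Gi"
    ...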

cpu: <cpu-limit>
(Optional.) The amount of CPU resources allocated to each Greenplum PXF pod, specified as a Kubernetes CPU unit (for example, cpu: "1.2"). If omitted, the pod either has no upper bound on the CPU it can use, or it inherits the default limit if one is specified in the namespace where it is deployed. See Assign CPU Resources to Containers and Pods in the Kubernetes documentation for more information.

If you change this value and re-apply it to an existing cluster, the Operator re-creates the existing pods, causing a service interruption.

Note: If you do not want to specify a CPU limit, comment out or remove the cpu: keyword from the YAML file.
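For example, the following manifest excerpt limits each PXF pod to 1.2 CPU units (the value shown is illustrative):

    ...
    cpu: "1.2"
    ...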

workerSelector: <map of key-value pairs>
(Optional.) One or more selector labels to use for scheduling Greenplum PXF pods. Specify one or more label-value pairs to constrain Greenplum PXF pods to nodes that have the matching labels. Define the selector labels as you would for a pod’s nodeSelector attribute. If you do not want to use a workerSelector, remove the attribute from the manifest file.

For example, consider the case where you assign the label worker=gpdb-pxf to one or more nodes using the command:

$ kubectl label node <node_name> worker=gpdb-pxf

With the above label present in your cluster, you would edit the Greenplum Operator manifest file to specify the same key-value pair in the workerSelector attribute. The following shows the relevant excerpt from the manifest file:

    ...
    workerSelector: {
      worker: "gpdb-pxf"
    }
    ...


This value cannot be dynamically changed for an existing cluster. If you update this value and re-apply the manifest, the Operator re-creates the Greenplum PXF cluster for the new value to take effect.
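Before applying the manifest, you can verify which nodes carry the required label by using a label selector, for example:

$ kubectl get nodes -l worker=gpdb-pxf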

pxfConf: <s3Source>
(Optional.) Specifies an S3 location (endpoint, bucket, and folder) and secrets file to use for downloading an existing PXF configuration for use with a new VMware Tanzu Greenplum cluster deployment. The Greenplum Operator copies the contents of the S3 location to each Greenplum segment host for use as the PXF_CONF directory (/etc/pxf). You must ensure that the bucket-folder path contains the complete directory structure and customized files for one or more PXF server configurations. See Deploying PXF with the Default Configuration for information about deploying Greenplum with a default, initialized PXF configuration directory that you can customize for accessing your data sources.

s3Source
This section contains all of the S3-related attributes required to access the PXF configuration directory.

secret: <string>
The name of a Kubernetes Secret that the Greenplum Operator uses to access the S3 location when copying the PXF configuration. For example:

$ kubectl create secret generic my-greenplum-pxf-configs --from-literal='access_key_id=<accessKey>' --from-literal='secret_access_key=<secretKey>'


The above command creates a Secret named my-greenplum-pxf-configs using the S3 access and secret keys that you provide. Replace <accessKey> and <secretKey> with the actual S3 access and secret key values for your system. If necessary, consult your S3 implementation documentation to generate a secret access key.

endpoint: <string>
The URL of the S3 provider. For example, if you are pulling the PXF configuration from AWS S3, use the endpoint "s3.amazonaws.com".

protocol: <http|https>
(Optional.) The protocol to use for connecting to the specified S3 endpoint when downloading files from the bucket and folder. The default is https.

bucket: <string>
The S3 bucket name that contains the PXF configuration.

folder: <string>
(Optional.) The folder name in the S3 bucket where the PXF configuration directory is stored.
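Putting these attributes together, a complete pxfConf block might look like the following excerpt. The Secret name, bucket, and folder shown are placeholders for your own values:

    ...
    pxfConf:
      s3Source:
        secret: "my-greenplum-pxf-configs"
        endpoint: "s3.amazonaws.com"
        protocol: "https"
        bucket: "my-pxf-config-bucket"
        folder: "pxf-conf"
    ...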

Examples

See the workspace/samples/my-gp-with-pxf-instance.yaml file for an example manifest that configures the PXF resource.
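The following minimal manifest, based on the synopsis above, shows the properties in context; all names and values are illustrative placeholders:

    apiVersion: "greenplum.pivotal.io/v1beta1"
    kind: "GreenplumPXFService"
    metadata:
      name: my-greenplum-pxf
    spec:
      replicas: 2
      cpu: "1.0"
      memory: "2Gi"
      workerSelector: {
        worker: "gpdb-pxf"
      }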

See Also

Deploying PXF with Greenplum