Bash Script Deployment Configuration Options (Deprecated)

Some options that affect the deployment of Greenplum can be set by editing a configuration file that is used by the deploy.sh script.

The configuration defaults are supplied in the file greenplum/Values-common.yml, and these settings can be overridden by entries in greenplum/Values-overrides.yml.
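
For example, a greenplum/Values-overrides.yml file containing only the following two lines would override just those two defaults and leave the rest of Values-common.yml untouched (the values shown are illustrative only):

imageTag: v1.0          # hypothetical tag; use the tag of the image you actually pushed
podMemoryLimit: 8Gi     # overrides the 4.5Gi default from Values-common.yml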

The available configuration settings and their default values are:

imageRepo: greenplum-for-kubernetes
imageTag: latest # determines the tag for the image used by Greenplum containers
dockerRegistryKeyJson: key.json # specify a relative path from greenplum/ directory to key.json
# POD specific
podMemoryLimit: 4.5Gi
podCPULimit: 1.2
useAffinity: true
# Disk specific
usePreCreatedDisks: false
clusterPrefix: dev # only used when usePreCreatedDisks is true
usePersistentDisk: true
useGCE: true # only used when usePersistentDisk is true
useVsphere: false # only used when usePersistentDisk is true
useLocal: false # only used when usePersistentDisk is true
localDir: /gpdata # only used when useLocal is true
# GPDB specific
trustedCIDR: "10.1.1.0/24" # to allow connections from that subnet to the 'svc/greenplum'

Setting IS_DEV=true in your environment before running deploy.sh deploys the Greenplum cluster with certain configuration settings that are required for Minikube:

usePersistentDisk: false
useAffinity: false
trustedCIDR: 0.0.0.0/0

The default imagePullPolicy is IfNotPresent, which uses the local Docker image that you uploaded to Minikube.

Below is a description of all available settings, grouped to provide context.

Pod Resource Settings

podMemoryLimit: 4.5Gi
podCPULimit: 1.2
useAffinity: true

Users can specify how much CPU and memory each pod may use.

podMemoryLimit specifies the memory requirement for a single Greenplum instance.

podCPULimit specifies the CPU requirement for a single Greenplum instance.

Note: a pod cannot be scheduled unless these resources are available on a node.
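
For example, to give each Greenplum instance pod more headroom than the defaults, an override file could contain the following (the sizes are illustrative; choose values that actually fit on your nodes):

podMemoryLimit: 8Gi   # raise the per-pod memory limit from the 4.5Gi default
podCPULimit: 2.0      # raise the per-pod CPU limit from the 1.2 default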

useAffinity relates to Greenplum high availability (HA). HA is important in production, but in a temporary proof-of-concept deployment, a user may want to turn off the HA requirements in order to run with fewer resources.

For HA operation, Greenplum requires that any mirror database run on a different host than its primary database. Translated into Kubernetes terms, this means that a node running a given segment's primary database cannot also host its mirror database. (This in turn implies that the node running the mirror cannot be on the same physical server as the node running the primary.)

One part of this requirement is achieved by setting useAffinity to true. When useAffinity is true, no two pods running Greenplum instances can be scheduled on the same Kubernetes node. In other words, a given Kubernetes node runs at most ONE pod with a Greenplum instance.
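
Conceptually, useAffinity: true corresponds to a Kubernetes pod anti-affinity rule along the lines of the sketch below. This is only an illustration of the mechanism: the pod label shown is an assumption, and the actual manifests generated by the deployment scripts may express the rule differently.

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: greenplum                      # assumed pod label, for illustration only
      topologyKey: kubernetes.io/hostname     # no two matching pods share a node

With a rule like this in every Greenplum pod spec, the scheduler refuses to place two Greenplum pods on the same node, which is the behavior described above.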

Storage Volume Settings

There are different performance, manageability, and durability trade-offs to consider when choosing where to store Greenplum data. Below, these trade-offs are discussed using a rating scale of Best, Good, or Zero for four separate storage choices.

Four key criteria are repeated below for each of the four storage choices:

  • performance: largely equivalent to storage speed
  • manageability: how much labor an administrator must spend to maintain the storage
  • durability: which failure cases cause data loss
  • high availability (HA): whether mirrors are necessary to achieve HA

Example: Ephemeral Volume (development mode, not for production)

usePersistentDisk: false

The disk life cycle is linked to the pod life cycle.

  • Best Performance: local IO, no network delay
  • Best Manageability: automatic management; storage goes away when the pod is destroyed
  • Zero Durability: pod failures lose data
  • HA: mirroring is required
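
For a throwaway proof-of-concept deployment, this choice is typically paired with relaxed affinity, matching the IS_DEV settings listed earlier. An illustrative Values-overrides.yml for such a deployment:

usePersistentDisk: false   # ephemeral storage; data lives only as long as the pod
useAffinity: false         # allow more than one Greenplum pod per node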

Example: Local Persistent Volume

usePersistentDisk: true
useLocal: true
localDir: /gpdata

The disk life cycle is linked to the Kubernetes node life cycle. The data will be lost if the node fails.

  • Best Performance: local IO, no network delay
  • Good Manageability: administrators must manually clean up localDir on a Kubernetes node. A pod with the same name on the same node can reacquire the stored data. Moving a pod across nodes could lose data.
  • Good Durability: sustains pod failures; node failures lose data
  • HA: mirroring is required

Example: Remote Persistent Volumes, Dynamically Created by the IaaS

(Two IaaS platforms are supported: Google Compute Engine (GCE) and VMware vSphere.)

usePersistentDisk: true
useGCE: true

OR

usePersistentDisk: true
useVsphere: true

The disk life cycle is managed by the IaaS. As long as the cluster is not deleted, the data remains. The volume can be easily remounted wherever the pod goes, on any Kubernetes node.

  • Good Performance: IO is limited by network bandwidth
  • Best Manageability: automatic management by IaaS
  • Best Durability: sustains container, pod, and node failures; deleting the Kubernetes cluster will lose the data
  • HA: mirroring not required (assuming reliance on IaaS storage for HA)
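
Because the defaults ship with useGCE: true, selecting vSphere presumably also means turning the GCE setting off in the override file. An illustrative Values-overrides.yml for a vSphere-backed deployment:

usePersistentDisk: true
useGCE: false        # override the GCE default
useVsphere: true     # let vSphere provision the persistent volumes dynamically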

Example: Pre-Created, Remote Persistent Volumes

usePreCreatedDisks: true
clusterPrefix: dev

The disk life cycle is managed by the IaaS, and even if the cluster is deleted, the data remains. The volume can be easily remounted wherever the pod goes, on any Kubernetes node.

Limitation: this feature is currently only available on GCE.

  • Good Performance: IO is limited by network bandwidth
  • Best Manageability: automatic management by IaaS
  • Best Durability: sustains container, pod, node, and cluster failures
  • HA: mirroring not required (assuming reliance on IaaS storage for HA)

Greenplum Settings

Below are settings related to security and Greenplum configuration.

Example: Custom IP Range (CIDR) for Trusted Client Connections

Greenplum Database is based on PostgreSQL, which uses entries in the pg_hba.conf file to determine whether to trust a given client connection.

Part of the Greenplum for Kubernetes configuration establishes the “trusted CIDR” for access. To change this value, create a file named ./Values-overrides.yml in the gp-kubernetes/greenplum directory (if one does not already exist). Then add the line:

trustedCIDR: "10.1.1.0/24"

Replace the IP address range with one that matches the local deployment requirements.