Using MADlib for Analytics
This topic describes how to configure the MADlib open-source library for scalable in-database analytics in VMware Tanzu Greenplum.
Unlike other VMware Tanzu Greenplum distributions, VMware Tanzu Greenplum for Kubernetes automatically installs the MADlib software as part of the Greenplum Docker image. For example, after initializing a new Greenplum cluster in Kubernetes, you can see that MADlib is available as an installed Debian package:
$ kubectl exec -it master-0 -- bash -c "dpkg -s madlib"
Package: madlib
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 59035
Maintainer: firstname.lastname@example.org
Architecture: amd64
Version: 1.17.0
Description: Apache MADlib is an Open-Source Library for Scalable in-Database Analytics
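If you want to check the packaged MADlib version from a script, the Version field can be pulled out of the dpkg output. The following is a minimal sketch, not part of the product tooling: the function name madlib_pkg_version is hypothetical, and the commented usage line assumes the master pod is named master-0, as in the example above.

```shell
# Hypothetical helper: read `dpkg -s`-style output on stdin and print
# the value of the "Version" field (for example, 1.17.0).
madlib_pkg_version() {
  awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Possible usage against a running cluster (assumes pod master-0):
#   kubectl exec master-0 -- dpkg -s madlib | madlib_pkg_version
```

The helper only parses text, so it can also be run against saved dpkg output.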
To begin using MADlib, you simply use the madpack utility to add MADlib functions to your database, as described in the next section.
To install the MADlib functions to a database, use the madpack utility. For example:
$ kubectl exec -it master-0 -- bash -c "source ./.bashrc; madpack -p greenplum install"
madpack.py: INFO : Detected Greenplum DB version 6.8.0.
madpack.py: INFO : *** Installing MADlib ***
madpack.py: INFO : MADlib tools version = 1.17.0 (/usr/local/madlib/Versions/1.17.0/bin/../madpack/madpack.py)
madpack.py: INFO : MADlib database version = None (host=localhost:5432, db=gpadmin, schema=madlib)
madpack.py: INFO : Testing PL/Python environment...
madpack.py: INFO : > Creating language PL/Python...
madpack.py: INFO : > PL/Python environment OK (version: 2.7.12)
madpack.py: INFO : > Preparing objects for the following modules:
madpack.py: INFO : > - array_ops
madpack.py: INFO : > - bayes
madpack.py: INFO : > - crf
...
madpack.py: INFO : Installing MADlib:
madpack.py: INFO : > Created madlib schema
madpack.py: INFO : > Created madlib.MigrationHistory table
madpack.py: INFO : > Wrote version info in MigrationHistory table
madpack.py: INFO : MADlib 1.17.0 installed successfully in madlib schema.
This installs MADlib functions into the default schema named madlib. Run madpack -h or see the Greenplum MADlib Extension for Analytics documentation for VMware Tanzu Greenplum Database for more information about using madpack.
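As a quick check after installation, you can query the version that MADlib reports from inside the database. The sketch below wraps that query in a shell function. The function name madlib_db_version is hypothetical, and the sketch assumes that the master pod is named master-0, that psql is on the PATH after sourcing .bashrc, and that the functions were installed into the madlib schema of the gpadmin database shown in the madpack output.

```shell
# Hypothetical helper: print the MADlib version reported by a database.
# Assumptions: pod name master-0, default database gpadmin, schema madlib.
madlib_db_version() {
  db="${1:-gpadmin}"
  # -A (unaligned) and -t (tuples only) make psql print just the value.
  kubectl exec master-0 -- bash -c \
    "source ./.bashrc; psql -d ${db} -At -c 'SELECT madlib.version();'"
}

# Possible usage:
#   madlib_db_version            # query the gpadmin database
#   madlib_db_version analytics  # query a database named analytics (hypothetical)
```

If the madlib schema is missing from the target database, the query fails, which usually means madpack has not yet been run against that database.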
For more information about using MADlib, see: