Chapter 2. Deploy Presto Cluster From Scratch

Published in DevOps DeepDive

3 min read · Apr 11, 2021

In this lab, we will set up a Presto cluster with one coordinator node and one worker node. You can follow the same settings to attach any number of worker nodes to the coordinator.
Note: To understand the Presto architecture and how a cluster works, see Chapter 1.

Steps to perform on the coordinator node:

Prerequisites:
1. Download the Presto tarball from the link below:

https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.250/presto-server-0.250.tar.gz

a. The tarball contains a single directory, presto-server-0.250, which we will call the installation directory.

b. This lab uses Java version OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.18.04) on Ubuntu 18.04.5 LTS.

2. Create a data directory for storing the logs. Keep it outside the installation directory so that Presto upgrades are easier.

3. Create an “etc” directory inside the installation directory and create the following configuration files in it:

Node Properties: environmental configuration specific to each node

<presto_home_dir>/etc/node.properties

node.environment=my_first_presto_cluster
node.id=<hostname>
node.data-dir=<path to data directory where logs can be written>
catalog.config-dir=<recommend to define inside etc directory under presto installation>
node.server-log-file=<path/to/server_logs_dir/server.log>
node.launcher-log-file=</path/to/launcher_log/launcher.log>
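
The directory layout and node.properties described above can also be scripted. The following is a minimal sketch rather than the author's own method: SCRATCH, PRESTO_HOME and PRESTO_DATA are assumed variable names, they default to a temporary directory so the script is safe to try, and `uname -n` stands in for the hostname. The environment name uses underscores on the assumption that node.environment accepts only lowercase letters, digits and underscores.

```shell
set -eu

# Assumed locations (not from the original article); override them for a real install.
SCRATCH="${SCRATCH:-$(mktemp -d)}"
PRESTO_HOME="${PRESTO_HOME:-$SCRATCH/presto-server-0.250}"
PRESTO_DATA="${PRESTO_DATA:-$SCRATCH/presto-data}"   # kept outside PRESTO_HOME on purpose

# Create etc/ and etc/catalog/ plus the external data and log directories.
mkdir -p "$PRESTO_HOME/etc/catalog" "$PRESTO_DATA/log"

# Write node.properties with the same keys as the listing above.
cat > "$PRESTO_HOME/etc/node.properties" <<EOF
node.environment=my_first_presto_cluster
node.id=$(uname -n)
node.data-dir=$PRESTO_DATA
catalog.config-dir=$PRESTO_HOME/etc/catalog
node.server-log-file=$PRESTO_DATA/log/server.log
node.launcher-log-file=$PRESTO_DATA/log/launcher.log
EOF

echo "wrote $PRESTO_HOME/etc/node.properties"
```

Because the data directory lives next to the installation directory instead of inside it, replacing presto-server-0.250 with a newer release leaves the logs untouched.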

JVM Config: command line options for the Java Virtual Machine

<presto_home_dir>/etc/jvm.config

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:+IgnoreUnrecognizedVMOptions

Config Properties: configuration for the Presto server

<presto_home_dir>/etc/config.properties

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://<IP of the coordinator node>:8080
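
Query memory is taken from the JVM heap, so query.max-memory-per-node has to stay below the -Xmx value in jvm.config. The sketch below is a rough sanity check for that relationship. The helper names to_mb and fits_in_heap are made up for illustration, and only whole-number sizes with G, GB, M or MB suffixes are handled.

```shell
# Convert a size such as 16G, 2GB or 512MB to whole megabytes.
to_mb() {
  value=$(printf '%s' "$1" | tr -d 'Bb')   # 2GB -> 2G, 512MB -> 512M
  number=${value%[GgMm]}                    # strip the unit letter
  case "$value" in
    *[Gg]) echo $((number * 1024)) ;;
    *[Mm]) echo "$number" ;;
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the per-node query limit ($2) is smaller than the heap ($1).
fits_in_heap() {
  heap_mb=$(to_mb "$1") || return 2
  limit_mb=$(to_mb "$2") || return 2
  [ "$limit_mb" -lt "$heap_mb" ]
}

# Values taken from the jvm.config and config.properties listings above.
if fits_in_heap 16G 2GB; then
  echo "query.max-memory-per-node fits inside -Xmx"
else
  echo "query.max-memory-per-node is too large for -Xmx" >&2
fi
```

With the values used in this lab (16G heap, 2GB per node) the check passes with plenty of headroom.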

Catalog Properties: configuration for Connectors (data sources)

Create a file named hive.properties inside <presto_home_dir>/etc/catalog and paste the content below:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=<put your access key>
hive.s3.aws-secret-key=<put your secret key>
hive.non-managed-table-writes-enabled=true
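
Instead of pasting the keys by hand, the catalog file can be generated from environment variables. This is a sketch under assumptions: CATALOG_DIR is a made-up variable that defaults to a temporary directory, and the keys are read from AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, falling back to the placeholders when those are unset.

```shell
set -eu

# Assumed location (not from the original article); defaults to a scratch directory.
CATALOG_DIR="${CATALOG_DIR:-$(mktemp -d)/etc/catalog}"
mkdir -p "$CATALOG_DIR"

# Read the keys from the environment; the placeholders are only useful for a dry run.
ACCESS_KEY="${AWS_ACCESS_KEY_ID:-<put your access key>}"
SECRET_KEY="${AWS_SECRET_ACCESS_KEY:-<put your secret key>}"

umask 077   # files created from here on are readable by the owner only
cat > "$CATALOG_DIR/hive.properties" <<EOF
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=$ACCESS_KEY
hive.s3.aws-secret-key=$SECRET_KEY
hive.non-managed-table-writes-enabled=true
EOF

echo "wrote $CATALOG_DIR/hive.properties"
```

Setting the umask before writing keeps the secret key out of reach of other users on the machine.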

Log Levels: the optional log levels file allows setting the minimum log level. Create <presto_home_dir>/etc/log.properties and paste the content below:

com.facebook.presto=INFO

Configure Hive MetaStore

Presto needs the Hive Metastore to serve catalog information such as table schemas and partition locations. To launch the Hive Metastore for the first time, proceed as follows:

$ mkdir ~/hive-metastore
$ cd ~/hive-metastore
$ wget https://downloads.apache.org/hive/hive-2.3.8/apache-hive-2.3.8-bin.tar.gz
$ tar -xvzf apache-hive-2.3.8-bin.tar.gz
$ cd apache-hive-2.3.8-bin
$ export HIVE_HOME=`pwd`
$ export JAVA_HOME=<path of java installation directory>

# Copy the lines below into ~/hive-metastore/apache-hive-2.3.8-bin/conf/hive-env.sh
export HIVE_AUX_JARS_PATH=${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar:${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-3.2.1.jar
export AWS_ACCESS_KEY_ID=<access key>
export AWS_SECRET_ACCESS_KEY=<secret key>

$ mkdir ~/hadoop
$ cd ~/hadoop
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
$ tar -xvf hadoop-3.2.1.tar.gz
$ cd hadoop-3.2.1
$ export HADOOP_HOME=`pwd`

# Return to the Hive directory; the remaining paths are relative to it
$ cd $HIVE_HOME
$ cp conf/hive-default.xml.template conf/hive-site.xml
$ mkdir -p hcatalog/var/log/
$ bin/schematool -dbType derby -initSchema
$ hcatalog/sbin/hcat_server.sh start
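
Before starting Presto it can help to confirm that the metastore is actually listening on port 9083, the port that hive.properties points at. The helper below is a sketch: wait_for_port is a made-up name, and it relies on bash's /dev/tcp pseudo-device, so it needs to run under bash.

```shell
# Wait until host:port accepts TCP connections, or give up after a timeout in seconds.
# $1 = host, $2 = port, $3 = timeout (defaults to 30)
wait_for_port() {
  host=$1 port=$2 timeout=${3:-30}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # The subshell opens the connection and closes it again on exit.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Usage: wait_for_port localhost 9083 60 && echo "metastore is up"
```

The function only proves that something accepts connections on the port. It does not verify that the listener is a working Thrift metastore.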

Start Presto Server

$ cd <presto_home_dir>/bin
$ ./launcher start
$ cd ..
$ ./presto --server localhost:8080 --catalog hive
presto> use default;
USE
presto:default> select * from system.runtime.nodes;
               node_id                |           http_uri           | node_version | coordinator | state  
--------------------------------------+------------------------------+--------------+-------------+--------
 ffffffff-ffff-ffff-ffff-ffffffffffff | http://<coordinator_IP>:8080 | 348          | true        | active 
(1 row)

Query 20210411_094403_00021_54idy, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 71B] [4 rows/s, 352B/s]

Steps to perform on the worker node:

Follow all the steps performed above for the coordinator node. The only change for the worker node is in the <presto_home_dir>/etc/config.properties file.

Paste the content below into this file:

coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
#discovery-server.enabled=true
discovery.uri=http://<coordinator_node_IP>:8080
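
Since the coordinator and the worker differ only in config.properties, both variants can come from one small function. This is a sketch: render_config is a made-up name, the values are copied from the two listings in this chapter, and 10.0.0.5 is a made-up address.

```shell
# Print config.properties for the given role to stdout.
# $1 = coordinator | worker, $2 = IP or hostname of the coordinator node
render_config() {
  role=$1 coordinator_host=$2
  case "$role" in
    coordinator)
      cat <<EOF
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://$coordinator_host:8080
EOF
      ;;
    worker)
      cat <<EOF
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://$coordinator_host:8080
EOF
      ;;
    *)
      echo "usage: render_config coordinator|worker <coordinator_host>" >&2
      return 1
      ;;
  esac
}

# Example: preview the worker file for a coordinator at 10.0.0.5.
render_config worker 10.0.0.5
```

Redirect the output to <presto_home_dir>/etc/config.properties on each node. Both roles point discovery.uri at the coordinator, and only the coordinator runs the discovery server.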

Now check whether the worker node shows up in the cluster:

$ ./presto --server localhost:8080 --catalog hive
presto> use default;
USE
presto:default> select * from system.runtime.nodes;
       node_id       |           http_uri           | node_version | coordinator | state  
---------------------+------------------------------+--------------+-------------+--------
 i-049b73cfe3ce27289 | http://<worker_IP>:8080      | 350-e.1      | false       | active 
 i-02e604adaf2f5052c | http://<coordinator_IP>:8080 | 350-e.1      | true        | active
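
The same check can be done without reading the table by eye. The sketch below counts active workers in CSV output; count_active_workers is a made-up helper, and the sample rows reuse the node IDs above with made-up IP addresses.

```shell
# Count rows with coordinator=false and state=active in CSV read from stdin.
# Expected columns: node_id, http_uri, node_version, coordinator, state
count_active_workers() {
  tr -d '"' | awk -F, '$4 == "false" && $5 == "active" { n++ } END { print n + 0 }'
}

# Sample rows shaped like the query output above.
printf '%s\n' \
  '"i-049b73cfe3ce27289","http://10.0.0.6:8080","350-e.1","false","active"' \
  '"i-02e604adaf2f5052c","http://10.0.0.5:8080","350-e.1","true","active"' \
  | count_active_workers

# Against a live cluster (assumes the presto CLI sits in the current directory):
# ./presto --server localhost:8080 --execute 'SELECT * FROM system.runtime.nodes' --output-format CSV | count_active_workers
```

For the two sample rows the function prints 1, because only the first row is a worker.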

You can add any number of worker nodes for more parallelism and faster query execution.
Every worker is configured as shown above.

Hope this was helpful!
See you in the next chapter!
Happy Learning!
Shivani S.

Written by Shivani Singh

No responses yet