DevOps DeepDive

Learn DevOps By Doing


Chapter 2. Deploy Presto Cluster From Scratch


In this lab, we will set up a Presto cluster with one coordinator node and one worker node. You can use the same settings to attach any number of worker nodes to the coordinator.
Note: To understand the Presto architecture and how a cluster works, see Chapter 1.

Steps to perform on the coordinator node:

  1. Prerequisites:
    Download the Presto tarball from the link below:
https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.250/presto-server-0.250.tar.gz

a. The tarball contains a single directory, presto-server-0.250, which we will refer to as the installation directory.

b. This lab uses Java OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.18.04) on Ubuntu 18.04.5 LTS.

2. Create a data directory for storing logs. Keep it outside the installation directory so that Presto upgrades are easier.
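
The download and data-directory steps above can be sketched as a small script. This is a minimal sketch, not part of the original lab: the `run` helper, the `DRY_RUN` switch, and the `/opt` and `/var/presto/data` locations are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: fetch Presto 0.250 and create a data directory outside the install dir.
# PRESTO_BASE and PRESTO_DATA_DIR are illustrative locations, not from the lab.
PRESTO_VERSION="0.250"
PRESTO_URL="https://repo1.maven.org/maven2/com/facebook/presto/presto-server/${PRESTO_VERSION}/presto-server-${PRESTO_VERSION}.tar.gz"
PRESTO_BASE="${PRESTO_BASE:-/opt}"
PRESTO_HOME="${PRESTO_BASE}/presto-server-${PRESTO_VERSION}"
PRESTO_DATA_DIR="${PRESTO_DATA_DIR:-/var/presto/data}"

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run wget -P "$PRESTO_BASE" "$PRESTO_URL"
run tar -xzf "${PRESTO_BASE}/presto-server-${PRESTO_VERSION}.tar.gz" -C "$PRESTO_BASE"
run mkdir -p "$PRESTO_DATA_DIR"
```

Run it once as written to review the printed commands, then with `DRY_RUN=0` to execute them. Keeping the data directory outside `PRESTO_HOME` means a later version can be unpacked next to the old one without touching the logs.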

3. Create an “etc” directory inside the installation directory and add the following configuration files:

  • Node Properties: environmental configuration specific to each node

<presto_home_dir/etc/node.properties>

# node.environment may contain only lowercase letters, digits, and underscores
node.environment=my_first_presto_cluster
node.id=<hostname>
node.data-dir=<path to data directory where logs can be written>
catalog.config-dir=<path to the catalog directory; recommended: etc/catalog under the Presto installation>
node.server-log-file=<path/to/server_logs_dir/server.log>
node.launcher-log-file=</path/to/launcher_log/launcher.log>
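
As a rough illustration, the node properties above could be rendered per host with a heredoc. This is only a sketch: `DEMO_ROOT` and the default paths are assumptions that point at a scratch directory, and `uname -n` is used here to supply the hostname for `node.id`.

```shell
#!/bin/sh
# Sketch: render etc/node.properties for the current host.
# DEMO_ROOT and the default paths below are illustrative assumptions.
DEMO_ROOT="${DEMO_ROOT:-${TMPDIR:-/tmp}/presto-demo}"
PRESTO_HOME="${PRESTO_HOME:-$DEMO_ROOT/presto-server-0.250}"
PRESTO_DATA_DIR="${PRESTO_DATA_DIR:-$DEMO_ROOT/data}"
NODE_ID="$(uname -n)"   # hostname of this machine; must be unique per node

mkdir -p "$PRESTO_HOME/etc/catalog" "$PRESTO_DATA_DIR/var/log"

# The environment name uses underscores (lowercase letters, digits, underscores only).
cat > "$PRESTO_HOME/etc/node.properties" <<EOF
node.environment=my_first_presto_cluster
node.id=${NODE_ID}
node.data-dir=${PRESTO_DATA_DIR}
catalog.config-dir=${PRESTO_HOME}/etc/catalog
node.server-log-file=${PRESTO_DATA_DIR}/var/log/server.log
node.launcher-log-file=${PRESTO_DATA_DIR}/var/log/launcher.log
EOF

cat "$PRESTO_HOME/etc/node.properties"
```

Set `PRESTO_HOME` and `PRESTO_DATA_DIR` to your real locations before running it on a node. The same script can then be reused on every worker, since only `node.id` differs between hosts.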
  • JVM Config: command line options for the Java Virtual Machine

<presto_home_dir/etc/jvm.config>

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:+IgnoreUnrecognizedVMOptions
  • Config Properties: configuration for the Presto server

<presto_home_dir/etc/config.properties>

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://<IP of the coordinator node>:8080
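
The memory values in config.properties have to fit inside the heap set in jvm.config. The check below is a hedged sketch of that sanity test: the `check_memory` function is hypothetical, it only understands whole-gigabyte values such as `-Xmx16G` and `2GB`, and it demonstrates itself on scratch files holding the values from this lab.

```shell
#!/bin/sh
# Sketch: check that query.max-memory-per-node fits inside the JVM heap (-Xmx).
# Assumes both values are whole gigabytes, written as e.g. -Xmx16G and 2GB.
check_memory() {
  etc_dir="$1"
  heap_gb=$(sed -n 's/^-Xmx\([0-9][0-9]*\)[Gg]$/\1/p' "$etc_dir/jvm.config")
  node_gb=$(sed -n 's/^query\.max-memory-per-node=\([0-9][0-9]*\)GB$/\1/p' "$etc_dir/config.properties")
  if [ -z "$heap_gb" ] || [ -z "$node_gb" ]; then
    echo "could not read memory settings from $etc_dir"
    return 2
  fi
  if [ "$node_gb" -lt "$heap_gb" ]; then
    echo "ok: per-node query memory ${node_gb}GB < heap ${heap_gb}GB"
  else
    echo "error: per-node query memory ${node_gb}GB does not fit in heap ${heap_gb}GB"
    return 1
  fi
}

# Demo with the values used in this lab, written to a scratch directory.
demo_etc="${TMPDIR:-/tmp}/presto-memcheck/etc"
mkdir -p "$demo_etc"
printf '%s\n' '-server' '-Xmx16G' > "$demo_etc/jvm.config"
printf '%s\n' 'coordinator=true' 'query.max-memory-per-node=2GB' > "$demo_etc/config.properties"
check_memory "$demo_etc"
```

On a real node you would call `check_memory <presto_home_dir>/etc` before starting the server.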
  • Catalog Properties: configuration for Connectors (data sources)

Create a file named hive.properties inside <presto_home_dir/etc/catalog> and paste the content below:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=<put your access key>
hive.s3.aws-secret-key=<put your secret key>
hive.non-managed-table-writes-enabled=true
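
Because this file holds AWS credentials, it is worth restricting who can read it. The following is a minimal sketch of one way to do that, assuming a scratch `CATALOG_DIR` (a hypothetical variable) and the placeholder keys from above.

```shell
#!/bin/sh
# Sketch: keep the catalog file that holds AWS keys readable by its owner only.
# CATALOG_DIR defaults to a scratch path for demonstration (assumption).
CATALOG_DIR="${CATALOG_DIR:-${TMPDIR:-/tmp}/presto-demo/etc/catalog}"
mkdir -p "$CATALOG_DIR"

# Write the file under a restrictive umask so a new file is never world-readable.
(
  umask 077
  cat > "$CATALOG_DIR/hive.properties" <<'EOF'
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=<put your access key>
hive.s3.aws-secret-key=<put your secret key>
hive.non-managed-table-writes-enabled=true
EOF
)

# Also tighten the mode explicitly in case the file already existed.
chmod 600 "$CATALOG_DIR/hive.properties"
ls -l "$CATALOG_DIR/hive.properties"
```

The user that runs the Presto server must own the file, otherwise the server cannot load the catalog.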
  • Log Levels: the optional log levels file sets the minimum log level for named loggers. Create <presto_home_dir/etc/log.properties> and paste the content below:

com.facebook.presto=INFO

Configure Hive MetaStore

Presto needs a Hive Metastore to serve catalog information such as table schemas and partition locations. To launch the Hive Metastore for the first time, proceed as follows:

$ mkdir ~/hive-metastore
$ cd ~/hive-metastore
$ wget https://downloads.apache.org/hive/hive-2.3.8/apache-hive-2.3.8-bin.tar.gz
$ tar -xvzf apache-hive-2.3.8-bin.tar.gz
$ cd apache-hive-2.3.8-bin
$ export HIVE_HOME=`pwd`
$ export JAVA_HOME=<path of java installation directory>
# copy the lines below into ~/hive-metastore/apache-hive-2.3.8-bin/conf/hive-env.sh
export HIVE_AUX_JARS_PATH=${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar:${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-3.2.1.jar
export AWS_ACCESS_KEY_ID=<access key>
export AWS_SECRET_ACCESS_KEY=<secret key>
$ mkdir ~/hadoop
$ cd ~/hadoop
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
$ tar -xvf hadoop-3.2.1.tar.gz
$ cd hadoop-3.2.1
$ export HADOOP_HOME=`pwd`
# return to the Hive directory before running the remaining commands
$ cd $HIVE_HOME
$ cp conf/hive-default.xml.template conf/hive-site.xml
$ mkdir -p hcatalog/var/log/
$ bin/schematool -dbType derby -initSchema
$ hcatalog/sbin/hcat_server.sh start
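
Since HIVE_AUX_JARS_PATH depends on HADOOP_HOME, a quick check that the two jars really exist can save a failed metastore start. This is a sketch under assumptions: `check_aux_jars` is a hypothetical helper, and the default `HADOOP_HOME` simply mirrors the `~/hadoop/hadoop-3.2.1` path used above.

```shell
#!/bin/sh
# Sketch: report which entries of a colon-separated jar path are missing.
# Usage: check_aux_jars "$HIVE_AUX_JARS_PATH"
# The function body runs in a subshell so the IFS change does not leak out.
check_aux_jars() (
  IFS=:
  status=0
  for jar in $1; do
    if [ -f "$jar" ]; then
      echo "found:   $jar"
    else
      echo "missing: $jar"
      status=1
    fi
  done
  exit "$status"
)

HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop/hadoop-3.2.1}"
HIVE_AUX_JARS_PATH="${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar:${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-3.2.1.jar"

check_aux_jars "$HIVE_AUX_JARS_PATH" || echo "fix HADOOP_HOME before starting the metastore"
```

The function returns a non-zero status when any jar is missing, so it can also guard the `hcat_server.sh start` call in a wrapper script.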

Start Presto Server

$ cd <presto_home_dir>/bin
$ ./launcher start
$ cd ..
$ ./presto --server localhost:8080 --catalog hive
presto> use default;
USE
presto:default> select * from system.runtime.nodes;
               node_id                |           http_uri           | node_version | coordinator | state
--------------------------------------+------------------------------+--------------+-------------+--------
 ffffffff-ffff-ffff-ffff-ffffffffffff | http://<coordinator_IP>:8080 | 348          | true        | active
(1 row)
Query 20210411_094403_00021_54idy, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 71B] [4 rows/s, 352B/s]

Steps to perform on the worker node :

Follow the same steps as for the coordinator node. The only change for the worker node is in the <presto_home_dir/etc/config.properties> file.

Paste the content below into this file:

coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
#discovery-server.enabled=true
discovery.uri=http://<coordinator_node_IP>:8080
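
The worker configuration above differs between clusters only in the coordinator address, so it can be generated from one variable. The sketch below assumes a hypothetical `COORDINATOR_IP` of `10.0.0.10` and writes to a scratch `WORKER_ETC` directory; both are placeholders to replace with your own values.

```shell
#!/bin/sh
# Sketch: write a worker's etc/config.properties that points at the coordinator.
# COORDINATOR_IP and WORKER_ETC are placeholders you would set for your cluster.
COORDINATOR_IP="${COORDINATOR_IP:-10.0.0.10}"   # hypothetical address
WORKER_ETC="${WORKER_ETC:-${TMPDIR:-/tmp}/presto-demo-worker/etc}"
mkdir -p "$WORKER_ETC"

cat > "$WORKER_ETC/config.properties" <<EOF
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://${COORDINATOR_IP}:8080
EOF

# Show the line that ties this worker to the coordinator.
grep '^discovery.uri=' "$WORKER_ETC/config.properties"
```

For example, `COORDINATOR_IP=<coordinator_node_IP> WORKER_ETC=<presto_home_dir>/etc sh make-worker-config.sh` would write the real file, where `make-worker-config.sh` is whatever name you save the script under.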

Now check whether the worker node shows up in the cluster:

$ ./presto --server localhost:8080 --catalog hive
presto> use default;
USE
presto:default> select * from system.runtime.nodes;
       node_id       |           http_uri           | node_version | coordinator | state
---------------------+------------------------------+--------------+-------------+--------
 i-049b73cfe3ce27289 | http://<worker_IP>:8080      | 350-e.1      | false       | active
 i-02e604adaf2f5052c | http://<coordinator_IP>:8080 | 350-e.1      | true        | active
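
The same check can be scripted with the CLI's `--execute` option instead of an interactive session. This is a hedged sketch: `count_active_workers`, `PRESTO_CLI`, and `PRESTO_SERVER` are illustrative names, and the script only runs the query when the CLI file is actually present and executable.

```shell
#!/bin/sh
# Sketch: count active worker nodes using the Presto CLI's --execute option.
# PRESTO_CLI and PRESTO_SERVER are illustrative variables; adjust them to your install.
PRESTO_CLI="${PRESTO_CLI:-./presto}"
PRESTO_SERVER="${PRESTO_SERVER:-localhost:8080}"

count_active_workers() {
  # CSV output wraps each value in double quotes, so strip them.
  "$PRESTO_CLI" --server "$PRESTO_SERVER" --output-format CSV --execute \
    "SELECT count(*) FROM system.runtime.nodes WHERE coordinator = false AND state = 'active'" \
    | tr -d '"'
}

# Example: print the worker count if the CLI is available in this directory.
if [ -x "$PRESTO_CLI" ]; then
  workers=$(count_active_workers)
  echo "active workers: ${workers:-unknown}"
fi
```

A deployment script could poll this function after starting a new worker and continue once the count reaches the expected number.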

You can add any number of worker nodes for more parallelism and faster query execution.
Configure each worker the same way as shown above.

Hope this was helpful!
See you in the next chapter!
Happy Learning!
Shivani S.

Written by Shivani Singh

DevOps Engineer, Passionate for new tools and Technology!
