
Chapter 3. Querying AWS S3 using PrestoDB

This lab is all about executing queries in PrestoDB with Amazon S3 as the back end. We will be executing DDL (Data Definition Language) and DML (Data Manipulation Language) statements. For configuring the Presto cluster, follow Chapter 2.

Query Execution Flow

Scenario 1:
In this scenario, we will create a table that acts as a logical separation for the data, insert some records, define the format in which we want to store the data, and finally query the data.

Create a table:

presto-cli --server localhost:8080 --catalog hive
presto> use default
-> ;
USE
presto:default> CREATE TABLE presto_table ( name varchar, age varchar ) WITH (external_location = 's3a://<s3Bucket_name>/<folder_name>', format = 'CSV' );
CREATE TABLE
presto:default> desc presto_table;
 Column |  Type   | Extra | Comment
--------+---------+-------+---------
 name   | varchar |       |
 age    | varchar |       |
(2 rows)
Query 20210414_022835_00009_zk47q, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
0.35 [2 rows, 162B] [5 rows/s, 460B/s]
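One thing to note: with format = 'CSV', the Hive connector stores everything as plain text, so all columns of a CSV-backed table must be declared varchar. If you need numeric values, you can cast at query time; a small sketch (the age_int alias is just illustrative):

presto:default> select name, cast(age as integer) as age_int from presto_table;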

Now let's insert some data into this table using the INSERT command:

presto> use default
-> ;
USE
presto:default> insert into presto_table (name, age) values ('Jack', '25');
INSERT: 1 row
presto:default> insert into presto_table (name, age) values ('Deniel', '25');
INSERT: 1 row
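Behind the scenes, each INSERT writes a new data file under the table's external_location. If you want to verify this, you can list the S3 prefix with the AWS CLI (same bucket and folder placeholders as in the CREATE TABLE above):

aws s3 ls s3://<s3Bucket_name>/<folder_name>/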

Let us now fetch these records using the SELECT command:

presto:default> select count(*) from presto_table;
_col0
-------
2
(1 row)
Query 20210414_023336_00011_zk47q, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.41 [2 rows, 1.03KB] [4 rows/s, 2.52KB/s]
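To fetch the rows themselves rather than just the count, a plain SELECT works too, and should return the two records we inserted above:

presto:default> select * from presto_table;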

This is one way to insert data into S3: using the INSERT command in PrestoDB. However, we can also upload a CSV file directly and query it the same way. Presto supports the Delimited, CSV, RCFile, JSON, SequenceFile, ORC, Avro, and Parquet file formats.
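For example, a table backed by ORC files would be created the same way, only with a different format value; note that non-CSV formats also allow properly typed columns (the folder name here is just illustrative):

presto:default> CREATE TABLE presto_table_orc ( name varchar, age integer ) WITH (external_location = 's3a://<s3Bucket_name>/<orc_folder_name>', format = 'ORC' );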

Scenario 2:
In this scenario, we will again create a table that acts as a logical separation for querying the data, but this time we will first upload a CSV file containing some records to S3.

Upload the .csv file
You can use the command below to upload your CSV file, or you can go to the AWS console and upload your CSV file(s) directly:

aws s3 cp /path_of_csv_file/presto.csv s3://<s3Bucket_name>/<folder_name>/
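The file should contain only data rows matching the two-column schema of the table we are about to create; by default the Hive connector treats every line as data, so a header row would show up as a record. A presto.csv along these lines (sample values are illustrative):

Jack,25
Deniel,25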

Create a Table:

presto-cli --server localhost:8080 --catalog hive
presto> use default
-> ;
USE
presto:default> CREATE TABLE presto_table_csv ( name varchar, age varchar ) WITH (external_location = 's3a://<s3Bucket_name>/<folder_name>', format = 'CSV' );
CREATE TABLE
presto:default> desc presto_table_csv;
 Column |  Type   | Extra | Comment
--------+---------+-------+---------
 name   | varchar |       |
 age    | varchar |       |
(2 rows)
Query 20210414_022835_00009_zk47q, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
0.35 [2 rows, 162B] [5 rows/s, 460B/s]

Let's now run the SELECT command on this table:

presto:default> select count(*) from presto_table_csv;
_col0
-------
2
(1 row)
Query 20210414_023336_00011_zk47q, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.41 [2 rows, 1.03KB] [4 rows/s, 2.52KB/s]
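Any other SQL works against this table as well; for instance, filtering on a column (assuming the sample rows shown earlier):

presto:default> select * from presto_table_csv where name = 'Jack';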

You can run your query against one CSV file or multiple CSV files; just upload your CSVs to the path specified in the CREATE TABLE command.

Note: Your DML queries won't work if you haven't configured the flag below in your hive.properties file:

hive.non-managed-table-writes-enabled=true
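For reference, a minimal hive.properties with this flag enabled might look like the following; the metastore URI and S3 credentials come from your Chapter 2 setup, and the values here are placeholders:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=<access_key>
hive.s3.aws-secret-key=<secret_key>
hive.non-managed-table-writes-enabled=true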

For more flags that can be used in Presto, see the Hive connector documentation:
https://prestodb.io/docs/current/connector/hive.html#hive-configuration-properties

Hope this was helpful!
See you in next Chapter!
Happy Learning!
Shivani S.
