
Chapter 3. Querying AWS S3 using PrestoDB

This lab is all about executing queries in PrestoDB with Amazon S3 as the back end. We will be executing DDL (Data Definition Language) and DML (Data Manipulation Language) statements. For configuring the Presto cluster, follow Chapter 2.

Query Execution Flow

Scenario 1:
In this scenario, we will create a table that acts as a logical separation for the data, insert some records, define the format in which we want to store the data, and finally query the data.

Create a table:

presto-cli --server localhost:8080 --catalog hive
presto> use default
-> ;
USE
presto:default> CREATE TABLE presto_table ( name varchar, age varchar ) WITH (external_location = 's3a://<s3Bucket_name>/<folder_name>', format = 'CSV' );
CREATE TABLE
presto:default> desc presto_table;
 Column |  Type   | Extra | Comment
--------+---------+-------+---------
 name   | varchar |       |
 age    | varchar |       |
(2 rows)
Query 20210414_022835_00009_zk47q, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
0.35 [2 rows, 162B] [5 rows/s, 460B/s]
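One thing to note: with format = 'CSV', the Hive connector stores everything as plain text, so all columns of a CSV-backed table must be declared varchar. If you need numeric values, you can cast at query time; a small sketch (the age_int alias is just illustrative):

presto:default> select name, cast(age as integer) as age_int from presto_table;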

Now let's insert some data into this table using the INSERT command:

presto> use default
-> ;
USE
presto:default> insert into presto_table (name, age) values ('Jack', '25');
INSERT: 1 row
presto:default> insert into presto_table (name, age) values ('Deniel', '25');
INSERT: 1 row
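Behind the scenes, each INSERT writes a new data file under the table's external_location. If you want to verify this, you can list the S3 prefix with the AWS CLI (same bucket and folder placeholders as in the CREATE TABLE above):

aws s3 ls s3://<s3Bucket_name>/<folder_name>/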

Let us now fetch these records using the SELECT command:

presto:default> select count(*) from presto_table;
_col0
-------
2
(1 row)
Query 20210414_023336_00011_zk47q, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.41 [2 rows, 1.03KB] [4 rows/s, 2.52KB/s]
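To fetch the rows themselves rather than just the count, a plain SELECT works too, and should return the two records we inserted above:

presto:default> select * from presto_table;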

This is one way to insert data into S3: using the INSERT command in PrestoDB. However, we can also upload a CSV file directly and query it the same way. Presto supports the Delimited, CSV, RCFile, JSON, SequenceFile, ORC, Avro, and Parquet file formats.
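For example, a table backed by ORC files would be created the same way, only with a different format value; note that non-CSV formats also allow properly typed columns (the folder name here is just illustrative):

presto:default> CREATE TABLE presto_table_orc ( name varchar, age integer ) WITH (external_location = 's3a://<s3Bucket_name>/<orc_folder_name>', format = 'ORC' );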

Scenario 2:
In this scenario, we will again create a table that acts as a logical separation for querying the data, but this time we will first upload a CSV file containing some records to S3.

Upload the .csv file
You can use the command below to upload your CSV file, or you can go to the AWS console and upload your CSV file(s) directly:

aws s3 cp /path_of_csv_file/presto.csv s3://<s3Bucket_name>/<folder_name>/
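The file should contain only data rows matching the two-column schema of the table we are about to create; by default the Hive connector treats every line as data, so a header row would show up as a record. A presto.csv along these lines (sample values are illustrative):

Jack,25
Deniel,25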

Create a Table:

presto-cli --server localhost:8080 --catalog hive
presto> use default
-> ;
USE
presto:default> CREATE TABLE presto_table_csv ( name varchar, age varchar ) WITH (external_location = 's3a://<s3Bucket_name>/<folder_name>', format = 'CSV' );
CREATE TABLE
presto:default> desc presto_table_csv;
 Column |  Type   | Extra | Comment
--------+---------+-------+---------
 name   | varchar |       |
 age    | varchar |       |
(2 rows)
Query 20210414_022835_00009_zk47q, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
0.35 [2 rows, 162B] [5 rows/s, 460B/s]

Let's now run the SELECT command on this table:

presto:default> select count(*) from presto_table_csv;
_col0
-------
2
(1 row)
Query 20210414_023336_00011_zk47q, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.41 [2 rows, 1.03KB] [4 rows/s, 2.52KB/s]
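Any other SQL works against this table as well; for instance, filtering on a column (assuming the sample rows shown earlier):

presto:default> select * from presto_table_csv where name = 'Jack';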

You can run your query against one CSV file or multiple CSV files; just upload your CSVs to the path specified in the CREATE TABLE command.

Note: Your DML queries won't work if you haven't configured the flag below in your hive.properties file:

hive.non-managed-table-writes-enabled=true
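For reference, a minimal hive.properties with this flag enabled might look like the following; the metastore URI and S3 credentials come from your Chapter 2 setup, and the values here are placeholders:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=<access_key>
hive.s3.aws-secret-key=<secret_key>
hive.non-managed-table-writes-enabled=true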

For more flags that can be used in Presto, see the Hive connector documentation:
https://prestodb.io/docs/current/connector/hive.html#hive-configuration-properties

Hope this was helpful!
See you in next Chapter!
Happy Learning!
Shivani S.
