Chapter 6. Graceful shutdown for a Presto Worker

Hello Everyone!
If your Presto cluster runs heavy data-processing queries and its workers run on spot instances managed with spot.io, graceful shutdown will help you avoid query failures caused by abrupt spot termination.
Abrupt spot termination can destabilize the Presto cluster, with errors such as "Internal Server Error" or "Service Unavailable". We can avoid this in two simple steps:
Step 1. Install the Spotinst agent, which monitors for spot termination every 10 seconds:
#!/usr/bin/env bash
# Download the Elastigroup agent installer and run it with your Spotinst credentials
curl -fsSL https://s3.amazonaws.com/spotinst-public/services/spotinst-agent-2/elastigroup-agent-init.sh | \
SPOTINST_ACCOUNT_ID="<spotinst account id>" \
SPOTINST_TOKEN="<spotinst auth token>" \
bash
Navigate to the Compute tab, open Advanced settings, and paste the above script under "User Data".
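The script above pipes the installer straight into bash, so a failed download can go unnoticed. A slightly more defensive variant is sketched below. It is a minimal sketch, assuming you set SPOTINST_ACCOUNT_ID and SPOTINST_TOKEN as shell variables earlier in the user data; the function name install_spotinst_agent is hypothetical, and the installer URL is the same one used above.

```shell
#!/usr/bin/env bash
# Sketch: defensive variant of the Step 1 agent install.
# Assumption: SPOTINST_ACCOUNT_ID and SPOTINST_TOKEN are set before the call.

# Make a failed download fail the whole pipeline instead of feeding
# an empty script to bash (which would exit 0).
set -o pipefail

AGENT_INIT_URL="https://s3.amazonaws.com/spotinst-public/services/spotinst-agent-2/elastigroup-agent-init.sh"

# Hypothetical helper: refuses to run when credentials are missing,
# otherwise downloads the installer and runs it with the credentials.
install_spotinst_agent() {
  if [[ -z "${SPOTINST_ACCOUNT_ID:-}" || -z "${SPOTINST_TOKEN:-}" ]]; then
    echo "SPOTINST_ACCOUNT_ID and SPOTINST_TOKEN must be set" >&2
    return 1
  fi
  curl -fsSL "$AGENT_INIT_URL" |
    SPOTINST_ACCOUNT_ID="$SPOTINST_ACCOUNT_ID" \
    SPOTINST_TOKEN="$SPOTINST_TOKEN" \
    bash
}
```

In the user data you would set the two variables and then call install_spotinst_agent on its own line; a non-zero exit status then tells you the agent was not installed.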

Step 2. Write a shutdown script that is invoked when the Spotinst agent signals that the spot instance is about to be terminated.
You will usually be notified 10 minutes in advance, but sometimes the notice arrives only 5 or even 2 minutes before termination.
#!/usr/bin/env bash
# Ask the local Presto worker to shut down gracefully (enter the SHUTTING_DOWN state)
curl -v -X PUT -d '"SHUTTING_DOWN"' -H "Content-type: application/json" -H "X-Presto-User: myuser" http://localhost:8080/v1/info/state
Navigate to the Compute tab, open Advanced settings, and paste the above script under "Shutdown Script".
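The one-line script sends the request and exits immediately. If you want the shutdown script's output to show whether the worker actually drained, a fuller sketch could send the same PUT request, fail on HTTP errors, and then poll until the worker stops answering. This assumes the worker listens on localhost:8080, accepts the X-Presto-User header, and also answers GET requests on /v1/info/state; PRESTO_URL, PRESTO_USER, MAX_WAIT_SECONDS and both function names are hypothetical, not spot.io or Presto settings.

```shell
#!/usr/bin/env bash
# Sketch of a fuller Step 2 shutdown script (hypothetical knobs below).
PRESTO_URL="${PRESTO_URL:-http://localhost:8080}"
PRESTO_USER="${PRESTO_USER:-myuser}"
MAX_WAIT_SECONDS="${MAX_WAIT_SECONDS:-600}"

# Ask the worker to enter SHUTTING_DOWN; -f makes HTTP errors fail the call.
request_shutdown() {
  curl -fsS -X PUT -d '"SHUTTING_DOWN"' \
    -H "Content-Type: application/json" \
    -H "X-Presto-User: ${PRESTO_USER}" \
    "${PRESTO_URL}/v1/info/state"
}

# Poll the state endpoint every 10 seconds until the worker stops answering
# (the process has exited) or MAX_WAIT_SECONDS have elapsed.
# Assumption: GET on the same path is supported by your Presto version.
wait_for_exit() {
  local waited=0
  while curl -fsS -H "X-Presto-User: ${PRESTO_USER}" \
      "${PRESTO_URL}/v1/info/state" >/dev/null 2>&1; do
    if (( waited >= MAX_WAIT_SECONDS )); then
      echo "worker still running after ${waited}s" >&2
      return 1
    fi
    sleep 10
    waited=$(( waited + 10 ))
  done
  echo "worker stopped after ~${waited}s"
}
```

To use it as the shutdown script, end the file with request_shutdown && wait_for_exit. Keep in mind that the instance may be reclaimed before the loop finishes if the notice is short.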

Once the shutdown script is invoked, a successful request is logged with the message "Shutdown requested" at INFO level in /var/log/presto/server.log.
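To confirm the request reached the worker, you can search the log for that message. A minimal sketch, assuming the default log path above (pass another path if your installation logs elsewhere); the function name shutdown_logged is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent "Shutdown requested" line from the worker log.
# Returns non-zero if the message is absent or the file cannot be read.
shutdown_logged() {
  local log_file="${1:-/var/log/presto/server.log}"
  local last
  # -F treats the message as a fixed string rather than a regex.
  last=$(grep -F "Shutdown requested" "$log_file" 2>/dev/null | tail -n 1)
  [[ -n "$last" ]] && printf '%s\n' "$last"
}
```

For example, shutdown_logged || echo "no shutdown request logged yet" on the worker tells you whether the shutdown script's request was accepted.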
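Note: If your Presto cluster is secured, the request must satisfy whatever authentication is enabled. The script above identifies the user with the X-Presto-User header; add the credentials your setup requires (for example, basic authentication).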
What happens after a successful shutdown request:
1. The worker goes into the SHUTTING_DOWN state.
2. It waits for shutdown.grace-period, which is 2 minutes by default.
3. After the grace period, the coordinator is aware of the shutdown request and stops scheduling tasks on the worker node.
4. The worker blocks until all of its currently running tasks are complete.
5. It sleeps for the grace period again, so that the coordinator sees that all tasks are complete.
6. Finally, it shuts down the application.
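The steps above imply a rough worst case for how long a worker needs: one grace period, plus the time for a task scheduled just before the coordinator stops scheduling to finish, plus a second grace period. The sketch below compares that against the termination notice. It assumes you can estimate your longest task duration yourself (Presto does not report it here); the function name and its arguments are hypothetical, and all values are in seconds.

```shell
#!/usr/bin/env bash
# Sketch: does a graceful shutdown fit inside the spot termination notice?
# Usage: shutdown_fits_notice NOTICE_S LONGEST_TASK_S [GRACE_S, default 120]
shutdown_fits_notice() {
  local notice="$1" longest_task="$2" grace="${3:-120}"
  # Worst case: first grace period, then a task scheduled right before
  # scheduling stops runs to completion, then the second grace period.
  local needed=$(( grace + longest_task + grace ))
  echo "needed ${needed}s, notice ${notice}s"
  (( needed <= notice ))
}
```

For example, shutdown_fits_notice 600 300 prints "needed 540s, notice 600s" and succeeds. With the default 2-minute grace period, even an idle worker needs 240 seconds, more than a 2-minute notice allows. If that matters for your workload, shutdown.grace-period can be lowered; this sketch assumes it is set in the worker's etc/config.properties like other server properties, so check the documentation for your Presto version.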
Links to previous chapters in the Presto series:
Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5
Reference articles
https://trino.io/docs/current/admin/graceful-shutdown.html
https://docs.spot.io/elastigroup/features/compute/shutdown-scripts?id=shutdown-scripts
Hope this was helpful!
See you in the next chapter!
Happy Learning!
Shivani S.