We have been using Amazon Elasticsearch for the last 6 months and it’s been an overall pleasant and predictable experience. Before opting for it we had our own doubts reading the caveats of Amazon’s cloud offering here and here. I must say, that some of the points noted in the posts are pertinent; but like all things ‘it depends’ and so we wanted to put across some things that we like about AWS Elasticsearch cloud offering and why it worked for us and probably why you should also consider.

1. Integration with AWS ecosystem

This is a big plus for us as we use Amazon Kinesis Data Firehose and it supports Amazon Elasticsearch as a destination. We were able to setup a test domain in minutes and were able to benchmark our production workload very easily. Even when it comes to authentication AWS IAM works great for us and we were able to use it as an authentication layer for our workers. You can read more about our real-time stream processing pipeline here which primarily uses Amazon Kinesis Data Firehose.

Kinesis Firehose - Elasticsearch

2. Cluster Management (scale out/up)

There were at least two separate instances in a span of the first 8 weeks where we had to revisit the cluster setup and space provisioning. In both the instances, with a few clicks, we were able to scale up and scale out with a few clicks without any downtime with close to few GBs of data which was getting shipped to Elasticsearch close to peak load.

Cluster Deploy - Elasticsearch

3. Monitoring & Alerts

Amazon Elasticsearch by default logs a plethora of important metrics to AWS Cloudwatch. This was interesting because when we started working with Elasticsearch we didn’t completely know what we were getting into. Overall, it’s a very powerful system which could fit many different use cases (timeseries aggregation in our case) and like any powerful system, it takes time to master. Thankfully, Amazon’s Elasticsearch offering makes available quite a few critical metrics and as you get deeper into the ecosystem - you can tweak your cluster to your workloads much better.

Looking at Cloudwatch and Elasticsearch metrics we were able to fine-tune: # Cluster size # Impact of number of shards & replication factor # Impact of queries

Read Latency - Elasticsearch Disk Queue - Elasticsearch

This proved to be really useful and we could track the performance metrics closely pre and post changes. Cloudwatch and it’s tight integration with Amazon SNS makes the setup of critical alerts on SMS and email really easy and something you should not miss.

4. Snapshot Recovery / Backup to S3

Although we didn’t have to use the automatic snapshots from S3 it would give anyone additional peace of mind knowing that Amazon takes daily automated snapshots in a pre-configured S3 bucket. Considering our indexes are daily, and at times we need to be able to do benchmarking on our own data - we also run daily manual snapshots through AWS Lambda for backups to S3. This allows us the ability to recover from our own snapshots if needed.

5. Elasticsearch Upgrades

Lastly, Amazon has been extremely proactive with regards to keeping pace with Elasticseach releases. At the time of this post, v6.2 is the most recent version for Elasticsearch and that’s the same version available with Amazon. Please note that version upgrades are not available from the interface and need to be handled by manually taking a snapshot and restoring it to a new domain. I hope Amazon is working on making it a seamless experience as well. Unlike quite a few cloud offerings it’s commendable to see that Amazon is making a conscious effort to offer the latest and the greatest.

Here is a glimpse of stats from our production cluster:

Device Stats
AWS Elasticsearch Version 5.2
Number of nodes 6
Number of data nodes 4
Active primary shards 104
Active shards 208
Provisioned Size 2 TB
Searchable Documents 300-500 MN

Today, we are fairly advanced with regards to our understanding of the internals of Elasticsearch - data nodes, master nodes, indexing, shards, replication strategy and query performance. In hindsight, it would be fair to say that we have been extremely happy with our choice given our limited understanding of running a production scale Elasticsearch cluster 6 months back. With Amazon Elasticsearch we were able to start small, move fast, learn on the go, and fine-tune our cluster for production workloads without having to understand every single aspect of managing and scaling a production scale Elasticsearch cluster.

While I wholeheartedly recommend Amazon’s Elasticsearch offering - it goes without saying that you should be aware of its limitations, anomalies, and caveats. Liz Bennet’s post (although not up to date) and Amazon’s Elasticsearch documentation is a good start with regards to evaluating your options. Like any cloud offering it has its limitations and trade-offs.