Friday, 16 May 2014

Messing around with Elasticsearch


Elasticsearch (ES) is an extremely powerful tool that lets us deploy a full-text search engine in no time. Unfortunately, this power doesn't come with decent documentation. Having spent an insane amount of time googling the concepts & issues of Elasticsearch, I've tried to compile all the stuff that helped me deal with ES.

Tools


Before diving into Elasticsearch I'd suggest you use the following tools:
  • Postman: Postman allows you to easily work with APIs. It's a handy tool for working with ES. It allows you to send requests without even visiting your terminal, eliminating the need to use cURL as most tutorials would have you do.

  • JSON Lint: It's a website that validates the syntax of JSON objects. Since the queries we write in ES are essentially JSON objects, it's prudent to validate their syntax before sending them.

  • Play: This is one of the best tools for working with ES. While this JSFiddle-inspired utility doesn't allow you to make API calls, it provides a sandbox for messing around with ES queries. You can feed in your data & mappings along with the search queries and view the results, all in the same window.
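
Since ES queries are plain JSON, the kind of syntax check JSON Lint performs can also be sketched locally. This is only a minimal illustration: `check_query` is a hypothetical helper (not part of any ES client), and the `title` field is made up for the example.

```python
import json

def check_query(text):
    """Return (parsed_query, None) if text is valid JSON, else (None, error message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # lineno/colno point at the position where parsing failed
        return None, "line {}, column {}: {}".format(err.lineno, err.colno, err.msg)

# Hypothetical queries against a made-up "title" field
good = '{"query": {"match": {"title": "elasticsearch"}}}'
bad = '{"query": {"match": {"title": "elasticsearch"}}'  # missing closing brace

print(check_query(good))
print(check_query(bad))
```

Note that this only catches malformed JSON. A query can be valid JSON and still be rejected by ES, which is where a sandbox like Play helps.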

Plugins


There are several plugins for Elasticsearch that can make your life a lot easier.
  • Kibana: Kibana is a powerful tool for visualizing and analyzing your data. It provides a simple yet powerful text search along with a decent UI.

  • Marvel Sense: Sense was a Chrome extension for making Elasticsearch queries, which was later merged into the Marvel plugin for Elasticsearch. A wise move, since you get the ease of executing queries from Sense along with the goodness of Marvel, a plugin for monitoring and analyzing the health of your cluster.
      Edit: Sense is available again as a Chrome extension. You can get it here.
  • Head: Head is the quintessential plugin for Elasticsearch. It provides a frontend for your Elasticsearch cluster, showing the status of your entire cluster, down to each shard. You can also browse the entire index and execute queries. And of course you can create, flush, and delete indexes with it as well.
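
The operations Head exposes through its UI correspond to plain REST calls. Here is a rough sketch that only builds those requests without sending them. It assumes a node at http://localhost:9200 and a hypothetical index named `blog`; `es_request` is an illustrative helper, not part of Head or any ES client.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port

def es_request(method, path):
    """Build (but do not send) an HTTP request against the cluster."""
    return urllib.request.Request(BASE_URL + path, method=method)

# "blog" is a hypothetical index name used only for illustration
requests_to_send = [
    es_request("GET", "/_cluster/health"),  # cluster status
    es_request("PUT", "/blog"),             # create the index
    es_request("POST", "/blog/_flush"),     # flush the index
    es_request("DELETE", "/blog"),          # delete the index
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Passing any of these to `urllib.request.urlopen` would actually send it, so be careful with the `DELETE` on a cluster you care about.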


Getting started and beyond


First of all, get familiar with the terminology from this blog. Next, An Elasticsearch Primer & Querying ElasticSearch are good places to start with ES and get a hold of the basic concepts. And don't forget to go through Elasticsearch's README from the GitHub repo.

Once you move on to searching & querying more complex structures, take a look at Fun With Elasticsearch's Children and Nested Documents.
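
To give a feel for what such a query looks like, here is a small sketch of a nested query built as a Python dict. The document shape is an assumption made up for illustration: a blog post with a `comments` field mapped with the `nested` type, where each comment has `author` and `text`.

```python
import json

# Hypothetical document shape: a post with a list of comment objects
# stored in a "comments" field that is mapped as type "nested".
nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"comments.author": "alice"}},
                        {"match": {"comments.text": "search"}},
                    ]
                }
            },
        }
    }
}

# The JSON printed here is what you would paste into Postman, Sense or Play
print(json.dumps(nested_query, indent=2))
```

The point of the `nested` wrapper is that both conditions must hold within the same comment, rather than anywhere across all comments of a post.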

Now, assuming that against all odds you are still not daunted by the complexity of Elasticsearch queries, you are likely to find yourself in some serious mess in the future, probably pulling your hair out in frustration (like I did). In order to avoid such misery altogether, refer to Troubleshooting Elasticsearch searches, for Beginners. This is an article by the 'Found Foundation', the same organization behind Play (the tool mentioned above). In fact, they have a whole Foundation series, full of great guides for beginners. I personally haven't been through many of them, but I certainly plan to.

Now a word of caution: ES can be daunting for some, but try not to learn it in bits & pieces. ES might take a lot of time, but it's fruitful only if you dig deep. Anyhow, it's common to forget some stuff about ES. So for a quick reference, take a look at this presentation. It's kind of a cheatsheet for Elasticsearch.

Assuming that all has worked well for you, there still remains the task of optimization. For that, the Elasticsearch Indexing Performance Cheatsheet should come in handy.

Hope all these links help you in your battle with the insane amount of data we produce & use every day.

May the source be with you!!