Search is a common requirement when building communication applications. As customers send and receive messages using our products, they generate large amounts of information that they also want to retrieve with text queries. Take Helpwise as an example: it needs to find the right email among millions. This post discusses how we find the right message for a search query in Helpwise and how we make our search engine scale to very large numbers of messages.
#Search Query
Let’s look at how a basic search works. Suppose we have 100 emails to search; in general, we need to check those documents one by one and see which ones match our query. In SQL, this can be written as:
SELECT * FROM emails WHERE body LIKE '%<query here>%'
This query has O(n) complexity: because the pattern starts with a wildcard, the underlying database typically cannot use an ordinary index on the body column and has to go through every record to match the search query. With millions of emails, the time taken grows in proportion, which is a serious problem because the number of messages keeps growing.
We can optimize this by restricting the search to the records of a particular customer, using a secondary index on something like the account ID. The query then becomes:
SELECT * FROM emails WHERE body LIKE '%<query here>%' AND account_id=<account id>
Still, a single account can hold any number of emails, and the database has to scan all of them for the LIKE match, so this alone does not give us performant search.
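To make the cost concrete, here is a minimal Python sketch of what the two queries above amount to. The `Email` class, the in-memory lists, and the function names are illustrative assumptions, not Helpwise code, and the match is a plain case-sensitive substring test rather than a faithful model of SQL `LIKE`.

```python
from dataclasses import dataclass


@dataclass
class Email:
    id: int
    account_id: int
    body: str


def search_all(emails, query):
    """Full scan: every email body is inspected, so the cost is O(n)."""
    return [e.id for e in emails if query in e.body]


def search_account(emails_by_account, account_id, query):
    """Narrow to one account first (the role of the secondary index),
    then scan every email in that account."""
    account_emails = emails_by_account.get(account_id, [])
    return [e.id for e in account_emails if query in e.body]


# Hypothetical sample data.
emails = [
    Email(1, 10, "quick brown"),
    Email(2, 20, "kangaroo jumps"),
    Email(3, 10, "quick brown fox jumps"),
]

# Group emails by account, standing in for an index on account_id.
emails_by_account = {}
for e in emails:
    emails_by_account.setdefault(e.account_id, []).append(e)

print(search_all(emails, "jumps"))
print(search_account(emails_by_account, 10, "jumps"))
```

The account filter shrinks the set that gets scanned, but the scan inside that set is still linear in the number of emails the account holds.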
#Elasticsearch to the rescue
Elasticsearch addresses the limitations of the basic search query and speeds up search. According to Elastic:
Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
#How it works?
Elasticsearch stores data in a data structure called an inverted index, which allows faster text retrieval.
#Inverted Index
Let’s say there are 3 messages with the following text:
DocumentId | Text |
---|---|
1 | quick brown |
2 | kangaroo jumps |
3 | quick brown fox jumps |
An inverted index splits each text into individual words (terms) and uses every term as an index key that maps to the IDs of the documents containing it.
Text | DocumentIds |
---|---|
quick | 1, 3 |
brown | 1, 3 |
fox | 3 |
kangaroo | 2 |
jumps | 2, 3 |
Now, if we need to find messages with the keyword “jumps”, Elasticsearch looks up that key in the inverted index and gets the matching documents (2 and 3) directly, without scanning every message.
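The idea can be sketched in a few lines of Python. This is a toy model under simple assumptions (lowercasing and whitespace splitting only); the text analysis Elasticsearch performs is far more involved, and the function names here are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, term):
    """Look up one term; the cost does not depend on scanning all documents."""
    return sorted(index.get(term.lower(), set()))


# The three messages from the example above.
documents = {
    1: "quick brown",
    2: "kangaroo jumps",
    3: "quick brown fox jumps",
}

index = build_inverted_index(documents)
print(search(index, "jumps"))  # [2, 3]
print(search(index, "quick"))  # [1, 3]
```

The work of reading every message happens once, when the index is built; each search afterwards is a dictionary lookup on the term.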
#Distributed?
The inverted index above is a very simple example. Now imagine the same with millions of emails and text messages: the inverted index can grow almost without limit. Keeping it on a single machine limits how far we can scale computation and storage. So Elasticsearch splits the data into roughly equal-sized shards and distributes them across multiple nodes, which makes horizontal scaling possible.
In short, Elasticsearch abstracts away the complexity of storing and searching these inverted indices.
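As a rough illustration of sharding, the sketch below routes each document to a shard by hashing its ID, keeps a separate inverted index per shard, and answers a query by asking every shard and merging the results. The `Shard` and `Cluster` classes, the shard count, and the CRC32-modulo routing are assumptions made for this example; Elasticsearch's real routing, replication, and node management differ in detail.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical shard count for this example


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from a stable hash of the document ID."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


class Shard:
    """One shard holding its own small inverted index."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def search(self, term):
        return self.index.get(term.lower(), set())


class Cluster:
    """Routes writes to one shard and fans searches out to all shards."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        self.shards[shard_for(doc_id, len(self.shards))].add(doc_id, text)

    def search(self, term):
        hits = set()
        for shard in self.shards:
            hits |= shard.search(term)
        return sorted(hits)


cluster = Cluster()
for doc_id, text in {
    1: "quick brown",
    2: "kangaroo jumps",
    3: "quick brown fox jumps",
}.items():
    cluster.add(doc_id, text)

print(cluster.search("jumps"))
```

Because each shard only indexes its own documents, shards can live on different machines and be searched in parallel, which is what makes adding nodes useful.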
#Reasons to choose Elasticsearch
- Simple RESTful APIs
- Uses JSON to represent documents
- Scalable
- Good industry support
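To give a feel for the REST and JSON points, here is a small sketch that builds (but does not send) a search request equivalent to the earlier SQL query, using only the Python standard library. The index name `emails`, the field names `body` and `account_id`, the account ID `42`, and the `localhost:9200` address are assumptions for illustration; they are not taken from the Helpwise setup.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, query_text, account_id):
    """Build a request for the _search endpoint with a JSON query body."""
    body = {
        "query": {
            "bool": {
                # Full-text match on the message body.
                "must": [{"match": {"body": query_text}}],
                # Exact filter restricting results to one account.
                "filter": [{"term": {"account_id": account_id}}],
            }
        }
    }
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values; nothing is sent over the network here.
request = build_search_request("http://localhost:9200", "emails", "jumps", 42)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` against a running cluster would execute the search; that step is left out so the example runs without one.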