Search is a common requirement when building communication applications. As customers send and receive messages using our products, they generate a lot of information that they also want to access through text queries. Take Helpwise as an example: it needs to find the right email among millions of emails. This post discusses how we find the right message for a search query in Helpwise and how we make our search engine scale to a very large number of messages.

#Search Query

Let’s look at how a basic search works. Suppose we have 100 emails to search; in the naive approach, we check the documents one by one and see which of them match our query. In SQL, this can be written as:

SELECT * FROM emails WHERE body LIKE '%<query here>%'

If you look carefully, this query has a complexity of O(n), because the underlying database has to go through each and every record to match the search query. A regular index on the body column typically does not help here, since the pattern starts with a wildcard. This means the time taken grows with the number of emails, which is pretty bad when the number of messages keeps growing without a practical upper bound.
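To make the O(n) cost concrete, here is a minimal Python sketch of the same one-by-one check. The `linear_search` function and the sample emails are hypothetical and only illustrate the idea; the matching is lower-cased here as an assumption, since the case sensitivity of `LIKE` depends on the database.

```python
from typing import Dict, List


def linear_search(emails: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Return every email whose body contains the query, checking all emails."""
    matches = []
    needle = query.lower()
    for email in emails:  # one substring check per stored email, so O(n) overall
        if needle in email["body"].lower():
            matches.append(email)
    return matches


# Hypothetical sample data, standing in for the emails table.
emails = [
    {"id": "1", "body": "Your invoice is attached"},
    {"id": "2", "body": "Meeting moved to Friday"},
    {"id": "3", "body": "Second invoice reminder"},
]
invoice_hits = linear_search(emails, "invoice")
```

Every call visits all stored emails, so doubling the number of emails roughly doubles the work per search.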

This can be optimized further by restricting the search to the records of a particular customer, using a secondary index on something like the account id. The query then becomes:

SELECT * FROM emails WHERE body LIKE '%<query here>%' AND account_id=<account id>

Still, we don’t know how many emails a single account can hold, so this approach does not let us offer consistently performant searches.
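The account-scoped query can be tried out with a small sketch using Python's built-in `sqlite3` module. The table layout, the index name `idx_emails_account_id`, and the sample rows are assumptions made for illustration, not the actual Helpwise schema. The sketch also passes the query as a bound parameter instead of pasting it into the SQL string.

```python
import sqlite3

# In-memory database with a hypothetical emails table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails (id INTEGER PRIMARY KEY, account_id INTEGER, body TEXT)"
)
# Secondary index so the database can narrow the scan to one account first.
conn.execute("CREATE INDEX idx_emails_account_id ON emails (account_id)")
conn.executemany(
    "INSERT INTO emails (account_id, body) VALUES (?, ?)",
    [
        (1, "Your invoice is attached"),
        (1, "Meeting moved to Friday"),
        (2, "Invoice for account two"),
    ],
)


def search_account(account_id: int, query: str) -> list:
    """Search one account's emails; values are bound, not concatenated."""
    rows = conn.execute(
        "SELECT id, body FROM emails "
        "WHERE account_id = ? AND body LIKE ? ORDER BY id",
        (account_id, f"%{query}%"),
    )
    return rows.fetchall()


account_one_hits = search_account(1, "invoice")
```

The index only reduces the number of candidate rows; within one account, the `LIKE` check still runs against every email body.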

#ElasticSearch to the rescue

Elasticsearch addresses the limitations of the basic search query and helps return search results faster. According to Elastic:

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.

#How does it work?

It stores data in a data structure called an inverted index that allows faster text retrieval.

#Inverted Index

Let’s say there are 3 messages with the following text:

DocumentId	Text
1	quick brown
2	kangaroo jumps
3	quick brown fox jumps

Here, the inverted index breaks the text into individual words and uses each word as an index key.

Text	DocumentIds
quick	1, 3
brown	1, 3
fox	3
kangaroo	2
jumps	2, 3

Now, if we need to find messages with the keyword “jumps”, Elasticsearch can look the keyword up directly in the inverted index and return documents 2 and 3 without scanning every message, so the lookup takes roughly constant time.
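The idea can be sketched in a few lines of Python using the three messages from the example. This is a simplified illustration with whitespace tokenization only; the real analysis Elasticsearch performs on text is more involved, and `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List

# The three example messages, keyed by document id.
documents = {
    1: "quick brown",
    2: "kangaroo jumps",
    3: "quick brown fox jumps",
}


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # set() so a word repeated inside one document is recorded once
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)


inverted_index = build_inverted_index(documents)

# A keyword search is now a single dictionary lookup.
jumps_hits = inverted_index.get("jumps", [])
```

The cost of building the index is paid once when messages are stored, which is what makes each later lookup cheap.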

#Distributed?

The inverted index example above was a very simple one. Now, imagine the same with millions of emails and text messages: the inverted index can grow extremely large. Keeping it on a single storage device limits how far it can scale in terms of both computation and storage. So, Elasticsearch breaks the data into roughly equal-sized shards and distributes them among multiple nodes, which makes horizontal scaling possible.
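As a rough illustration of sharding, the sketch below routes each document to a shard by hashing its id, builds one small inverted index per shard, and answers a query by asking every shard and merging the results. It is a toy model under stated assumptions: the shard count, the CRC32 hash, and the function names are all hypothetical, and Elasticsearch's own routing and query execution are more sophisticated.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Dict, List

NUM_SHARDS = 2  # hypothetical shard count for this sketch


def shard_for(doc_id: int) -> int:
    """Pick a shard by hashing the document id (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % NUM_SHARDS


def build_shards(docs: Dict[int, str]) -> List[DefaultDict[str, List[int]]]:
    """Build one inverted index per shard; each document lives on one shard."""
    shards: List[DefaultDict[str, List[int]]] = [
        defaultdict(list) for _ in range(NUM_SHARDS)
    ]
    for doc_id in sorted(docs):
        index = shards[shard_for(doc_id)]
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return shards


def search(shards: List[DefaultDict[str, List[int]]], term: str) -> List[int]:
    """Ask every shard for the term and merge the partial results."""
    hits: List[int] = []
    for index in shards:
        hits.extend(index.get(term, []))
    return sorted(hits)


shards = build_shards({
    1: "quick brown",
    2: "kangaroo jumps",
    3: "quick brown fox jumps",
})
sharded_hits = search(shards, "jumps")
```

Because each shard holds only part of the data, shards can sit on different nodes and be searched in parallel; the loop in `search` stands in for that fan-out.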

In short, Elasticsearch hides the complexity of storing and searching these inverted indices.

#Reasons to choose ElasticSearch

Simple RESTful APIs
Uses JSON to represent documents
Scalable
Good Industry support
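To give a feel for the REST and JSON points, here is a small sketch that builds (but does not send) two HTTP requests with Python's standard library: one to index a document and one to run a `match` query. The base URL `http://localhost:9200`, the index name `emails`, and the field name `body` are assumptions chosen for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed address of a local node
INDEX = "emails"  # hypothetical index name


def index_request(doc_id: int, body_text: str) -> urllib.request.Request:
    """Build a PUT request that stores one document as JSON."""
    payload = json.dumps({"body": body_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{INDEX}/_doc/{doc_id}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(query: str) -> urllib.request.Request:
    """Build a POST request that runs a full-text match query."""
    payload = json.dumps({"query": {"match": {"body": query}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{INDEX}/_search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


put_req = index_request(3, "quick brown fox jumps")
post_req = search_request("jumps")
```

Passing either request to `urllib.request.urlopen` would send it, which requires a reachable Elasticsearch node; in practice an official client library would usually be a more convenient choice.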

Demystifying Search

Written by

Vibhor Agrawal@vibhor1997a
