Deliver powerful search with Redis

Introduction

Redis is a great choice when you need a highly available in-memory data structure store that can be used as a database, cache, and message broker. Combine it with some of its modules, such as RediSearch and RedisJSON, and you get a full-text search engine.

In this post, we’ll build a simple yet effective and powerful search engine: we start by loading data into Redis, then perform text searches using a rich query language, and finally aggregate or group the query results to suit our specific needs.

The data we use can vary. For demo purposes we’ll use the Bigfoot sightings data set, which contains the sighting data publicly available on the BFRO website in a more digestible form. We’ll use the geocoded version [1] since we want our search engine to be able to perform geospatial search as well. You can download data from here.

For the sake of clarity, we’ll focus mainly on the logic rather than on the implementation. However, feel free to explore the GitHub repository dedicated to this purpose, where you can dive deeper into the proof of concept (POC). Once we’re familiar with the logic, all Redis commands can be translated to and used with a specific Redis client (e.g., the Redis NodeJS client[2]).

Setting up

There are multiple ways to get a core Redis server running along with its modules. The modules (or add-ons) RediSearch and RedisJSON enrich the core Redis data structures with search capabilities and support for a modern data model such as JSON.

You can manually configure a Redis server and include only specific Redis modules, but for demo purposes we’ll use a pre-built Docker image, redismod[3], that bundles a couple of different Redis modules together, including RediSearch and RedisJSON.

To get started with Redis and its modules, simply run the Docker image.

Image 1 – running the redismod Docker image

Once the container is up and running, you can verify whether all bundled modules are loaded. From this point on we’ll have to communicate with the Redis server, either through the Redis CLI[4] or through one of the available clients. The following sections rely on the Redis CLI.

Image 2 – verifying the state of Redis modules

Loading and retrieving data

So far we haven’t mentioned that Redis is a key-value[5] store, supporting only strings as keys and a couple of different data structures as values[6]. Since each Bigfoot sighting carries several pieces of information, the best approach is to use a JSON structure as the value.

Each key can be a combination of the “sighting” prefix and the sighting’s identification number (e.g., sighting:2512).

To insert a specific key-value pair we can use the JSON.SET command, which binds a JSON value to a specific key. The following example contains several JSON fields, although the data could be more complex and nested.

Image 3 – setting a JSON value

Note that we’re specifying the JSON value path (symbol $[7]), which represents the root of the document under which the data will be stored.
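Since the screenshot is not reproduced here, the shape of the call can be sketched in Python. This is a minimal sketch, not the article's original command: the record below is a hypothetical placeholder, and the field names `id`, `title`, `state`, `location`, and `timestamp` are assumptions chosen to match the fields discussed in this post. The helper only builds the argument list; it could then be handed to a client's generic command method (for example `execute_command(*args)` in redis-py).

```python
import json

def sighting_key(sighting_id):
    """Combine the "sighting" prefix with the sighting id, e.g. sighting:2512."""
    return f"sighting:{sighting_id}"

def json_set_args(sighting_id, sighting):
    """Arguments of a JSON.SET call that stores the document at the root path "$"."""
    return ["JSON.SET", sighting_key(sighting_id), "$", json.dumps(sighting)]

# Hypothetical record: the values are illustrative, not taken from the data set.
example = {
    "id": 2512,
    "title": "Placeholder title mentioning a river",
    "state": "Alabama",
    "location": "-86.8,33.5",  # assumed "longitude,latitude" string
    "timestamp": 946684800,    # Unix epoch seconds
}

args = json_set_args(2512, example)
# Print the command roughly as it would be typed in the Redis CLI.
print(" ".join(args[:3]), "'" + args[3] + "'")
```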

Once the entire data collection is loaded, we can list the stored keys using the KEYS command.

Image 4 – getting the list of Redis keys

Similarly, to retrieve the data of a specific sighting, we can use the JSON.GET command, which returns the JSON value stored under a specific key.

Image 5 – getting the JSON value for a specific key
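The retrieval step can be sketched the same way. The helper names are hypothetical; the sketch assumes the `$` path syntax, where the reply to JSON.GET is a JSON-encoded array of matches rather than the bare document, so the first element is unwrapped.

```python
import json

def json_get_args(key, path="$"):
    """Arguments of a JSON.GET call for one key and one path."""
    return ["JSON.GET", key, path]

def decode_json_get(reply):
    """Decode a JSON.GET reply obtained with a "$" path.

    The reply is a JSON string holding an array of matches; a missing key
    yields no reply at all (None).
    """
    if reply is None:
        return None
    matches = json.loads(reply)
    return matches[0] if matches else None

print(" ".join(json_get_args("sighting:2512")))
# Simulated reply string, only to show the unwrapping step.
print(decode_json_get('[{"state": "Alabama"}]'))
```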

Indexing data

Now the real fun begins. We’re able to define multiple indexes[8] for our data, complex or not, and use them to traverse the data and quickly filter it by specific criteria.

Our index could consist of multiple fields, where we could have, for example:

  • a full-text field (e.g., title) – used to perform fuzzy, exact, or stemming-like search
  • a tag field (e.g., state) – major difference from full-text field is that Redis doesn’t perform stemming on this type of field
  • a geospatial field (e.g., location) – used to quickly filter data based on location, or more specifically on longitude and latitude information

We can build such an index using the FT.CREATE command.

Image 6 – creating a Redis index
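An index over the three field types listed above could be declared roughly as follows. This is a sketch under assumptions: the index name `sightings:index` and the JSON paths are hypothetical, and only the `title`, `state`, and `location` fields mentioned in the list are included.

```python
def ft_create_args(index, prefix, fields):
    """Arguments of an FT.CREATE call over JSON documents.

    fields is a list of (json_path, alias, field_type) triples.
    """
    args = ["FT.CREATE", index, "ON", "JSON", "PREFIX", "1", prefix, "SCHEMA"]
    for path, alias, field_type in fields:
        args += [path, "AS", alias, field_type]
    return args

SCHEMA = [
    ("$.title", "title", "TEXT"),      # full-text field, stemming applies
    ("$.state", "state", "TAG"),       # tag field, no stemming
    ("$.location", "location", "GEO"), # "longitude,latitude" string
]

create_args = ft_create_args("sightings:index", "sighting:", SCHEMA)
print(" ".join(create_args))
```

The `PREFIX 1 sighting:` part limits the index to keys that start with the `sighting:` prefix used when loading the data.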

Querying data

With the index created, we’re able to perform search operations. We can use the FT.SEARCH command and specify the query we want to search for.

Image 7 – searching for a specific query

Let’s break the previous command into smaller pieces. First, we’re querying sightings from the state of Alabama. Stemming is not applied since the field is marked as a tag; only basic case-insensitive matching is applied. Those sightings need to have both “river” and “bear” among their full-text fields (we haven’t specified which ones). Finally, not all results are returned, only the first 10, and for each result we return only the title and location.

Note that the RETURN clause, like most other clauses used in the following sections, requires an integer right after it that gives the number of arguments that follow.
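The query described above can be reconstructed approximately like this. The index name `sightings:index` is a hypothetical placeholder and the helper is only a sketch; note how the count after RETURN is derived from the number of field names that follow it.

```python
def ft_search_args(index, query, return_fields=(), offset=0, count=10):
    """Arguments of an FT.SEARCH call with optional RETURN and a LIMIT window."""
    args = ["FT.SEARCH", index, query]
    if return_fields:
        # RETURN is followed by the number of fields, then the fields themselves.
        args += ["RETURN", str(len(return_fields)), *return_fields]
    args += ["LIMIT", str(offset), str(count)]
    return args

# Tag filter on state, plus two terms that must both appear in full-text fields.
query = "@state:{Alabama} river bear"
search_args = ft_search_args("sightings:index", query, ["title", "location"])
print(search_args)
```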

For a more detailed introduction to the query syntax, you can visit this resource, where you can learn more about prefix matching, fuzzy matching, unions, and more.

We can even take location into account when querying, asking only for sightings within a given radius of a specific location. The radius is expressed in either imperial (miles – mi) or metric units (kilometers – km).

Image 8 – performing a geospatial search
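A geospatial filter is just another part of the query string, so it can be sketched as a small formatting helper. The coordinates below are illustrative placeholders and the `location` field name is an assumption; besides `mi` and `km`, the query syntax also accepts `m` and `ft`.

```python
GEO_UNITS = ("m", "km", "mi", "ft")

def geo_filter(field, lon, lat, radius, unit="km"):
    """Query fragment matching documents within radius of (lon, lat)."""
    if unit not in GEO_UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    # Longitude comes first, then latitude, then the radius and its unit.
    return f"@{field}:[{lon} {lat} {radius} {unit}]"

fragment = geo_filter("location", -86.8, 33.5, 50, "mi")
print(fragment)
```

The fragment can be used on its own as the FT.SEARCH query or combined with other criteria, for example by appending it after a term such as `river`.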

The only question left is whether we should summarize or highlight our data. Instead of a long and uninformative title, we may prefer to see only the part of the field (in this case the title) that matched the requested criteria. Summarization breaks the resulting text field into smaller snippets, each of which contains the found term(s)[9] and some surrounding context.

Aside from summarization, we might also want result highlighting, which wraps the found term(s) (and their stemming variants) in a user-defined tag. This is a neat option if we want to display such data directly on a website and put bold or italic tags around each found term.

Image 9 – summarizing and highlighting the resulting data

In the search above, we’re filtering sightings that contain the term “river” (or any of its variants) and returning a limited, summarized, and highlighted title as the result. By default, highlighting applies bold HTML tags, but here we’re specifying that the surrounding tags should be italic.
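A search along these lines could be assembled as sketched below. The index name is hypothetical, and the summarization settings are left at their defaults because the original screenshot's exact options are not known; the default tags in the helper mirror the bold default mentioned above.

```python
def summarize_highlight_args(index, query, field,
                             open_tag="<b>", close_tag="</b>", count=10):
    """FT.SEARCH arguments that summarize and highlight a single text field."""
    return [
        "FT.SEARCH", index, query,
        "RETURN", "1", field,
        "SUMMARIZE", "FIELDS", "1", field,
        "HIGHLIGHT", "FIELDS", "1", field, "TAGS", open_tag, close_tag,
        "LIMIT", "0", str(count),
    ]

# Italic tags instead of the default bold ones.
highlight_args = summarize_highlight_args(
    "sightings:index", "river", "title", open_tag="<i>", close_tag="</i>"
)
print(highlight_args)
```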

Aggregating data

It’s a rather common request to have data aggregated in some way before the results are collected. Luckily for us, Redis offers us a rich set of aggregation tools. We can apply numeric, string, date, and geo aggregation functions[10].

Let’s assume that we want to group the queried sightings by year. Currently, each of our sightings has a date field stored as Unix epoch time. Luckily, there’s an aggregation function that extracts the year from a timestamp.

We can perform data aggregation by calling the FT.AGGREGATE command, which allows us to APPLY specific functions. Note that JSON data fields must first be loaded to make them available to the aggregation functions.

Image 10 – simple date aggregation

As before, we’re only interested in sightings containing the term “river” among their full-text fields. Once the date is loaded, we apply the date aggregation function year, which extracts the year from the given timestamp. The results are then grouped by year, and within each group we count the distinct sighting ids. Finally, a basic sort is applied.
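The pipeline described above can be sketched in two parts: a helper that builds the FT.AGGREGATE arguments, and a pure-Python function that mirrors the same load, apply, group, count, and sort steps on a small list. Everything named here is an assumption: the index name, the `$.timestamp` and `$.id` paths, the `sightings` alias, and the three placeholder records. The local mirror interprets timestamps as UTC and is only meant to show the logic, not how Redis computes it.

```python
from collections import defaultdict
from datetime import datetime, timezone

def ft_aggregate_by_year_args(index, query,
                              timestamp_path="$.timestamp", id_path="$.id"):
    """FT.AGGREGATE arguments that count distinct sighting ids per year."""
    return [
        "FT.AGGREGATE", index, query,
        # LOAD counts every token that follows, including AS and the aliases.
        "LOAD", "6", timestamp_path, "AS", "timestamp", id_path, "AS", "id",
        "APPLY", "year(@timestamp)", "AS", "year",
        "GROUPBY", "1", "@year",
        "REDUCE", "COUNT_DISTINCT", "1", "@id", "AS", "sightings",
        "SORTBY", "2", "@year", "ASC",
    ]

def count_by_year(sightings):
    """Local mirror of the pipeline: year -> number of distinct ids, sorted by year."""
    ids_per_year = defaultdict(set)
    for sighting in sightings:
        moment = datetime.fromtimestamp(sighting["timestamp"], tz=timezone.utc)
        ids_per_year[moment.year].add(sighting["id"])
    return sorted((year, len(ids)) for year, ids in ids_per_year.items())

# Placeholder records: 2000-01-01, 2001-01-01 and 2000-01-02 (UTC).
records = [
    {"id": 1, "timestamp": 946684800},
    {"id": 2, "timestamp": 978307200},
    {"id": 3, "timestamp": 946771200},
]
print(" ".join(ft_aggregate_by_year_args("sightings:index", "river")))
print(count_by_year(records))
```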

Conclusion

In short, we’ve explained the core concepts of how you can deliver powerful search with Redis and its modules (RedisJSON and RediSearch). Along the way we’ve covered some of the commands for loading and retrieving data from Redis, as well as the commands used for searching and aggregating data.

As mentioned earlier, you can always explore the GitHub repository, which takes the topic a step further. There, you’ll be able to see how we can use a Redis client (specifically for NodeJS) to perform search and aggregation on the Bigfoot sightings.

Useful resources

The resources listed below may prove interesting and insightful for further reading.

[1] A geocoded version provides geographical coordinates corresponding to a specific location.

[2] Official and community-driven Redis clients can be found here.

[3] Redismod contains multiple Redis modules, making it a convenient way to quickly set up Redis.

[4] Redis CLI is the Redis command line interface, a simple program that lets you send commands to Redis and read the replies sent by the server, directly from the terminal.

[5] A key-value store, or key-value database, is a simple database that uses an associative array as its fundamental data model.

[6] When it comes to values, Redis supports various data types, which are listed here.

[7] The root JSON value path can also be represented with the symbol “.”.

[8] An index is a data structure that improves the speed of data retrieval operations on a database at the cost of additional storage usage.

[9] A term is simply a word or set of words used in the search query.

[10] Aggregation functions apply certain data transformations. Read more about them and about the list of available functions here.

Aleksandar Miladinović

Mid Software Developer @Serengeti

He graduated from the Faculty of Science in Kragujevac, in the department of Informatics and Mathematics. With 2+ years of experience as a Java-oriented software developer, he’s highly interested in and motivated to learn new concepts and technologies. Occasionally, he likes expanding his knowledge stack by creating interesting and demonstrative projects. Aside from work, he enjoys spending time with family and friends, whether it be a long walk or a table tennis session. As long-term hobbies, he enjoys travel and photography.
