Deliver powerful search with Redis
Introduction
Redis is a great choice when you need a highly available in-memory data structure store that can be used as a database, cache, and message broker. Combine it with modules such as RediSearch and RedisJSON, and you get a full-text search engine.
In this post, we’ll build a simple yet effective search engine: we’ll load data into Redis, perform text searches using a rich query language, and finally aggregate or group the query results to fit our needs.
The data we use can vary. For demo purposes we’ll use the Bigfoot sightings data, which contains the sighting data publicly available on the BFRO website in a more digestible form. We’ll use the geocoded version [1], since we want our search engine to support geospatial search as well. You can download data from here.
For the sake of clarity, we’ll focus on the logic rather than on the implementation. However, feel free to explore the GitHub repository dedicated to this purpose, where you can dive deeper into the POC. Once we’re familiar with the logic, every Redis command can be translated to a specific Redis client (e.g., the Redis NodeJS client[2]).
Setting up
There are multiple ways to get a Redis server running along with its modules. The modules (or add-ons) RediSearch and RedisJSON enrich the core Redis data structures with search capabilities and with a modern data model, JSON.
You can manually configure a Redis server and include only specific modules, but for demo purposes we’ll use a pre-built Docker image, redismod[3], which bundles several Redis modules, including RediSearch and RedisJSON.
To get started with Redis and its modules, simply run the Docker image.
Image 1 – running redismod docker image
Once the container is up and running, you can verify that all bundled modules are loaded. From this point on we’ll communicate with the Redis server, either through the Redis CLI[4] or through one of the available clients. The following sections rely on the Redis CLI.
Image 2 – verifying state of Redis modules
Loading and retrieving data
We haven’t mentioned yet that Redis is a key-value[5] store: keys are strings, and values can be one of several data structures[6]. Since each Bigfoot sighting carries several pieces of information, the best approach is to store it as a JSON value.
Each key can be a combination of the “sighting” prefix and the sighting’s identification number (e.g., sighting:2512).
To insert key-value data, we can use the JSON.SET command, which binds a JSON value to a specific key. The following example contains several JSON fields, although the data could be more complex and nested.
Image 3 – setting up a JSON data
Note that we’re specifying the JSON value path (symbol $[7]), which represents the root path under which the data is stored.
Once the entire data collection is loaded, we can list the stored keys using the KEYS command.
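As a rough sketch, a JSON.SET command for one sighting could be assembled like this in Python. The sample values are made up for illustration, and storing the location as a "longitude,latitude" string is an assumption; the resulting tokens are what you would type into the Redis CLI or hand to a client.

```python
import json
import shlex

# Hypothetical sighting: the values are invented for illustration and are
# not taken from the BFRO data.
sighting = {
    "id": "2512",
    "title": "Hikers report large tracks along a river bank",
    "state": "Alabama",
    "date": 946684800,  # Unix epoch seconds
    "location": "-86.9023,32.3182",  # assumed "longitude,latitude" format
}


def json_set_command(doc, path="$"):
    """Build the JSON.SET tokens that store one sighting at the root path."""
    key = f"sighting:{doc['id']}"
    return ["JSON.SET", key, path, json.dumps(doc)]


command = json_set_command(sighting)
# Print the command in a shell-like, copy-friendly form.
print(shlex.join(command))
```

The third token is the path: "$" means the whole document is written at the root of the key.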
Image 4 – getting list of Redis keys
Similarly, to retrieve the data of a specific sighting, we can use the JSON.GET command, which returns the JSON value stored under a given key.
Image 5 – getting JSON value for the specific key
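The two retrieval commands can be sketched the same way. The helper names below are hypothetical; KEYS takes a glob-style pattern, and JSON.GET optionally takes one or more paths when only part of the document is needed.

```python
import shlex


def keys_command(prefix="sighting:"):
    """List every key that starts with the given prefix."""
    return ["KEYS", f"{prefix}*"]


def json_get_command(sighting_id, *paths):
    """Fetch a whole sighting, or only the given JSONPath expressions."""
    return ["JSON.GET", f"sighting:{sighting_id}", *paths]


for cmd in (
    keys_command(),
    json_get_command(2512),             # the whole document
    json_get_command(2512, "$.title"),  # only the title field
):
    print(shlex.join(cmd))
```

KEYS walks the entire keyspace, so it suits a demo like this one; on a large production database the incremental SCAN command is usually the safer choice.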
Indexing data
Now the real fun begins. We can define multiple indexes[8] for our data, complex or not, and use them to traverse through the data and quickly filter it by specific criteria.
Our index can consist of multiple fields, for example:
- a full-text field (e.g., title) – used for exact, fuzzy, or stemmed matching
- a tag field (e.g., state) – the major difference from a full-text field is that Redis doesn’t perform stemming on this type of field
- a geospatial field (e.g., location) – used to quickly filter data based on location, or more specifically on longitude and latitude information
We can build such an index using the FT.CREATE command.
Image 6 – creating Redis index
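A minimal sketch of such an index definition, assuming the three fields listed above and a hypothetical index name idx:sightings. Each schema entry maps a JSONPath to an alias and a field type, and the PREFIX clause limits the index to keys that start with "sighting:".

```python
import shlex

INDEX_NAME = "idx:sightings"  # hypothetical index name

# (JSONPath, alias used in queries, field type)
SCHEMA = [
    ("$.title", "title", "TEXT"),
    ("$.state", "state", "TAG"),
    ("$.location", "location", "GEO"),
]


def ft_create_command(index, prefix, schema):
    """Build FT.CREATE tokens for an index over JSON documents."""
    tokens = ["FT.CREATE", index, "ON", "JSON", "PREFIX", "1", prefix, "SCHEMA"]
    for path, alias, field_type in schema:
        tokens += [path, "AS", alias, field_type]
    return tokens


command = ft_create_command(INDEX_NAME, "sighting:", SCHEMA)
print(shlex.join(command))
```

The "1" after PREFIX is an argument count, a pattern that shows up in most RediSearch clauses.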
Querying data
With the index created, we can perform search operations. We use the FT.SEARCH command and specify the query we want to run.
Image 7 – searching for a specific query
Let’s break the previous command into smaller pieces. First, we’re querying sightings from the state of Alabama. Stemming is not applied since the field is marked as a tag; the value is matched as a whole, and by default the match is case-insensitive. Those sightings need to contain both “river” and “bear” in their full-text fields (we haven’t specified which ones). Finally, not all results are returned, only the first 10, and for each result we return only the title and location.
Note that the RETURN clause, like most other clauses in the following sections, requires an integer right after it that gives the number of arguments that follow.
For a more detailed introduction to the query syntax, you can visit this resource, where you can learn more about prefix matching, fuzzy matching, unions, and more.
We can even address location when querying, asking only for sightings within a given radius of a specific location. The radius is expressed in either imperial (e.g., miles – mi) or metric units (e.g., kilometers – km).
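The search described above could look roughly like the following sketch. The index name idx:sightings is hypothetical; the query string combines a tag filter on state with two full-text terms, and the counts after RETURN and LIMIT are derived from the arguments.

```python
import shlex

INDEX_NAME = "idx:sightings"  # hypothetical index name


def ft_search_command(index, query, return_fields, offset=0, count=10):
    """Build FT.SEARCH tokens with a RETURN list and a result window."""
    return [
        "FT.SEARCH", index, query,
        # RETURN is followed by the number of field names that follow it.
        "RETURN", str(len(return_fields)), *return_fields,
        # LIMIT takes an offset and a maximum number of results.
        "LIMIT", str(offset), str(count),
    ]


# Tag filter on state, plus "river" AND "bear" in any full-text field.
query = "@state:{Alabama} river bear"
command = ft_search_command(INDEX_NAME, query, ["title", "location"])
print(shlex.join(command))
```

Terms separated by a space are intersected, so a sighting must satisfy the tag filter and contain both words to match.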
Image 8 – performing geospatial search
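A geospatial filter follows the pattern @field:[longitude latitude radius unit]. The sketch below builds one for the location field; the centre point, the radius, and the index name are all hypothetical.

```python
import shlex

INDEX_NAME = "idx:sightings"  # hypothetical index name
GEO_UNITS = ("m", "km", "mi", "ft")  # radius units accepted by the query syntax


def geo_filter(field, lon, lat, radius, unit="km"):
    """Build a geo filter; note that longitude comes before latitude."""
    if unit not in GEO_UNITS:
        raise ValueError(f"unsupported radius unit: {unit}")
    return f"@{field}:[{lon} {lat} {radius} {unit}]"


# Hypothetical centre point: sightings within 50 miles of it.
query = geo_filter("location", -86.9, 32.3, 50, "mi")
command = ["FT.SEARCH", INDEX_NAME, query, "RETURN", "2", "title", "location"]
print(shlex.join(command))
```

A geo filter can be combined with other query parts, for example by placing a term such as river in front of it.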
The only question left is whether we should summarize or highlight our data. Instead of a long, uninformative title, we may prefer to see only the part of the field (in this case title) that matched the requested criteria. Summarization fragments the resulting text field into smaller snippets, each containing the found term(s)[9] and some surrounding context.
Aside from summarization, we might also want result highlighting, which wraps the found term(s) (and their stemmed variants) in a user-defined tag. This is a neat option if we want to show the data directly on a website and surround each found term with bold or italic tags.
Image 9 – summarizing and highlighting resulted data
In the search above, we’re filtering sightings that contain the term “river” (or any of its variants) and returning a limited, summarized, and highlighted title as a result. By default, highlighting wraps terms in bold HTML tags, but here we specify italic tags instead.
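A sketch of a search with both options enabled, under the assumption of a hypothetical idx:sightings index. The fragment count and fragment length are illustrative values, not the ones from the original command.

```python
import shlex

INDEX_NAME = "idx:sightings"  # hypothetical index name


def ft_search_highlight_command(index, query, field, open_tag="<i>",
                                close_tag="</i>", frags=1, length=10, count=10):
    """Build FT.SEARCH tokens that summarize and highlight a single field."""
    return [
        "FT.SEARCH", index, query,
        "RETURN", "1", field,
        # SUMMARIZE: number of snippets and words per snippet (illustrative).
        "SUMMARIZE", "FIELDS", "1", field, "FRAGS", str(frags), "LEN", str(length),
        # HIGHLIGHT: wrap matched terms in the given open and close tags.
        "HIGHLIGHT", "FIELDS", "1", field, "TAGS", open_tag, close_tag,
        "LIMIT", "0", str(count),
    ]


command = ft_search_highlight_command(INDEX_NAME, "river", "title")
print(shlex.join(command))
```

Leaving out the TAGS clause falls back to the default bold tags.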
Aggregating data
It’s rather common to need data aggregated in some way before the results are collected. Luckily, Redis offers a rich set of aggregation tools. We can apply numeric, string, date, and geo aggregation functions[10].
Let’s assume that we want to group queried sightings by year. Currently, each of our sightings has a date field stored as Unix epoch time. Luckily, there’s an aggregation function that extracts the year from a timestamp.
We can perform data aggregation with the FT.AGGREGATE command, which lets us APPLY specific functions. Note that JSON data fields must first be loaded to make them available to the aggregation functions.
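To make the idea concrete, here is a small local model of what grouping by year amounts to, written in plain Python rather than Redis. The (id, timestamp) pairs are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical (id, Unix epoch seconds) pairs, invented for illustration.
sightings = [
    ("101", 946684800),  # 2000-01-01 UTC
    ("102", 978307200),  # 2001-01-01 UTC
    ("103", 980985600),  # 2001-02-01 UTC
    ("103", 980985600),  # same id again; counted once per year
]


def year_of(timestamp):
    """Extract the calendar year (UTC) from a Unix epoch timestamp."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).year


def count_by_year(rows):
    """Group rows by year and count the distinct ids in each group."""
    ids_per_year = defaultdict(set)
    for sighting_id, timestamp in rows:
        ids_per_year[year_of(timestamp)].add(sighting_id)
    return {year: len(ids) for year, ids in sorted(ids_per_year.items())}


print(count_by_year(sightings))
```

Redis performs the same steps on the server side, so only the grouped counts travel back to the client.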
Image 10 – simple date aggregation
As before, we’re only interested in sightings containing the term “river” in their full-text fields. Once the date is loaded, we apply the date function year, which extracts the year from the given timestamp. The results are then grouped by year, and for each group we count the distinct id values. Finally, a basic sort is applied.
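The pipeline described above might be assembled as in the sketch below. Several details are assumptions: the index name idx:sightings, loading the fields by JSONPath, the alias names year and count, and the ascending sort order.

```python
import shlex

INDEX_NAME = "idx:sightings"  # hypothetical index name


def ft_aggregate_by_year_command(index, query):
    """Build FT.AGGREGATE tokens: load, apply year(), group, count, sort."""
    return [
        "FT.AGGREGATE", index, query,
        # Load two JSON fields; the count covers every token that follows.
        "LOAD", "6", "$.date", "AS", "date", "$.id", "AS", "id",
        # Derive the year from the Unix timestamp.
        "APPLY", "year(@date)", "AS", "year",
        # Group by year and count the distinct ids in each group.
        "GROUPBY", "1", "@year",
        "REDUCE", "COUNT_DISTINCT", "1", "@id", "AS", "count",
        # Sort direction is an assumption.
        "SORTBY", "2", "@year", "ASC",
    ]


command = ft_aggregate_by_year_command(INDEX_NAME, "river")
print(shlex.join(command))
```

The order of the clauses matters here: each step works on the output of the previous one, so year has to be produced by APPLY before GROUPBY can use it.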
Conclusion
In short, we’ve explained the core concepts of how you can deliver powerful search with Redis and its modules (RedisJSON and RediSearch). Along the way we’ve covered some of the commands for loading and retrieving data from Redis, as well as the commands used for searching and aggregating data.
As mentioned earlier, you can always explore the GitHub repository to take your Redis learning further. There, you’ll see how to use a Redis client (specifically for NodeJS) to perform search and aggregation on the Bigfoot sightings.
Useful resources
The resources listed below are both entertaining and insightful for further learning.
- https://try.redis.io – online Redis database demonstration
- https://redis.io/documentation – official Redis documentation
- https://redis.com/redis-enterprise/redis-insight – official GUI client for Redis, making it easier to navigate through your in-memory data
- https://developer.redis.com/develop – simplified tutorials to get you started for different Redis clients
- https://www.youtube.com/c/Redisinc – official Redis YouTube channel to keep you up to date with the latest features
[1] A geocoded version provides geographical coordinates corresponding to a specific location.
[2] Official and community-driven Redis clients can be found here.
[3] Redismod contains multiple Redis modules making it a convenient way to quickly set up Redis.
[4] Redis CLI is the Redis command line interface, a simple program that allows you to send commands to Redis and read the replies sent by the server, directly from the terminal.
[5] A key-value store, or key-value database, is a simple database that uses an associative array as the fundamental data model.
[6] When it comes to values, Redis supports various data types which are listed here.
[7] The root JSON value path can also be represented with the symbol “.”.
[8] An index is a data structure that improves the speed of data retrieval operations on a database at the cost of additional storage usage.
[9] A term is simply a word or set of words used in the search query.
[10] Aggregation functions apply a certain data transformation. Read more about them and about the list of available functions here.