Fast autocomplete using Python and Redis
General information
Problem
Imagine a situation when you need to provide autocomplete for some list of data. If you are using SQL, you will probably do something like this:

select * from table_name where table_field like 'search%'
Source code can be found at:
https://github.com/JFF-Bohdan/console_fast_autocomplete
Disclaimer
This repository contains dictionaries found on some FTPs and in other repositories. They can be found in the data folder.

Solution
Base information
You can use the Redis NoSQL database to perform fast autocomplete. First you need to load all the data into Redis, and then provide an interface for performing search queries.

Loading data
First of all, we will use the ZADD command to load items into a sorted set. We will give each item the same score of 0.0, so the members will be sorted lexicographically within the set.

To implement case-insensitive search we will use a little trick. Before adding each item to the set we convert it to lowercase and append the original item after a ':' separator. So when we want to add the word Wood to the set, we actually add wood:Wood. This way we can perform a case-insensitive search and still keep the original word.

We will use pipelines (part of the redis Python library) to increase loading speed. They buffer the ZADD commands and reduce the number of back-and-forth TCP packets between the client and the server, which dramatically increases loading performance.
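As a rough illustration of this loading step (a minimal sketch, not the repository's actual code: the key name autocomplete, the connection parameters and the batch size are assumptions, and a redis-py 3.x-style zadd is assumed):

import redis

# Assumed connection parameters and key name (see the configuration section below).
r = redis.Redis(host="localhost", port=6379, db=0)
KEY = "autocomplete"

words = ["Wood", "Woodpecker", "Alexa", "alphabet"]

# The pipeline buffers ZADD commands and sends them in batches,
# reducing the number of TCP round trips between client and server.
pipe = r.pipeline(transaction=False)
for word in words:
    # Store "lowercase:Original" with a constant score of 0.0, so members are
    # ordered lexicographically and can be searched case-insensitively.
    pipe.zadd(KEY, {"{}:{}".format(word.lower(), word): 0.0})
    if len(pipe) >= 1000:  # flush every 1000 buffered commands
        pipe.execute()
pipe.execute()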
Performing search

We will use the ZRANGEBYLEX command to load all strings that start with the query string. That is the actual implementation of autocomplete, and it is very fast: searching for the string alexa among 520679 items takes just 30-35 ms on my laptop.
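A minimal sketch of the search side, under the same assumptions (redis-py client, the hypothetical autocomplete key): ZRANGEBYLEX fetches every member in the lexicographic range that starts with the lowercased query, and we strip the prefix part to get the original words back.

import redis

r = redis.Redis(host="localhost", port=6379, db=0)
KEY = "autocomplete"

def autocomplete(query, limit=10):
    prefix = query.lower().encode("utf-8")
    # "[" makes the bound inclusive; appending 0xff gives an upper bound
    # that covers every member starting with the prefix.
    lo = b"[" + prefix
    hi = b"[" + prefix + b"\xff"
    members = r.zrangebylex(KEY, lo, hi, start=0, num=limit)
    # Drop the lowercase copy and the ':' separator to recover the original words.
    return [m.decode("utf-8").split(":", 1)[1] for m in members]

print(autocomplete("alexa"))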
Installation

Just clone this repository using git and install the dependencies using:

pip install -r requirements.txt

Usage
Configuration file
You can find an example configuration file at:

.\conf\default.conf

You need to specify the connection to Redis, the folder with data, and the temporary folder:

[main]
temp_path=./tmp

[redis]
host=localhost
port=6379
password=
db=0

[data]
main_source=./data/

Where:

redis - connection to Redis;
data::main_source - folder with data;
main::temp_path - folder for unpacking .zip files.
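For reference, here is a small sketch of how such a configuration could be read and turned into a Redis connection (using configparser; the section and option names follow the example above, everything else is an assumption):

import configparser
import redis

config = configparser.ConfigParser()
config.read("./conf/default.conf")

# Build the Redis connection from the [redis] section.
r = redis.Redis(
    host=config.get("redis", "host"),
    port=config.getint("redis", "port"),
    password=config.get("redis", "password") or None,
    db=config.getint("redis", "db"),
)

data_folder = config.get("data", "main_source")  # folder with data files
temp_folder = config.get("main", "temp_path")    # folder for unpacking .zip files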
Initializing database
You can load all the information from the data files using:

string_search.py --config .\conf\default.conf --init-data

The program will look for all files in the data folder. If a .zip file is found, it will be uncompressed into the temporary folder for further loading.

This operation can take a long time, up to a minute on slow machines.
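Conceptually, this step boils down to something like the sketch below (a simplified illustration, not the actual string_search.py code; the helper name and the folder handling details are assumptions). Each file it yields would then be fed, line by line, into the pipeline-based loading shown earlier.

import os
import zipfile

def collect_data_files(data_folder, temp_folder):
    """Yield paths of plain data files, unpacking any .zip archives first."""
    for name in os.listdir(data_folder):
        path = os.path.join(data_folder, name)
        if name.lower().endswith(".zip"):
            # Unpack the archive into the temporary folder and yield its contents.
            with zipfile.ZipFile(path) as archive:
                archive.extractall(temp_folder)
                for member in archive.namelist():
                    if not member.endswith("/"):  # skip directory entries
                        yield os.path.join(temp_folder, member)
        else:
            yield path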
Performing search
To perform a search for the query alexa you should use:

string_search.py --config .\conf\default.conf --search alexa

Checking items count
You can check the count of items in the database using:

string_search.py --config .\conf\default.conf --get-length

Enjoy!