Bulk Loading

Writing Data in Bulk Using the Redict Protocol

Bulk loading is the process of loading Redict with a large amount of pre-existing data. This document describes how to bulk-load data into Redict quickly and efficiently.

Bulk loading using redict-cli

Using a normal Redict client to perform bulk loading is not a good idea, for a few reasons. The naive approach of sending one command after another is slow because every command pays for a full round trip. Pipelining helps, but to bulk-load many records efficiently, the client must keep writing new commands while it is still reading the replies to earlier ones.

Not all clients support non-blocking I/O, and not all clients can parse replies efficiently enough to maximize throughput. For these reasons, the preferred way to mass-import data into Redict is to generate a text file containing the Redict commands that insert the required data, and send it using redict-cli.
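To see why round trips dominate, a back-of-the-envelope estimate helps: if each command waits for its reply before the next one is sent, the total time is at least the number of commands multiplied by the round-trip time (RTT), however fast the server executes each command. The 0.25 ms RTT below is an assumed figure for illustration, not a measurement, and `naive_lower_bound_seconds` is a hypothetical helper.

```python
def naive_lower_bound_seconds(num_commands: int, rtt_ms: float) -> float:
    """Minimum wall-clock time when each command waits for its own reply.

    Ignores server execution time and bandwidth, so the real figure
    can only be higher.
    """
    return num_commands * rtt_ms / 1000.0


# Assumed RTT of 0.25 ms (hypothetical, e.g. a fast local network).
estimate = naive_lower_bound_seconds(1_000_000, 0.25)
print(f"at least {estimate:.0f} s spent waiting on round trips")
```

Pipelining amortizes this wait across many in-flight commands, which is why the remaining challenge is keeping the connection busy in both directions at once.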

For example, to import many key/value pairs where each key KeyN holds the value ValueN, the file would look like this:

SET Key0 Value0
SET Key1 Value1
...
SET KeyN ValueN
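A file like this is normally produced by a script rather than written by hand. The following is a minimal Python sketch; `encode_command` and `write_bulk_file` are hypothetical names. Instead of the plain inline form shown above, it writes each command in the length-prefixed multi-bulk encoding of the Redict protocol (RESP), which the server also accepts and which stays correct when keys or values contain spaces, newlines, or other binary data.

```python
def encode_command(*args: bytes) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        # Each argument is a bulk string: $<length>\r\n<bytes>\r\n
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)


def write_bulk_file(path: str, count: int) -> None:
    """Write SET KeyN ValueN for every N in [0, count) to `path`."""
    with open(path, "wb") as f:
        for n in range(count):
            f.write(encode_command(b"SET", b"Key%d" % n, b"Value%d" % n))
```

After calling `write_bulk_file("data.txt", 1_000_000)`, the resulting file can be piped to `redict-cli --pipe` exactly like the plain-text version.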

The redict-cli pipe mode is designed to perform bulk loading. It expects to receive the commands via STDIN:

cat data.txt | redict-cli --pipe

This produces output similar to the following:

All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 1000000

redict-cli also makes sure that only the errors received from the Redict instance are redirected to standard output.
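If a loading script needs to check the outcome programmatically, one option is to parse the final summary line. This is a minimal sketch assuming the `errors: N, replies: M` format shown above and that the caller has already captured the tool's output as a string; `parse_pipe_summary` is a hypothetical helper, not part of redict-cli.

```python
import re

# Matches the summary line, e.g. "errors: 0, replies: 1000000".
_SUMMARY = re.compile(r"errors:\s*(\d+),\s*replies:\s*(\d+)")


def parse_pipe_summary(output: str) -> tuple[int, int]:
    """Return (errors, replies) from captured redict-cli --pipe output.

    Uses the last matching line; raises ValueError if there is none.
    """
    matches = _SUMMARY.findall(output)
    if not matches:
        raise ValueError("no 'errors: N, replies: M' line found")
    errors, replies = matches[-1]
    return int(errors), int(replies)
```

A loader could then treat any nonzero error count as a failed import.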

How the pipe mode works under the hood

The pipe mode of redict-cli is designed to be as fast as possible while still being able to tell when the server has sent the last reply.

This is achieved as follows:

  • redict-cli --pipe sends data to the server as fast as possible.
  • At the same time, it reads replies whenever data is available and parses them.
  • Once there is no more data to read from STDIN, it sends a special ECHO command with a random 20-byte string. Because this is the last command sent and the server replies in order, receiving the same 20 bytes back as a bulk reply confirms that the replies to all earlier commands have been received. When the matching reply arrives, it exits successfully.
  • While parsing the replies, it counts them, so that at the end it can report the number of commands transferred to the server.
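The steps above can be sketched in Python. This is a simplified model, not redict-cli's actual implementation: it handles only RESP2 reply types, assumes the random 20-byte string is made of hex characters, and replaces the socket with a `feed()` method that accepts reply bytes in arbitrary chunks, mimicking data being read whenever it is available. The names `ReplyCounter`, `parse_reply`, `echo_command`, and `make_sentinel` are hypothetical.

```python
import secrets


class Incomplete(Exception):
    """Raised when the buffer ends before a full reply has arrived."""


def make_sentinel() -> bytes:
    # Assumption: 20 random hex characters stand in for the random 20-byte string.
    return secrets.token_hex(10).encode()


def echo_command(sentinel: bytes) -> bytes:
    # ECHO <sentinel>, encoded as a RESP array of two bulk strings.
    return b"*2\r\n$4\r\nECHO\r\n$%d\r\n%s\r\n" % (len(sentinel), sentinel)


def parse_reply(buf: bytes, pos: int):
    """Parse one RESP2 reply at buf[pos]; return (type byte, value, next pos)."""
    end = buf.find(b"\r\n", pos)
    if end == -1:
        raise Incomplete
    kind, line, nxt = buf[pos:pos + 1], buf[pos + 1:end], end + 2
    if kind in (b"+", b"-", b":"):  # simple string, error, integer
        return kind, line, nxt
    if kind == b"$":  # bulk string, or null bulk string when length is -1
        length = int(line)
        if length < 0:
            return kind, None, nxt
        if len(buf) < nxt + length + 2:
            raise Incomplete
        return kind, buf[nxt:nxt + length], nxt + length + 2
    if kind == b"*":  # array: parse each element in turn
        items = []
        for _ in range(max(int(line), 0)):
            _, item, nxt = parse_reply(buf, nxt)
            items.append(item)
        return kind, items, nxt
    raise ValueError(f"unexpected reply type {kind!r}")


class ReplyCounter:
    """Counts replies fed in arbitrary chunks until the ECHO sentinel arrives."""

    def __init__(self, sentinel: bytes) -> None:
        self.sentinel = sentinel
        self.buf = b""
        self.replies = 0
        self.errors = 0
        self.done = False

    def feed(self, chunk: bytes) -> None:
        self.buf += chunk
        pos = 0
        while not self.done:
            try:
                kind, value, nxt = parse_reply(self.buf, pos)
            except Incomplete:
                break  # wait for more data
            pos = nxt
            if kind == b"$" and value == self.sentinel:
                self.done = True  # reply to the final ECHO: nothing left to wait for
            else:
                self.replies += 1  # this sketch does not count the sentinel itself
                if kind == b"-":
                    self.errors += 1
        self.buf = self.buf[pos:]  # keep any partial reply for the next call
```

Because the sentinel is random, an earlier reply is very unlikely to match it by accident, and because ECHO is the last command on the connection, its reply is necessarily the last one to arrive.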


Portions of this website courtesy of Salvatore Sanfilippo, CC-BY-SA-4.0.