FAQ

How is Redict different from other key-value stores?

  • Redict takes a different evolutionary path among key-value databases: values can contain more complex data types, with atomic operations defined on those data types. Redict data types are closely related to fundamental data structures and are exposed to the programmer as such, without additional abstraction layers.
  • Redict is an in-memory database that is also persisted on disk, so it represents a different trade-off: very high read and write speed, with the limitation that datasets can’t be larger than memory. Another advantage of in-memory databases is that the memory representation of complex data structures is much simpler to manipulate than the same data structures on disk, so Redict can do a lot with little internal complexity. At the same time, the two on-disk storage formats (RDB and AOF) don’t need to support random access, so they are compact and always generated in an append-only fashion (even the AOF log rotation is an append-only operation, since the new version is generated from the copy of the data in memory). However, this design also involves different challenges compared to traditional on-disk stores: because the main data representation lives in memory, Redict operations must be handled carefully to make sure there is always an updated version of the dataset on disk.
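To make the append-only idea concrete, here is a toy sketch, assuming a log of write commands encoded as RESP arrays of bulk strings. It is not Redict's persistence code: the function names (encode_command, replay, rewrite) are hypothetical, and only SET and DEL are modeled. It shows the two properties described above: the dataset can be rebuilt by replaying the log from the start, and a compacted log can be generated purely from the in-memory copy, again by appending.

```python
# Toy append-only command log. It only mimics the idea behind AOF and is
# not Redict's actual implementation.

def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)


def decode_commands(log: bytes) -> list[list[str]]:
    """Parse concatenated RESP arrays back into a list of commands."""
    commands, pos = [], 0
    while pos < len(log):
        end = log.index(b"\r\n", pos)
        count = int(log[pos + 1:end])  # skip the leading '*'
        pos = end + 2
        args = []
        for _ in range(count):
            end = log.index(b"\r\n", pos)
            length = int(log[pos + 1:end])  # skip the leading '$'
            pos = end + 2
            args.append(log[pos:pos + length].decode())
            pos += length + 2  # skip the payload and its trailing CRLF
        commands.append(args)
    return commands


def replay(log: bytes) -> dict[str, str]:
    """Rebuild the in-memory dataset by re-executing every logged write."""
    data: dict[str, str] = {}
    for cmd, *args in decode_commands(log):
        if cmd == "SET":
            data[args[0]] = args[1]
        elif cmd == "DEL":
            for key in args:
                data.pop(key, None)
    return data


def rewrite(data: dict[str, str]) -> bytes:
    """Produce a compact replacement log from the in-memory copy alone."""
    return b"".join(encode_command("SET", k, v) for k, v in data.items())
```

Usage: appending `encode_command("SET", "user:1", "alice")`, `encode_command("SET", "user:2", "bob")` and `encode_command("DEL", "user:1")` to a log and calling `replay` yields `{"user:2": "bob"}`. Calling `rewrite` on that result gives a single SET command, which is shorter than the original three-command log but replays to the same state.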

What’s the Redict memory footprint?

To give you a few examples (all obtained using 64-bit instances):

  • An empty instance uses ~3 MB of memory.
  • 1 million small key -> string value pairs use ~85 MB of memory.
  • 1 million key -> hash value pairs, each hash representing an object with 5 fields, use ~160 MB of memory.

Testing your own use case is simple: use the redict-benchmark utility to generate random datasets, then check the space used with the INFO memory command.
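Once you have captured the INFO memory output before and after loading your test data, a small helper can turn the two captures into a per-key figure. This is a minimal sketch under two assumptions: INFO output consists of "# Section" header lines followed by "field:value" lines, and you compare the used_memory field. The helper names are hypothetical, and the sample captures use made-up numbers, not real measurements.

```python
def parse_info(text: str) -> dict[str, str]:
    """Turn INFO output ("# Section" headers, then "field:value" lines) into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def bytes_per_key(before: str, after: str, keys_added: int) -> float:
    """Average extra memory per key between two INFO memory captures."""
    used_before = int(parse_info(before)["used_memory"])
    used_after = int(parse_info(after)["used_memory"])
    return (used_after - used_before) / keys_added


# Illustrative captures with made-up numbers, not real measurements.
BEFORE = "# Memory\r\nused_memory:3000000\r\nused_memory_human:2.86M\r\n"
AFTER = "# Memory\r\nused_memory:88000000\r\nused_memory_human:83.92M\r\n"
print(bytes_per_key(BEFORE, AFTER, 1_000_000))  # 85.0
```

Comparing two captures, rather than reading a single one, subtracts the baseline footprint of an empty instance from the result.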

64-bit systems use considerably more memory than 32-bit systems to store the same keys, especially if the keys and values are small. This is because pointers take 8 bytes on 64-bit systems, compared with 4 bytes on 32-bit systems. The advantage, of course, is that 64-bit systems can address a lot of memory, so running large Redict servers more or less requires a 64-bit system. The alternative is sharding.
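The effect of pointer width can be checked from Python with ctypes. This is a sketch only. It assumes a hypothetical node made of three pointers (previous, next, value), which is a common shape for linked structures but is not Redict's actual internal struct. The sizes reflect the pointer width of the running Python build.

```python
import ctypes


class ThreePointerNode(ctypes.Structure):
    """Hypothetical node: prev/next links plus a pointer to the value.
    Illustrative only; this is not Redict's actual internal struct."""
    _fields_ = [
        ("prev", ctypes.c_void_p),
        ("next", ctypes.c_void_p),
        ("value", ctypes.c_void_p),
    ]


pointer_size = ctypes.sizeof(ctypes.c_void_p)   # 8 on 64-bit builds, 4 on 32-bit
node_size = ctypes.sizeof(ThreePointerNode)      # 3 * pointer_size, no padding needed
overhead_mib = node_size * 1_000_000 / 2**20     # pointer bytes alone for 1M nodes
print(f"pointer: {pointer_size} B, node: {node_size} B, 1M nodes: {overhead_mib:.1f} MiB")
```

On a 64-bit build, this reports 8-byte pointers and 24-byte nodes. A 32-bit build would report half of each. With small keys and values, this fixed per-element cost makes up a large share of the total.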

Why does Redict keep its entire dataset in memory?

In the past, the Redict developers experimented with Virtual Memory and other systems to allow larger-than-RAM datasets, but in the end we are very happy if we can do one thing well: data served from memory, disk used for storage. So for now there are no plans to create an on-disk backend for Redict. After all, most of what Redict is results directly from its current design.

If your real problem is not the total RAM needed but the need to split your dataset across multiple Redict instances, please read the partitioning page in this documentation for more info.
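The simplest way to split data across instances is hash partitioning on the client side: hash the key name and take the result modulo the number of instances. The sketch below uses CRC32 from Python's zlib. The instance names are hypothetical placeholders, and a real client would map each name to a connection.

```python
import zlib

# Hypothetical instance names; substitute your own deployment's addresses.
INSTANCES = ["redict-a", "redict-b", "redict-c", "redict-d"]


def instance_for(key: str, instances: list[str] = INSTANCES) -> str:
    """Hash partitioning: CRC32 of the key name, modulo the number of instances."""
    return instances[zlib.crc32(key.encode("utf-8")) % len(instances)]


for key in ["user:1000", "user:1001", "cart:42"]:
    print(key, "->", instance_for(key))
```

The same key always maps to the same instance. However, changing the number of instances changes the modulus and remaps most keys, which is the main drawback of plain modulo partitioning.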

Can you use Redict with a disk-based database?

Yes. A common design pattern keeps small, very write-heavy data in Redict (along with data that Redict data structures model efficiently), and puts big blobs of data into an SQL or eventually consistent on-disk database. Similarly, Redict is sometimes used to keep an in-memory copy of a subset of the data stored in the on-disk database. This may look similar to caching, but it is actually a more advanced model: normally the Redict dataset is updated together with the on-disk dataset, rather than refreshed on cache misses.
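The "updated together" model can be sketched as a write path that commits to the on-disk database and then updates the in-memory copy. Reads are served from memory, with no miss-and-refill step. In this self-contained sketch, SQLite stands in for the on-disk database and a plain dict stands in for Redict. The class name and table schema are hypothetical. In practice, the dict writes would be Redict commands sent through a client library.

```python
import sqlite3
from typing import Optional


class WriteThroughUsers:
    """Keep an in-memory copy of a table updated together with the on-disk database.

    The dict `self.memory` is a stand-in for Redict; in a real deployment
    these writes would go to Redict through a client library instead.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        # Load the subset we want in memory once, at startup.
        self.memory: dict[str, str] = dict(self.conn.execute("SELECT id, name FROM users"))

    def put(self, user_id: str, name: str) -> None:
        # Commit to the on-disk database first, then update the in-memory copy,
        # so both datasets change on the write path, not on a cache miss.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name)
            )
        self.memory[user_id] = name

    def get(self, user_id: str) -> Optional[str]:
        # Reads come straight from memory; there is no miss-and-refill path.
        return self.memory.get(user_id)
```

Committing to disk first means a failed database write never leaves the in-memory copy ahead of the durable data. Whether that ordering suits your application is a design choice, not a Redict requirement.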

How can I reduce Redict’s overall memory usage?

A good practice is to consider memory consumption when mapping your logical data model to the physical data model within Redict. These considerations include using specific data types, key patterns, and normalization.

Beyond data modeling, there is more info on the Memory Optimization page.
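One key-pattern technique is to group many small top-level keys into a few small hashes. For example, "object:1234" can become field "34" of the hash "object:12", read and written with HGET and HSET. The sketch below only computes that mapping. The function names and the bucket size of 100 are assumptions. Any memory saving depends on the resulting hashes staying small enough for Redict's compact hash encoding, whose thresholds are configurable (see the Memory Optimization page).

```python
def bucket_for(prefix: str, object_id: int, fields_per_bucket: int = 100) -> tuple[str, str]:
    """Map object `object_id` to (hash key, field), e.g. 1234 -> ("object:12", "34")."""
    bucket, field = divmod(object_id, fields_per_bucket)
    return f"{prefix}:{bucket}", str(field)


def to_buckets(prefix: str, flat: dict[int, str]) -> dict[str, dict[str, str]]:
    """Group what would be one top-level key per object into a few small hashes."""
    hashes: dict[str, dict[str, str]] = {}
    for object_id, value in flat.items():
        key, field = bucket_for(prefix, object_id)
        hashes.setdefault(key, {})[field] = value
    return hashes


flat = {i: f"value-{i}" for i in range(1000)}
print(len(flat), "keys ->", len(to_buckets("object", flat)), "hashes")  # 1000 keys -> 10 hashes
```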

What happens if Redict runs out of memory?

Redict has built-in protection that lets users cap memory usage: the maxmemory option in the configuration file limits how much memory Redict can use. When this limit is reached, Redict starts replying with an error to write commands (but continues to accept read-only commands).

You can also configure Redict to evict keys when the max memory limit is reached. See the eviction policy docs for more information on this.
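The difference between refusing writes and evicting keys can be modeled in a few lines. The sketch below is a toy with a byte budget and two policies named after Redict's noeviction and allkeys-lru. It simplifies in two ways: it counts only value bytes, whereas Redict accounts for all of its memory, and it evicts the exact least recently used key, whereas Redict approximates LRU by sampling keys. The class and exception names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class OutOfMemoryError(Exception):
    """Raised when a write would exceed the toy store's memory budget."""


class ToyStore:
    """Toy key-value store with a byte budget and two eviction policies.

    Only value bytes are counted, and eviction is exact LRU; Redict accounts
    for all of its memory and approximates LRU by sampling keys.
    """

    def __init__(self, maxmemory: int, policy: str = "noeviction") -> None:
        if policy not in ("noeviction", "allkeys-lru"):
            raise ValueError(f"unsupported policy: {policy}")
        self.maxmemory = maxmemory
        self.policy = policy
        self.data: "OrderedDict[str, bytes]" = OrderedDict()  # least recently used first
        self.used = 0

    def get(self, key: str) -> Optional[bytes]:
        value = self.data.get(key)
        if value is not None:
            self.data.move_to_end(key)  # reads still work at the limit; mark as recently used
        return value

    def set(self, key: str, value: bytes) -> None:
        old = self.data.get(key)
        freed = len(old) if old is not None else 0
        if len(value) > self.maxmemory or (
            self.policy == "noeviction" and self.used - freed + len(value) > self.maxmemory
        ):
            raise OutOfMemoryError("write refused: maxmemory limit reached")
        if old is not None:
            del self.data[key]
            self.used -= freed
        # allkeys-lru: drop least recently used keys until the new value fits.
        while self.used + len(value) > self.maxmemory:
            _, evicted = self.data.popitem(last=False)
            self.used -= len(evicted)
        self.data[key] = value
        self.used += len(value)
```

With a 10-byte budget under noeviction, two 5-byte values fill the store and a third write raises an error, but both keys can still be read. Under allkeys-lru, the same third write evicts whichever key was touched least recently.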

Background saving fails with a fork() error on Linux?

Short answer: echo 1 > /proc/sys/vm/overcommit_memory :)

And now the long one:

The Redict background saving scheme relies on the copy-on-write semantics of the fork system call in modern operating systems: Redict forks, creating a child process that is an exact copy of the parent. The child process dumps the DB to disk and then exits. Being a copy, the child could in theory use as much memory as the parent, but thanks to the copy-on-write semantics implemented by most modern operating systems, the parent and child processes share their common memory pages. A page is duplicated only when it changes in the child or in the parent.

Since in theory all the pages may change while the child process is saving, Linux can’t tell in advance how much memory the child will take. So if the overcommit_memory setting is set to zero, the fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages. For example, with a 3 GB Redict dataset and just 2 GB of free memory, the fork will fail.

Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redict.

You can refer to the proc(5) man page for explanations of the available values.
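A quick way to check a host is to read the current value from /proc and compare it with the recommended setting. This Linux-only sketch uses the mode descriptions from proc(5). The helper names are hypothetical. The worst-case check mirrors the 3 GB versus 2 GB example: it assumes every parent page is duplicated during the save.

```python
from pathlib import Path

OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

# Meanings of vm.overcommit_memory as documented in proc(5).
OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit, never check",
    2: "always check, never overcommit",
}


def read_overcommit_mode(path: str = OVERCOMMIT_PATH) -> int:
    """Read the current vm.overcommit_memory value (Linux only)."""
    return int(Path(path).read_text().strip())


def worst_case_fork_fits(dataset_bytes: int, free_bytes: int) -> bool:
    """Strict worst case: every page of the parent gets duplicated during the save."""
    return free_bytes >= dataset_bytes


if __name__ == "__main__" and Path(OVERCOMMIT_PATH).exists():
    mode = read_overcommit_mode()
    print(f"vm.overcommit_memory = {mode} ({OVERCOMMIT_MODES.get(mode, 'unknown')})")
    if mode != 1:
        print("Background saves may fail to fork; consider setting it to 1.")
    gib = 2**30
    print("3 GB dataset, 2 GB free, worst case fits:", worst_case_fork_fits(3 * gib, 2 * gib))
```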

Are Redict on-disk snapshots atomic?

Yes. The Redict background saving process is always forked while the server is not executing a command, so every command reported to be atomic in RAM is also atomic from the point of view of the disk snapshot.
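The point-in-time property of a forked snapshot can be observed directly with os.fork. In this POSIX-only sketch, the child writes the dataset it inherited at fork time, while the parent keeps modifying its own copy. The function name and JSON file format are hypothetical stand-ins for Redict's RDB writer. The sketch demonstrates snapshot semantics, not memory sharing: Python's reference counting touches many pages, so it shares memory less effectively than a C program would.

```python
import json
import os
import tempfile


def snapshot_via_fork(data: dict, path: str) -> int:
    """Fork a child that writes `data` as it was at fork time; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child process: its memory is a point-in-time copy of the parent's.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(data, f)
            status = 0
        finally:
            os._exit(status)  # exit immediately, skipping the parent's cleanup code
    return pid


if __name__ == "__main__":
    dataset = {"counter": 1}
    dump_path = os.path.join(tempfile.mkdtemp(), "dump.json")
    child = snapshot_via_fork(dataset, dump_path)
    dataset["counter"] = 2  # the parent keeps applying writes while the child saves
    os.waitpid(child, 0)
    with open(dump_path) as f:
        print(json.load(f))  # {'counter': 1}: the state at fork time
```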

How can Redict use multiple CPUs or cores?

CPU rarely becomes the bottleneck with Redict, since Redict is usually either memory-bound or network-bound. For instance, when using pipelining, a Redict instance running on an average Linux system can deliver 1 million requests per second, so if your application mainly uses O(N) or O(log(N)) commands, it is hardly going to use too much CPU.
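A back-of-the-envelope model shows why pipelining shifts the bottleneck. Assume each batch of pipelined commands costs one network round trip, plus a fixed amount of server CPU per request. The 0.2 ms round trip and 1 µs per request below are assumed numbers, not measurements, and the function name is hypothetical.

```python
import math


def total_seconds(requests: int, batch_size: int, rtt_s: float, server_s_per_request: float) -> float:
    """Rough model: one network round trip per batch, plus server time per request."""
    round_trips = math.ceil(requests / batch_size)
    return round_trips * rtt_s + requests * server_s_per_request


# Assumed numbers, not measurements: 0.2 ms round trip, 1 µs of server CPU per request.
RTT = 0.2e-3
SERVER_COST = 1e-6
for batch in (1, 10, 1000):
    t = total_seconds(1_000_000, batch, RTT, SERVER_COST)
    print(f"batch={batch:>5}: {t:8.1f} s, {1_000_000 / t:>12,.0f} req/s")
```

Without pipelining, the round trips dominate: 1 million requests take about 201 s under these assumptions. With batches of 1000, they take about 1.2 s, and throughput approaches the 1/SERVER_COST ceiling. In other words, the network stops being the limit only once requests are batched.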

However, to maximize CPU usage you can start multiple Redict instances on the same machine and treat them as different servers. Since a single machine may eventually not be enough anyway, if you want to use multiple CPUs you can start thinking about sharding earlier.

You can find more information about using multiple Redict instances on the Cluster page.
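When multiple instances form a cluster, keys are routed by hash slot rather than by a client-chosen modulus. This sketch assumes that Redict Cluster keeps the key-to-slot mapping it inherited from Redis Cluster: 16384 slots, CRC16/XMODEM of the key, and a non-empty {hash tag} hashed in place of the whole key so that related keys share a slot. Check the Cluster page to confirm these details. The function names are hypothetical.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """Slot for a key; only a non-empty {hash tag} is hashed when present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag between the first '{' and the next '}'
            key = key[start + 1:end]
    return crc16_xmodem(key) % HASH_SLOTS


for k in (b"user:{42}:profile", b"user:{42}:cart", b"user:43:profile"):
    print(k.decode(), "->", hash_slot(k))
```

Because the slot count is fixed, adding a node means moving some slots to it, rather than rehashing every key as plain modulo partitioning would.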

What is the maximum number of keys a single Redict instance can hold? What is the maximum number of elements in a Hash, List, Set, and Sorted Set?

Redict can handle up to 2^32 keys, and was tested in practice to handle at least 250 million keys per instance.

Every hash, list, set, and sorted set can hold 2^32 elements.

In other words, your limit is likely the available memory in your system.
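A rough calculation shows why memory runs out long before the key limit. This sketch assumes every key is a small string key -> value pair costing about as much as in the ~85 MB per million figure quoted in the memory footprint section. Real sizes depend on your keys, values and data types.

```python
MB_PER_MILLION_SMALL_KEYS = 85  # approximate figure for small string pairs, from the footprint examples


def estimated_mb(keys: int, mb_per_million: float = MB_PER_MILLION_SMALL_KEYS) -> float:
    """Scale the per-million figure linearly to a given number of keys."""
    return keys / 1_000_000 * mb_per_million


for label, keys in [("250 million keys", 250_000_000), ("2^32 keys", 2**32)]:
    print(f"{label}: ~{estimated_mb(keys):,.0f} MB")
```

Under this assumption, 250 million small keys need roughly 21,250 MB, and the full 2^32 keys would need roughly 365,072 MB. That is far more RAM than most single machines have.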

Why does my replica have a different number of keys than its primary instance?

If you use keys with a limited time to live (Redict expires), this is normal behavior. This is what happens:

  • The primary generates an RDB file on the first synchronization with the replica.
  • The RDB file will not include keys already expired in the primary but which are still in memory.
  • These keys are still in the memory of the Redict primary, even if logically expired. They’ll be considered non-existent, and their memory will be reclaimed later, either incrementally or explicitly on access. While these keys are not logically part of the dataset, they are accounted for in the INFO output and in the DBSIZE command.
  • When the replica reads the RDB file generated by the primary, this set of keys will not be loaded.

Because of this, it’s common for users with many expired keys to see fewer keys in the replicas. However, logically, the primary and replica will have the same content.
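The sequence above can be reproduced with a toy primary. In the toy, expired keys stay in memory until they are accessed, DBSIZE counts them, and the snapshot sent to a replica leaves them out. This is a deterministic sketch with hypothetical names: time is passed in as a plain number, and only lazy reclamation on access is modeled, not Redict's incremental background expiry.

```python
from typing import Optional


class ToyPrimary:
    """Toy primary whose logically expired keys linger until they are reclaimed.

    Times are plain numbers passed in by the caller, so the example is deterministic.
    """

    def __init__(self) -> None:
        self.keys: dict[str, tuple[str, Optional[float]]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str, expires_at: Optional[float] = None) -> None:
        self.keys[key] = (value, expires_at)

    def _expired(self, key: str, now: float) -> bool:
        expires_at = self.keys[key][1]
        return expires_at is not None and expires_at <= now

    def dbsize(self) -> int:
        # Counts every key still in memory, including expired-but-unreclaimed ones.
        return len(self.keys)

    def get(self, key: str, now: float) -> Optional[str]:
        if key not in self.keys:
            return None
        if self._expired(key, now):
            del self.keys[key]  # reclaimed on access
            return None
        return self.keys[key][0]

    def rdb_snapshot(self, now: float) -> dict[str, str]:
        # The snapshot sent to a replica skips keys that are already expired.
        return {k: v for k, (v, _) in self.keys.items() if not self._expired(k, now)}


primary = ToyPrimary()
primary.set("a", "1")
primary.set("b", "2", expires_at=10.0)
replica = primary.rdb_snapshot(now=20.0)
print(primary.dbsize(), len(replica))  # 2 1
```

The counts differ, but the logical content agrees: reading "b" on the primary at time 20 returns nothing and reclaims the key, after which DBSIZE drops to 1.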

Redict logo courtesy of @janWilejan, CC-BY-SA-4.0.

Portions of this website courtesy of Salvatore Sanfilippo, CC-BY-SA-4.0.