Executing Lua scripts

Executing Lua scripts in Redict #

Redict lets users upload and execute Lua scripts on the server. Scripts can employ programmatic control structures and use most of the commands while executing to access the database. Because scripts execute in the server, reading and writing data from scripts is very efficient.

Redict guarantees the script’s atomic execution. While executing the script, all server activities are blocked during its entire runtime. These semantics mean that all of the script’s effects either have yet to happen or had already happened.

Scripting offers several properties that can be valuable in many cases. These include:

  • Providing locality by executing logic where data lives. Data locality reduces overall latency and saves networking resources.
  • Blocking semantics that ensure the script’s atomic execution.
  • Enabling the composition of simple capabilities that are either missing from Redict or are too niche to be a part of it.

Lua lets you run part of your application logic inside Redict. Such scripts can perform conditional updates across multiple keys, possibly combining several different data types atomically.

Scripts are executed in Redict by an embedded execution engine. Presently, Redict supports a single scripting engine, the Lua 5.1 interpreter. Please refer to the Redict Lua API Reference page for complete documentation.

Although the server executes them, Eval scripts are regarded as a part of the client-side application, which is why they’re not named, versioned, or persisted. As a result, the application may need to reload any script that is missing at any time (after a server restart, fail-over to a replica, etc.). Redict Functions offer an alternative approach to programmability, which allows the server itself to be extended with additional programmed logic.

Getting started #

We’ll start scripting with Redict by using the EVAL command.

Here’s our first example:

> EVAL "return 'Hello, scripting!'" 0
"Hello, scripting!"

In this example, EVAL takes two arguments. The first argument is a string that consists of the script’s Lua source code. The script doesn’t need to include any Lua function definitions. It is just a Lua program that will run in the Redict engine’s context.

The second argument is the number of arguments that follow the script’s body, starting from the third argument, representing Redict key names. In this example, we used the value 0 because we didn’t provide the script with any arguments, whether the names of keys or not.

Script parameterization #

It is possible, although highly ill-advised, to have the application dynamically generate script source code per its needs. For example, the application could send these two scripts, which have different source code but differ only in the string they return:

redict> EVAL "return 'Hello'" 0
"Hello"
redict> EVAL "return 'Scripting!'" 0
"Scripting!"

Although this mode of operation isn’t blocked by Redict, it is an anti-pattern due to script cache considerations (more on the topic below). Instead of having your application generate subtle variations of the same scripts, you can parameterize them and pass any arguments needed to execute them.

The following example demonstrates how to achieve the same effects as above, but via parameterization:

redict> EVAL "return ARGV[1]" 0 Hello
"Hello"
redict> EVAL "return ARGV[1]" 0 Parameterization!
"Parameterization!"

At this point, it is essential to understand the distinction Redict makes between input arguments that are names of keys and those that aren’t.

While key names in Redict are just strings, unlike any other string values, these represent keys in the database. The name of a key is a fundamental concept in Redict and is the basis for operating the Redict Cluster.

Important: to ensure the correct execution of scripts, both in standalone and clustered deployments, all names of keys that a script accesses must be explicitly provided as input key arguments. The script should only access keys whose names are given as input arguments. Scripts should never access keys with programmatically-generated names or based on the contents of data structures stored in the database.

Any input to the function that isn’t the name of a key is a regular input argument.

In the example above, both Hello and Parameterization! are regular input arguments for the script. Because the script doesn’t touch any keys, we use the numerical argument 0 to specify there are no key name arguments. The execution context makes arguments available to the script through the KEYS and ARGV global runtime variables. The KEYS table is pre-populated with all key name arguments provided to the script before its execution, whereas the ARGV table serves a similar purpose but for regular arguments.

The following example demonstrates how input arguments are distributed between the script’s KEYS and ARGV runtime global variables:

redict> EVAL "return { KEYS[1], KEYS[2], ARGV[1], ARGV[2], ARGV[3] }" 2 key1 key2 arg1 arg2 arg3
1) "key1"
2) "key2"
3) "arg1"
4) "arg2"
5) "arg3"

Note: as can be seen above, Lua’s table arrays are returned as RESP2 array replies, so it is likely that your client’s library will convert them to the native array data type in your programming language. Please refer to the rules that govern data type conversion for more pertinent information.
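The split between KEYS and ARGV can be sketched outside the server. The following Python snippet is only an illustration of the rule described above; the function name split_eval_args is hypothetical and is not part of any client library. Keep in mind that Lua tables are 1-indexed, so KEYS[1] in a script corresponds to keys[0] here.

```python
def split_eval_args(numkeys, args):
    """Partition the arguments that follow numkeys:
    the first `numkeys` become KEYS, the remainder become ARGV."""
    if numkeys < 0 or numkeys > len(args):
        raise ValueError("numkeys must be between 0 and the number of arguments")
    return list(args[:numkeys]), list(args[numkeys:])


keys, argv = split_eval_args(2, ["key1", "key2", "arg1", "arg2", "arg3"])
print(keys)  # ['key1', 'key2']
print(argv)  # ['arg1', 'arg2', 'arg3']
```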

Interacting with Redict from a script #

It is possible to call Redict commands from a Lua script either via redict.call() or redict.pcall().

The two are nearly identical. Both execute a Redict command along with its provided arguments, if these represent a well-formed command. The difference between the two functions lies in how runtime errors (such as syntax errors) are handled. Errors raised from calling the redict.call() function are returned directly to the client that executed the script. Conversely, errors encountered when calling the redict.pcall() function are returned to the script’s execution context instead for possible handling.

For example, consider the following:

> EVAL "return redict.call('SET', KEYS[1], ARGV[1])" 1 foo bar
OK

The above script accepts one key name and one value as its input arguments. When executed, the script calls the SET command to set the input key, foo, with the string value “bar”.
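The difference in error handling between redict.call() and redict.pcall() can be pictured with a small Python analogy. This is not how the server is implemented: execute is a hypothetical executor that knows only SET, and the {"err": ...} shape of the trapped error is an assumption of this sketch (the Lua API reference describes the actual reply).

```python
class CommandError(Exception):
    """Stand-in for a runtime error raised while executing a command."""


def execute(command, *args):
    """Hypothetical command executor that knows only SET."""
    if command != "SET":
        raise CommandError("unknown command '%s'" % command)
    if len(args) != 2:
        raise CommandError("wrong number of arguments for 'SET'")
    return "OK"


def call(command, *args):
    # Like redict.call(): an error propagates and aborts the caller.
    return execute(command, *args)


def pcall(command, *args):
    # Like redict.pcall(): the error is trapped and handed back as a value.
    try:
        return execute(command, *args)
    except CommandError as exc:
        return {"err": str(exc)}


print(call("SET", "foo", "bar"))  # OK
reply = pcall("NOSUCHCOMMAND")
print("err" in reply)             # True
```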

Script cache #

Until this point, we’ve used the EVAL command to run our script.

Whenever we call EVAL, we also include the script’s source code with the request. Repeatedly calling EVAL to execute the same set of parameterized scripts wastes network bandwidth and adds some overhead in Redict. To save network and compute resources, Redict provides a caching mechanism for scripts.

Every script you execute with EVAL is stored in a dedicated cache that the server keeps. The cache’s contents are organized by the scripts’ SHA1 digest sums, so the SHA1 digest sum of a script uniquely identifies it in the cache. You can verify this behavior by running EVAL and calling INFO afterward. You’ll notice that the used_memory_scripts_eval and number_of_cached_scripts metrics grow with every new script that’s executed.

As mentioned above, dynamically-generated scripts are an anti-pattern. Generating scripts during the application’s runtime may, and probably will, exhaust the host’s memory resources for caching them. Instead, scripts should be as generic as possible and provide customized execution via their arguments.
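To see why generated scripts inflate the cache, it helps to compute the digests that would identify them. The sketch below uses Python’s hashlib to take the SHA1 of a few script bodies; the names list and the script_digest helper are hypothetical, and counting distinct digests is only a stand-in for the number of cache entries the server would hold.

```python
import hashlib


def script_digest(source):
    """SHA1 hex digest of a script body, the identifier the cache is keyed by."""
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


names = ["alice", "bob", "carol"]  # hypothetical application data

# Anti-pattern: a new script body (and so a new cache entry) per value.
generated = {script_digest("return 'Hello, %s'" % name) for name in names}

# Preferred: one generic script; the value travels in ARGV instead.
parameterized = {script_digest("return 'Hello, ' .. ARGV[1]") for _ in names}

print(len(generated))      # 3
print(len(parameterized))  # 1
```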

A script is loaded to the server’s cache by calling the SCRIPT LOAD command and providing its source code. The server doesn’t execute the script, but instead just compiles and loads it to the server’s cache. Once loaded, you can execute the cached script with the SHA1 digest returned from the server.

Here’s an example of loading and then executing a cached script:

redict> SCRIPT LOAD "return 'Immabe a cached script'"
"c664a3bf70bd1d45c4284ffebb65a6f2299bfc9f"
redict> EVALSHA c664a3bf70bd1d45c4284ffebb65a6f2299bfc9f 0
"Immabe a cached script"

Cache volatility #

The Redict script cache is always volatile. It isn’t considered as a part of the database and is not persisted. The cache may be cleared when the server restarts, during fail-over when a replica assumes the master role, or explicitly by SCRIPT FLUSH. That means that cached scripts are ephemeral, and the cache’s contents can be lost at any time.

Applications that use scripts should always call EVALSHA to execute them. The server returns an error if the script’s SHA1 digest is not in the cache. For example:

redict> EVALSHA ffffffffffffffffffffffffffffffffffffffff 0
(error) NOSCRIPT No matching script

In this case, the application should first load it with SCRIPT LOAD and then call EVALSHA once more to run the cached script by its SHA1 sum. Most of Redict’s clients already provide utility APIs for doing that automatically. Please consult your client’s documentation regarding the specific details.
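The load-and-retry pattern described above might look like the following in a client. This is a minimal sketch against an in-memory stand-in: FakeServer, NoScriptError, and run_script are hypothetical names invented for illustration, and the stub does not execute Lua; it only reports which script it would have run.

```python
import hashlib


class NoScriptError(Exception):
    """Stand-in for the server's NOSCRIPT error reply."""


class FakeServer:
    """In-memory stand-in for a server's script cache (not a real client)."""

    def __init__(self):
        self.cache = {}

    def script_load(self, source):
        digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
        self.cache[digest] = source
        return digest

    def script_flush(self):
        self.cache.clear()

    def evalsha(self, digest, numkeys, *args):
        if digest not in self.cache:
            raise NoScriptError("NOSCRIPT No matching script")
        # A real server would run the script; the stub only reports it.
        return ("ran", self.cache[digest], numkeys, args)


def run_script(server, source, numkeys, *args):
    """Try EVALSHA first; on NOSCRIPT, load the script and retry once."""
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
    try:
        return server.evalsha(digest, numkeys, *args)
    except NoScriptError:
        server.script_load(source)
        return server.evalsha(digest, numkeys, *args)


server = FakeServer()
script = "return redict.call('SET', KEYS[1], ARGV[1])"
run_script(server, script, 1, "foo", "bar")   # cache miss: loads, then retries
server.script_flush()                         # cache lost, e.g. after a restart
result = run_script(server, script, 1, "foo", "bar")  # recovers transparently
print(result[0])  # ran
```

Computing the digest on the client avoids an extra round trip in the common case where the script is already cached.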

EVALSHA in the context of pipelining #

Special care should be taken when executing EVALSHA in the context of a pipelined request. The commands in a pipelined request run in the order they are sent, but other clients’ commands may be interleaved for execution between these. Because of that, the NOSCRIPT error can return from a pipelined request but can’t be handled.

Therefore, a client library’s implementation should revert to using plain EVAL of parameterized scripts in the context of a pipeline.

Script cache semantics #

During normal operation, an application’s scripts are meant to stay indefinitely in the cache (that is, until the server is restarted or the cache is flushed). The underlying reasoning is that the script cache contents of a well-written application are unlikely to grow continuously. Even large applications that use hundreds of cached scripts shouldn’t be an issue in terms of cache memory usage.

The only way to flush the script cache is by explicitly calling the SCRIPT FLUSH command. Running the command will completely flush the scripts cache, removing all the scripts executed so far. Typically, this is only needed when the instance is going to be instantiated for another customer or application in a cloud environment.

Also, as already mentioned, restarting a Redict instance flushes the non-persistent script cache. However, from the point of view of the Redict client, there are only two ways to make sure that a Redict instance was not restarted between two different commands:

  • The connection we have with the server is persistent and was never closed so far.
  • The client explicitly checks the run_id field in the INFO command to ensure the server was not restarted and is still the same process.

Practically speaking, it is much simpler for the client to assume that in the context of a given connection, cached scripts are guaranteed to be there unless the administrator explicitly invoked the SCRIPT FLUSH command. The fact that the user can count on Redict to retain cached scripts is semantically helpful in the context of pipelining.

The SCRIPT command #

The Redict SCRIPT command provides several ways for controlling the scripting subsystem. These are:

  • SCRIPT FLUSH: this command is the only way to force Redict to flush the scripts cache. It is most useful in environments where the same Redict instance is reassigned to different uses. It is also helpful for testing client libraries’ implementations of the scripting feature.

  • SCRIPT EXISTS: given one or more SHA1 digests as arguments, this command returns an array of 1’s and 0’s. 1 means the specific SHA1 is recognized as a script already present in the scripting cache. 0’s meaning is that a script with this SHA1 wasn’t loaded before (or at least never since the latest call to SCRIPT FLUSH).

  • SCRIPT LOAD script: this command registers the specified script in the Redict script cache. It is a useful command in all the contexts where we want to ensure that EVALSHA doesn’t fail (for instance, in a pipeline or when called from a MULTI/EXEC transaction), without the need to execute the script.

  • SCRIPT KILL: this command is the only way to interrupt a long-running script (a.k.a. a slow script), short of shutting down the server. A script is deemed slow once its execution time exceeds the configured maximum execution time threshold. The SCRIPT KILL command can be used only with scripts that did not modify the dataset during their execution (since stopping a read-only script does not violate the scripting engine’s guaranteed atomicity).

  • SCRIPT DEBUG: controls use of the built-in Redict Lua scripts debugger.
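As a rough illustration of how SCRIPT EXISTS and SCRIPT LOAD can work together before a pipeline or transaction, the sketch below models the cache in memory. FakeScriptCache and ensure_loaded are hypothetical names; a real application would issue the corresponding commands through its client library.

```python
import hashlib


def sha1_hex(source):
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


class FakeScriptCache:
    """Hypothetical in-memory model of SCRIPT EXISTS / LOAD / FLUSH."""

    def __init__(self):
        self._scripts = {}

    def load(self, source):
        digest = sha1_hex(source)
        self._scripts[digest] = source
        return digest

    def exists(self, *digests):
        # One 1 or 0 per digest, in the order given.
        return [1 if d in self._scripts else 0 for d in digests]

    def flush(self):
        self._scripts.clear()


def ensure_loaded(cache, sources):
    """Load only the scripts reported as missing; return how many were loaded."""
    digests = [sha1_hex(s) for s in sources]
    present = cache.exists(*digests)
    missing = [s for s, found in zip(sources, present) if not found]
    for source in missing:
        cache.load(source)
    return len(missing)


cache = FakeScriptCache()
scripts = ["return 1", "return ARGV[1]"]
cache.load(scripts[0])
print(cache.exists(*[sha1_hex(s) for s in scripts]))  # [1, 0]
print(ensure_loaded(cache, scripts))                  # 1
print(cache.exists(*[sha1_hex(s) for s in scripts]))  # [1, 1]
```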

Script replication #

In standalone deployments, a single Redict instance called the master manages the entire database. A clustered deployment has at least three masters managing the sharded database. Redict uses replication to maintain one or more replicas, or exact copies, for any given master.

Because scripts can modify the data, Redict ensures all write operations performed by a script are also sent to replicas to maintain consistency. Data-modifying commands executed by scripts are sent to replicas, which then run the same commands without executing any scripts.
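The idea that replicas receive the script’s write commands rather than the script itself can be shown with a toy model. Everything here is hypothetical: ToyMaster supports only GET, SET, and DEL, the "script" is an ordinary Python function, and the replication stream is a plain list. It is a sketch of the concept, not of the server’s replication protocol.

```python
class ToyMaster:
    """Tiny key-value store that records the writes it would replicate."""

    def __init__(self):
        self.data = {}
        self.replication_stream = []  # commands that would be sent to replicas

    def call(self, command, *args):
        if command == "GET":
            return self.data.get(args[0])  # reads are not replicated
        if command == "SET":
            self.data[args[0]] = args[1]
            result = "OK"
        elif command == "DEL":
            result = 1 if self.data.pop(args[0], None) is not None else 0
        else:
            raise ValueError("unsupported command in this sketch: " + command)
        self.replication_stream.append((command,) + args)
        return result


def conditional_copy(server, src, dst):
    """Stand-in for a script: copy src to dst only if src exists."""
    value = server.call("GET", src)
    if value is not None:
        server.call("SET", dst, value)
    return value


master = ToyMaster()
master.call("SET", "a", "1")
conditional_copy(master, "a", "b")
conditional_copy(master, "missing", "c")  # performs no write

# The replica replays plain commands and never sees conditional_copy.
replica = ToyMaster()
for command in master.replication_stream:
    replica.call(*command)

print(master.replication_stream)  # [('SET', 'a', '1'), ('SET', 'b', '1')]
print(replica.data == master.data)  # True
```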

Debugging Eval scripts #

Redict has support for native Lua debugging. The Redict Lua debugger is a remote debugger consisting of a server, which is Redict itself, and a client, which is by default redict-cli.

The Lua debugger is described in the Lua scripts debugging section of the Redict documentation.

Execution under low memory conditions #

When memory usage in Redict exceeds the maxmemory limit, the first write command encountered in the script that uses additional memory will cause the script to abort (unless redict.pcall was used).

However, an exception to the above is when the script’s first write command does not use additional memory, as is the case with, for example, DEL and LREM. In this case, Redict will allow all commands in the script to run to ensure atomicity. If subsequent writes in the script consume additional memory, Redict’s memory usage can exceed the threshold set by the maxmemory configuration directive.

Another scenario in which a script can cause memory usage to cross the maxmemory threshold is when the execution begins when Redict is slightly below maxmemory, so the script’s first write command is allowed. As the script executes, subsequent write commands consume more memory leading to the server using more RAM than the configured maxmemory directive.

In those scenarios, you should consider setting the maxmemory-policy configuration directive to any value other than noeviction. In addition, Lua scripts should be as fast as possible so that eviction can kick in between executions.

Note that you can change this behaviour by using flags.

Eval flags #

Normally, when you run an Eval script, the server does not know how it accesses the database. By default, Redict assumes that all scripts read and write data. However, there is a way to declare flags when creating a script in order to tell Redict how it should behave.

The way to do that is by using a Shebang statement on the first line of the script like so:

#!lua flags=no-writes,allow-stale
local x = redict.call('get','x')
return x

Note that as soon as Redict sees the #! comment, it treats the script as if it declares flags. Even if no flags are defined, the script still has a different set of defaults compared to a script without a #! line.

Another difference is that scripts without #! can run commands that access keys belonging to different cluster hash slots, but ones with #! inherit the default flags, so they cannot.

Please refer to Script flags to learn about the various script flags and their defaults.
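To make the three cases concrete (no #! line, a #! line without flags, and a #! line with flags), here is a small Python sketch that parses the first line of a script. The parse_shebang function is hypothetical and is not the server’s parser; it handles only the flags= option shown in the example above.

```python
def parse_shebang(script):
    """Return (engine, flags) if the first line is a #! declaration, else None."""
    first_line = script.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return None  # no declaration: the script uses the legacy defaults
    parts = first_line[2:].split()
    if not parts:
        raise ValueError("shebang line names no engine")
    engine, flags = parts[0], set()
    for option in parts[1:]:
        name, _, value = option.partition("=")
        if name == "flags" and value:
            flags.update(value.split(","))
    return engine, flags


script = "#!lua flags=no-writes,allow-stale\nlocal x = redict.call('get','x')\nreturn x"
engine, flags = parse_shebang(script)
print(engine)         # lua
print(sorted(flags))  # ['allow-stale', 'no-writes']
```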

Redict logo courtesy of @janWilejan, CC-BY-SA-4.0.

Portions of this website courtesy of Salvatore Sanfilippo, CC-BY-SA-4.0.