How Redict Handles Common Unix Signals
This document provides information about how Redict reacts to different POSIX signals, such as SIGTERM and SIGSEGV.
SIGTERM and SIGINT
The SIGTERM and SIGINT signals tell Redict to shut down gracefully. When the server receives one of these signals, it does not exit immediately. Instead, it schedules a shutdown similar to the one performed by the SHUTDOWN command. The scheduled shutdown starts as soon as possible: as soon as the command currently being executed (if any) terminates, with a possible additional delay of 0.1 seconds or less.
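The deferral described above can be illustrated with a minimal Python sketch. It assumes a toy single-threaded command loop: the signal handler only records the request, and the loop acts on it after the current command finishes. The names shutdown_requested, run_event_loop, and the toy commands are hypothetical. Redict itself is written in C and its internals are not shown here, and the 0.1-second delay is not modeled.

```python
import os
import signal

# Set by the handler; the handler itself does no shutdown work.
shutdown_requested = False

def request_shutdown(signum, frame):
    """Signal handler: only record that a shutdown was requested."""
    global shutdown_requested
    shutdown_requested = True

def run_event_loop(commands):
    """Run toy commands one at a time and honor a pending shutdown
    only after the current command has finished."""
    executed = []
    for cmd in commands:
        cmd()                        # the current command always runs to completion
        executed.append(cmd.__name__)
        if shutdown_requested:
            executed.append("shutdown")
            break
    return executed

def set_key():
    pass

def slow_command():
    # Simulate a SIGTERM arriving while this command is executing.
    os.kill(os.getpid(), signal.SIGTERM)

def get_key():
    pass

def demo():
    global shutdown_requested
    shutdown_requested = False
    previous = {s: signal.signal(s, request_shutdown)
                for s in (signal.SIGTERM, signal.SIGINT)}
    try:
        return run_event_loop([set_key, slow_command, get_key])
    finally:
        # Restore whatever handlers were installed before the demo.
        for s, h in previous.items():
            signal.signal(s, h if h is not None else signal.SIG_DFL)

print(demo())  # ['set_key', 'slow_command', 'shutdown']
```

The signal arrives during slow_command, but that command still completes. The loop stops before get_key runs.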
If the server is blocked by a long-running Lua script,
kill the script with SCRIPT KILL
if possible. The scheduled shutdown will
run just after the script is killed or terminates spontaneously.
This shutdown process includes the following actions:
- If there are any replicas lagging behind in replication:
  - Pause clients attempting to write with CLIENT PAUSE and the WRITE option.
  - Wait up to the configured shutdown-timeout (default 10 seconds) for replicas to catch up with the master’s replication offset.
- If a background child is saving the RDB file or performing an AOF rewrite, the child process is killed.
- If the AOF is active, Redict calls the fsync system call on the AOF file descriptor to flush the buffers on disk.
- If Redict is configured to persist on disk using RDB files, a synchronous (blocking) save is performed. Since the save is synchronous, it doesn’t use any additional memory.
- If the server is daemonized, the PID file is removed.
- If the Unix domain socket is enabled, the socket file is removed.
- The server exits with an exit code of zero.
If the RDB file can’t be saved, the shutdown fails, and the server continues to run to ensure no data is lost.
Likewise, if the user has just enabled AOF and the server has triggered the first AOF rewrite to create the initial AOF file, but that file can’t be saved, the shutdown fails and the server continues to run.
No further attempt to shut down will be made unless a new SIGTERM is received or the SHUTDOWN command is issued.
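The order of the shutdown steps, and the rule that a failed save aborts the shutdown, can be modeled with a toy Python class. ToyServer, try_shutdown, and the log entries are hypothetical illustrations, not Redict internals. The replica wait and the failed-initial-AOF-rewrite case are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ToyServer:
    """Hypothetical stand-in for the server state touched during shutdown."""
    child_running: bool = False    # background RDB save or AOF rewrite in progress
    aof_enabled: bool = False
    rdb_enabled: bool = True
    rdb_save_works: bool = True    # set to False to simulate an unwritable disk
    daemonized: bool = False
    unix_socket: bool = False
    shutdown_requested: bool = True
    log: List[str] = field(default_factory=list)

def try_shutdown(server: ToyServer) -> Optional[int]:
    """Return the exit code (0) on success, or None if the shutdown was aborted."""
    # The wait for lagging replicas would happen first; it is omitted here.
    if server.child_running:
        server.log.append("kill child")
        server.child_running = False
    if server.aof_enabled:
        server.log.append("fsync AOF")
    if server.rdb_enabled:
        server.log.append("blocking RDB save")
        if not server.rdb_save_works:
            # Keep running rather than exit and lose the in-memory data.
            # Clearing the flag means no automatic retry: a new SIGTERM or
            # SHUTDOWN is needed to try again.
            server.shutdown_requested = False
            server.log.append("shutdown aborted")
            return None
    if server.daemonized:
        server.log.append("remove PID file")
    if server.unix_socket:
        server.log.append("remove Unix socket")
    return 0
```

For example, try_shutdown(ToyServer(rdb_save_works=False)) returns None and leaves shutdown_requested cleared. The toy server therefore keeps running until another shutdown request arrives.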
Before shutting down, the server waits for lagging replicas for up to the configurable shutdown-timeout, 10 seconds by default. This is a best-effort attempt to minimize the risk of data loss when no save points are configured and AOF is disabled.
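The bounded wait is essentially a polling loop with a deadline. The sketch below assumes writes are already paused, so the master offset no longer changes. A hypothetical replica_offsets callback stands in for real replication acknowledgements. Redict's own C implementation is not shown and may differ.

```python
import time
from typing import Callable, Iterable

def wait_for_replicas(master_offset: int,
                      replica_offsets: Callable[[], Iterable[int]],
                      timeout_s: float = 10.0,
                      poll_interval_s: float = 0.01) -> bool:
    """Poll until every replica has reached master_offset or the timeout expires.

    Returns True if all replicas caught up (trivially True with no replicas),
    False if the deadline passed first. Either way the caller goes on with the
    shutdown: the wait is only best effort.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if all(offset >= master_offset for offset in replica_offsets()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A monotonic clock is used so that wall-clock adjustments during shutdown cannot stretch or shorten the timeout.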
SIGSEGV, SIGBUS, SIGFPE and SIGILL
The following signals are handled as a Redict crash:
- SIGSEGV
- SIGBUS
- SIGFPE
- SIGILL
Once one of these signals is trapped, Redict stops any current operation and performs the following actions:
- Adds a bug report to the log file. This includes a stack trace, a dump of the registers, and information about the state of clients.
- Performs a fast memory test as a first check of the reliability of the crashing system.
- Removes the PID file, if the server was daemonized.
- Finally, unregisters its own signal handler for the received signal and resends the same signal to itself, so that the default action is performed, such as dumping the core on the file system.
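The last step (restore the default handler, then resend the signal) is a common POSIX pattern. A Python handler cannot survive a genuine segmentation fault, so this sketch only imitates the sequence. A forked child sends itself a crash signal. Its handler then writes a one-line stand-in for the bug report to a pipe, restores SIG_DFL, and resends the signal. make_crash_handler and simulate_crash are hypothetical names, and the memory test and PID file removal are omitted. Unix only (os.fork).

```python
import os
import signal

CRASH_SIGNALS = (signal.SIGSEGV, signal.SIGBUS, signal.SIGFPE, signal.SIGILL)

def make_crash_handler(report_fd):
    def crash_handler(signum, frame):
        # Toy "bug report": a real server would log a stack trace,
        # registers and client state here.
        os.write(report_fd, f"bug report: {signal.Signals(signum).name}\n".encode())
        # Restore the default disposition and resend the signal, so the
        # process terminates with it (possibly dumping core).
        signal.signal(signum, signal.SIG_DFL)
        os.kill(os.getpid(), signum)
    return crash_handler

def simulate_crash(signum):
    """Fork a child that 'crashes' with signum; return (term_signal, report)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(r)
            handler = make_crash_handler(w)
            for sig in CRASH_SIGNALS:
                signal.signal(sig, handler)
            os.kill(os.getpid(), signum)  # stand-in for a real fault
        finally:
            os._exit(1)  # only reached if the resent signal did not terminate us
    os.close(w)
    _, status = os.waitpid(pid, 0)
    report = os.read(r, 4096).decode()
    os.close(r)
    term = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return term, report
```

The child ends up terminated by the original signal instead of exiting normally. The parent, a shell, or a process supervisor therefore sees the real cause of death, and the kernel can write a core dump if resource limits allow it.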
What happens when a child process gets killed
When the child performing the Append Only File rewrite gets killed by a signal, Redict handles this as an error and discards the (probably partial or corrupted) AOF file. It will attempt the rewrite again later.
When the child performing an RDB save is killed, Redict handles the condition as a more severe error. While the failure of an AOF file rewrite can cause AOF file enlargement, failed RDB file creation reduces durability.
If the child producing the RDB file is killed by a signal, or exits with an error (a non-zero exit code), Redict enters a special error condition in which no further write commands are accepted.
- Redict will continue to reply to read commands.
- Redict will reply to all write commands with a MISCONF error.
This error condition will persist until it becomes possible to create an RDB file successfully.
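This behavior can be modeled as a single flag that is set from the RDB child's exit status and checked before every write command. The sketch below uses hypothetical names (PersistenceGuard, on_rdb_child_exit, handle). The error code the server actually sends is MISCONF, but the message text here is only a placeholder.

```python
from typing import Optional

class PersistenceGuard:
    """Toy model of the 'stop accepting writes' condition."""

    def __init__(self) -> None:
        self.last_bgsave_ok = True

    def on_rdb_child_exit(self, exit_code: int, term_signal: Optional[int]) -> None:
        # A child killed by a signal, or exiting with a non-zero code, counts
        # as a failed save; a clean exit clears the error condition again.
        failed = term_signal is not None or exit_code != 0
        self.last_bgsave_ok = not failed

    def handle(self, command: str, is_write: bool) -> str:
        if is_write and not self.last_bgsave_ok:
            # Placeholder text; the real server sends a longer MISCONF message.
            return "-MISCONF unable to persist to disk"
        return f"executed {command}"
```

Reads keep working while the flag is set. The next successful save (a clean child exit) lifts the restriction on writes.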
Kill the RDB-saving child without errors
Sometimes the user may want to kill the RDB-saving child process without generating an error. This can be done by sending it the SIGUSR1 signal. This signal is handled in a special way: it kills the child process like any other signal, but the parent process does not treat this as a critical error and continues to serve write requests.
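The SIGUSR1 exception comes down to how the parent classifies the child's wait status. The sketch below forks a stand-in child that just sleeps, kills it with a given signal, and applies a hypothetical bgsave_failed rule. Under that rule, termination by SIGUSR1 is not an error, while any other signal or a non-zero exit code is. Unix only.

```python
import os
import signal
import time

def bgsave_failed(status: int) -> bool:
    """Classify a waitpid() status for the (toy) RDB child."""
    if os.WIFSIGNALED(status):
        # SIGUSR1 is the one signal treated as a deliberate, non-error kill.
        return os.WTERMSIG(status) != signal.SIGUSR1
    return os.WEXITSTATUS(status) != 0

def kill_child_with(signum: int) -> bool:
    """Fork a stand-in 'RDB child', kill it with signum, and classify the result."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(r)
            signal.signal(signum, signal.SIG_DFL)  # ensure the default action applies
            os.write(w, b"x")                       # tell the parent we are ready
            time.sleep(60)                          # pretend to be writing a snapshot
        finally:
            os._exit(0)
    os.close(w)
    os.read(r, 1)  # wait until the child has reset its handler
    os.close(r)
    os.kill(pid, signum)
    _, status = os.waitpid(pid, 0)
    return bgsave_failed(status)
```

For example, kill_child_with(signal.SIGUSR1) returns False (no error), while kill_child_with(signal.SIGTERM) returns True. The pipe handshake avoids a race in which the parent could signal the child before the child has restored the default handler.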