Commit 82bc46e

feat: add valkey-status
1 parent 86e00ff commit 82bc46e

File tree

7 files changed

+1022
-0
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Monitoring Plugins:
3030
* atlassian-statuspage: receive alerts on incidents on a specific Atlassian Statuspage
3131
* deb-updates: checks for software updates on systems that use package management systems based on the `apt-get` command
3232
* kubectl-get-pods: checks the health and status of kubernetes pods by running `kubectl get pods` and parsing the results
33+
* valkey-status: returns information and statistics about a Valkey server
3334
* valkey-version: tracks if Valkey is EOL
3435

3536

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
1+
Check valkey-status
2+
===================
3+
4+
Overview
5+
--------
6+
7+
Returns information and statistics about a Valkey server. Alerts on memory consumption, memory fragmentation, hit rates and more. Connects to Valkey via 127.0.0.1:6379 by default.
8+
9+
Hints:
10+
11+
* Tested on Valkey 8.0.
12+
* "I'm here to keep you safe, Sam. I want to help you." comes from the character GERTY in the movie "Moon" (2009).
13+
14+
15+
Fact Sheet
16+
----------
17+
18+
.. csv-table::
19+
:widths: 30, 70
20+
21+
"Check Plugin Download", "https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/valkey-status"
22+
"Check Interval Recommendation", "Once a minute"
23+
"Can be called without parameters", "Yes"
24+
"Compiled for Windows", "No"
25+
"Requirements", "command-line tool ``valkey-cli``"
26+
27+
28+
Help
29+
----
30+
31+
.. code-block:: text
32+
33+
usage: valkey-status [-h] [-V] [--always-ok] [-c CRIT] [-H HOSTNAME]
34+
[--ignore-maxmemory0] [--ignore-overcommit]
35+
[--ignore-somaxconn] [--ignore-sync-partial-err]
36+
[--ignore-thp] [-p PASSWORD] [--port PORT]
37+
[--socket SOCKET] [--test TEST] [--tls] [-w WARN]
38+
39+
Returns information and statistics about a Valkey server. Alerts on memory
40+
consumption, memory fragmentation, hit rates and more.
41+
42+
options:
43+
-h, --help show this help message and exit
44+
-V, --version show program's version number and exit
45+
--always-ok Always returns OK.
46+
-c, --critical CRIT Set the CRIT threshold as a percentage. Default: >=
47+
None
48+
-H, --hostname HOSTNAME
49+
Valkey server hostname. Default: 127.0.0.1
50+
--ignore-maxmemory0 Don't warn about valkey' maxmemory=0. Default: False
51+
--ignore-overcommit Don't warn about vm.overcommit_memory<>1. Default:
52+
False
53+
--ignore-somaxconn Don't warn about net.core.somaxconn <
54+
net.ipv4.tcp_max_syn_backlog. Default: False
55+
--ignore-sync-partial-err
56+
Don't warn about partial sync errors (because if you
57+
have an asynchronous replication, a small number of
58+
"denied partial resync requests" might be normal).
59+
Default: False
60+
--ignore-thp Don't warn about transparent huge page setting.
61+
Default: False
62+
-p, --password PASSWORD
63+
Password to use when connecting to the valkey server.
64+
--port PORT Valkey server port. Default: 6379
65+
--socket SOCKET Valkey server socket (overrides hostname and port).
66+
--test TEST For unit tests. Needs "path-to-stdout-file,path-to-
67+
stderr-file,expected-retc".
68+
--tls Establish a secure TLS connection to Valkey.
69+
-w, --warning WARN Set the WARN threshold as a percentage. Default: >= 90
70+
71+
72+
Usage Examples
73+
--------------
74+
75+
.. code-block:: bash
76+
77+
./valkey-status --ignore-maxmemory0 --ignore-overcommit --ignore-somaxconn --ignore-sync-partial-err --ignore-thp
78+
79+
Output:
80+
81+
.. code-block:: text
82+
83+
Valkey v8.0.3 (based on Redis v7.2.4), standalone mode on 127.0.0.1:6379, /etc/valkey/valkey.conf, up 52m 17s, unlimited memory usage enabled, 0.0% memory usage (959.1KiB/3.8GiB, 959.1KiB peak, 14.5MiB RSS), maxmemory-policy=noeviction, 0.0 evicted keys, 0.0 expired keys, hit rate 0% (0.0 hits, 0.0 misses), vm.overcommit_memory is not set to 1, kernel transparent_hugepage is not set to "madvise" or "never"
84+
85+
86+
States
87+
------
88+
89+
* WARN or CRIT in case of memory usage above the specified thresholds
90+
* WARN on Valkey's ``maxmemory 0`` setting (can be disabled)
91+
* WARN on any memory issues (can be disabled)
92+
* WARN on partial sync errors (can be disabled)
93+
* WARN on bad OS configuration (can be disabled)
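
The memory threshold rule above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the function name ``memory_state`` and its parameters are hypothetical, the defaults mirror the help text (WARN at ``>= 90``, CRIT unset), and the return codes follow the usual Nagios convention (0 = OK, 1 = WARN, 2 = CRIT).

```python
STATE_OK, STATE_WARN, STATE_CRIT = 0, 1, 2


def memory_state(used_bytes, limit_bytes, warn=90, crit=None):
    """Return (usage in percent, state) for a memory reading.

    Hypothetical helper: thresholds are percentages and are compared
    with >=, as described in the help text. A threshold of None is skipped.
    """
    if limit_bytes <= 0:
        # No usable limit, so no percentage can be computed.
        return 0.0, STATE_OK
    percent = round(used_bytes * 100 / limit_bytes, 1)
    if crit is not None and percent >= crit:
        return percent, STATE_CRIT
    if warn is not None and percent >= warn:
        return percent, STATE_WARN
    return percent, STATE_OK


print(memory_state(95, 100))           # WARN with the default thresholds
print(memory_state(95, 100, crit=95))  # CRIT once a CRIT threshold is set
```

Which limit the plugin uses as the denominator is an assumption here. The sample output suggests that with ``maxmemory 0`` the usage is reported against total system memory.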
94+
95+
96+
Perfdata / Metrics
97+
------------------
98+
99+
Latest info can be found `here <https://valkey.io/commands/info/>`_.
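
The metric names in this section look like the lower-cased ``INFO`` section name joined to the field name (for example ``memory_used_memory``). A minimal sketch of such a parser is shown here, assuming the standard ``INFO`` text format (``# Section`` headers and ``key:value`` lines). The function ``parse_info`` and the naming rule are illustrative assumptions, not the plugin's confirmed implementation.

```python
def parse_info(raw):
    """Parse INFO-style text into a flat dict of '<section>_<key>' -> value.

    Hypothetical helper; values are kept as strings.
    """
    metrics = {}
    section = ''
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            # '# Memory' -> 'memory'
            section = line.lstrip('#').strip().lower()
            continue
        key, sep, value = line.partition(':')
        if not sep:
            continue  # not a key:value line
        metrics[f'{section}_{key}'] = value
    return metrics


# Made-up sample in the INFO format, for illustration only.
sample = (
    '# Memory\r\n'
    'used_memory:982118\r\n'
    'used_memory_rss:15204352\r\n'
    '\r\n'
    '# Stats\r\n'
    'keyspace_hits:0\r\n'
    'keyspace_misses:0\r\n'
)
print(parse_info(sample))
```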
100+
101+
.. csv-table::
102+
:widths: 25, 15, 60
103+
:header-rows: 1
104+
105+
Name, Type, Description
106+
clients_blocked_clients, Number, Number of clients pending on a blocking call
107+
clients_connected_clients, Number, Number of client connections (excluding connections from replicas)
108+
cpu_used_cpu_sys, Number, "System CPU consumed by the Valkey server, which is the sum of system CPU consumed by all threads of the server process (main thread and background threads)"
109+
cpu_used_cpu_sys_children, Number, System CPU consumed by the background processes
110+
cpu_used_cpu_user, Number, "User CPU consumed by the Valkey server, which is the sum of user CPU consumed by all threads of the server process (main thread and background threads)"
111+
cpu_used_cpu_user_children, Number, User CPU consumed by the background processes
112+
db_count, Number, Number of Valkey databases
113+
key_count, Number, Sum of all keys across all databases
114+
keyspace_<dbname>_keys, Number, The number of keys
115+
keyspace_<dbname>_expires, Number, The number of keys with an expiration
116+
keyspace_<dbname>_avg_ttl, Seconds,
117+
keyspace_hit_rate, Percentage, "Percentage of key lookups that are successfully returned by keys in your Valkey instance. Generally speaking, a higher cache-hit ratio is better than a lower cache-hit ratio. You should make a note of your cache-hit ratio before you make any large configuration changes such as adjusting the maxmemory-gb limit, changing your eviction policy, or scaling your instance. Then, after you modify your instance, check the cache-hit ratio again to see how your change impacted this metric."
118+
mem_usage, Percentage, "Indicates how close your working set size is to reaching the maxmemory-gb limit. Unless the eviction policy is set to noeviction, the instance data reaching maxmemory does not always indicate a problem. However, key eviction is a background process that takes time. If you have a high write-rate, you could run out of memory before Valkey has time to evict keys to free up space."
119+
memory_maxmemory, Bytes,
120+
memory_mem_fragmentation_ratio, Number, "Ratio between used_memory_rss and used_memory. Note that this doesn't only include fragmentation, but also other process overheads (see the allocator\_\* metrics), and also overheads like code, shared libraries, stack, etc. Memory fragmentation can cause your Valkey instance to run out of memory even when the used memory to maxmemory-gb ratio is low. Memory fragmentation happens when the operating system allocates memory pages which Valkey cannot fully utilize after repeated write and delete operations. The accumulation of such pages can result in the system running out of memory and eventually cause the Valkey server to crash."
121+
memory_total_system_memory, Bytes, The total amount of memory that the Valkey host has
122+
memory_used_memory, Bytes, "Total number of bytes allocated by Valkey using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc)"
123+
memory_used_memory_lua, Bytes, Number of bytes used by the Lua engine
124+
memory_used_memory_rss, Bytes, Number of bytes that Valkey allocated as seen by the operating system (a.k.a resident set size). This is the number reported by tools such as top(1) and ps(1)
125+
persistance_aof_current_rewrite_time_sec, Seconds, Duration of the on-going AOF rewrite operation if any
126+
persistance_aof_rewrite_in_progress, Number, Flag indicating an AOF rewrite operation is on-going
127+
persistance_aof_rewrite_scheduled, Number, Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete.
128+
persistance_loading, Number, Flag indicating if the load of a dump file is on-going
129+
persistance_rdb_bgsave_in_progress, Number, Flag indicating an RDB save is on-going
130+
persistance_rdb_changes_since_last_save, Number, Number of changes since the last dump
131+
persistance_rdb_current_bgsave_time_sec, Seconds, Duration of the on-going RDB save operation if any
132+
replication_connected_slaves, Number, Number of connected replicas
133+
replication_repl_backlog_histlen, Bytes, Size in bytes of the data in the replication backlog buffer
134+
replication_repl_backlog_size, Bytes, Total size in bytes of the replication backlog buffer
135+
server_uptime_in_seconds, Seconds, Number of seconds since Valkey server start
136+
stats_evicted_keys, Continuous Counter, Number of evicted keys due to maxmemory limit
137+
stats_expired_keys, Continuous Counter, "Total number of key expiration events. If there are no expirable keys, it can be an indication that you are not setting TTLs on keys. In such cases, when your instance data reaches the maxmemory-gb limit, there are no keys to evict which can result in an out of memory condition. If the metric shows many expired keys, but you still see memory pressure on your instance, you should lower maxmemory-gb."
138+
stats_instantaneous_input, Number, The network read rate per second in KB/sec
139+
stats_instantaneous_ops_per_sec, Number, Number of commands processed per second
140+
stats_instantaneous_output, Number, The network write rate per second in KB/sec
141+
stats_keyspace_hits, Number, Number of successful lookups of keys in the main dictionary
142+
stats_keyspace_misses, Number, Number of failed lookups of keys in the main dictionary
143+
stats_latest_fork_usec, Number, Duration of the latest fork operation in microseconds
144+
stats_migrate_cached_sockets, Number, The number of sockets open for MIGRATE purposes
145+
stats_pubsub_channels, Number, Global number of pub/sub channels with client subscriptions
146+
stats_pubsub_patterns, Number, Global number of pub/sub patterns with client subscriptions
147+
stats_rejected_connections, Number, Number of connections rejected because of maxclients limit
148+
stats_sync_full, Number, The number of full resyncs with replicas
149+
stats_sync_partial_err, Number, The number of denied partial resync requests
150+
stats_sync_partial_ok, Number, The number of accepted partial resync requests
151+
stats_total_commands_processed, Number, Total number of commands processed by the server
152+
stats_total_connections_received, Number, Total number of connections accepted by the server
153+
stats_total_net_input_bytes, Bytes, The total number of bytes read from the network
154+
stats_total_net_output_bytes, Bytes, The total number of bytes written to the network
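
Two of the metrics above are derived values: ``keyspace_hit_rate`` (hits as a share of all lookups) and ``memory_mem_fragmentation_ratio`` (``used_memory_rss`` divided by ``used_memory``). The sketch below shows the arithmetic under those definitions. The function names and the rounding are illustrative choices, and returning 0 when there have been no lookups is an assumption that matches the sample output (``hit rate 0%`` with 0 hits and 0 misses).

```python
def hit_rate(hits, misses):
    """Percentage of key lookups that were hits (0.0 if there were none)."""
    total = hits + misses
    if total == 0:
        return 0.0
    return round(hits * 100 / total, 1)


def fragmentation_ratio(used_memory_rss, used_memory):
    """Ratio between used_memory_rss and used_memory (0.0 if unknown)."""
    if used_memory == 0:
        return 0.0
    return round(used_memory_rss / used_memory, 2)


print(hit_rate(75, 25))
print(fragmentation_ratio(300, 200))
```

A ratio well above 1 means the process holds noticeably more resident memory than Valkey has allocated through its allocator. As the table notes, this includes overheads other than fragmentation.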
155+
156+
157+
Troubleshooting
158+
---------------
159+
160+
vm.overcommit_memory is not set to 1
161+
``sysctl -w vm.overcommit_memory=1``
162+
163+
kernel transparent_hugepage is not set to "madvise"
164+
``echo madvise > /sys/kernel/mm/transparent_hugepage/enabled``
165+
166+
net.core.somaxconn is lower than net.ipv4.tcp_max_syn_backlog
167+
``tcp_max_syn_backlog`` is the maximum number of connections in the ``SYN_RECV`` queue. ``somaxconn`` is the maximum size of the ``ESTABLISHED`` queue and should be greater than ``tcp_max_syn_backlog``, for example: ``sysctl -w net.core.somaxconn=1024; sysctl -w net.ipv4.tcp_max_syn_backlog=512``
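
The three OS checks in this section can be expressed as one small function. This is a hedged sketch rather than the plugin's code: ``os_config_warnings`` and ``active_thp_mode`` are hypothetical names, and the values are passed in as arguments instead of being read from ``/proc/sys`` and ``/sys/kernel/mm/transparent_hugepage/enabled`` so the logic stays testable. The kernel marks the active THP mode with square brackets, for example ``always [madvise] never``.

```python
def active_thp_mode(content):
    """Extract the bracketed (active) mode from the THP 'enabled' file content."""
    for token in content.split():
        if token.startswith('[') and token.endswith(']'):
            return token[1:-1]
    return content.strip()


def os_config_warnings(overcommit_memory, thp_content, somaxconn, tcp_max_syn_backlog):
    """Return a list of warning messages for the OS settings discussed above."""
    warnings = []
    if overcommit_memory != 1:
        warnings.append('vm.overcommit_memory is not set to 1')
    if active_thp_mode(thp_content) not in ('madvise', 'never'):
        warnings.append('kernel transparent_hugepage is not set to "madvise" or "never"')
    if somaxconn < tcp_max_syn_backlog:
        warnings.append('net.core.somaxconn is lower than net.ipv4.tcp_max_syn_backlog')
    return warnings


print(os_config_warnings(0, '[always] madvise never', 128, 512))
print(os_config_warnings(1, 'always [madvise] never', 1024, 512))
```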
168+
169+
170+
Credits, License
171+
----------------
172+
173+
* Authors: `Linuxfabrik GmbH, Zurich <https://www.linuxfabrik.ch>`_
174+
* License: The Unlicense, see `LICENSE file <https://unlicense.org/>`_.
