---
title: TiDB Workload Repository
summary: Introduces the workload repository system for collecting and storing historical workload data from a TiDB cluster.
---
The Workload Repository is a system for collecting and storing historical workload data from a TiDB cluster. It periodically samples various system tables to track cluster performance and usage patterns over time.
To enable the Workload Repository, set the `tidb_workload_repository_dest` system variable:

```sql
SET GLOBAL tidb_workload_repository_dest = 'table';
```
To disable it:

```sql
SET GLOBAL tidb_workload_repository_dest = '';
```
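To check the current value, query the global variable directly:

```sql
SELECT @@GLOBAL.tidb_workload_repository_dest;
```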
The Workload Repository stores data in tables under the `WORKLOAD_SCHEMA` database. It collects data via two different methods:
- The snapshot sampling process, which runs at configurable intervals, hourly by default, and can be triggered manually.
- The time-based sampling process, which runs at shorter intervals, typically every 5 seconds.
The snapshot sampling process runs at a configurable interval of between 15 minutes and 2 hours and samples data from various cumulative metrics tables. Snapshots are initiated from one of the TiDB nodes at the specified interval, and the process is as follows:
- From the initiating node, a row is inserted into `WORKLOAD_SCHEMA.HIST_SNAPSHOTS`, capturing the snapshot ID, start and end timestamps, and server version details.
- On each TiDB node, all rows from the source tables are copied to the corresponding target tables with the `HIST_` prefix. The copied data includes the original columns from the source tables plus additional columns for the timestamp, instance ID, and snapshot ID.
Note that the sampled tables return data specific to the TiDB node from which they are queried.
Data is sampled from the following tables:
| Source table | Destination table | Description |
|---|---|---|
| `TIDB_INDEX_USAGE` | `HIST_TIDB_INDEX_USAGE` | Index usage statistics |
| `TIDB_STATEMENTS_STATS` | `HIST_TIDB_STATEMENTS_STATS` | Statement statistics |
| `CLIENT_ERRORS_SUMMARY_BY_HOST` | `HIST_CLIENT_ERRORS_SUMMARY_BY_HOST` | Client error summaries by host |
| `CLIENT_ERRORS_SUMMARY_BY_USER` | `HIST_CLIENT_ERRORS_SUMMARY_BY_USER` | Client error summaries by user |
| `CLIENT_ERRORS_SUMMARY_GLOBAL` | `HIST_CLIENT_ERRORS_SUMMARY_GLOBAL` | Global client error summaries |
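Snapshot data can then be queried from the `HIST_` tables directly. The following is an illustrative sketch only; `SNAP_ID` is an assumed name for the snapshot ID column described above:

```sql
-- Index usage rows captured in snapshot 42 (SNAP_ID is an assumed column name).
SELECT *
FROM WORKLOAD_SCHEMA.HIST_TIDB_INDEX_USAGE
WHERE SNAP_ID = 42;
```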
The snapshot sampling interval can be controlled with `tidb_workload_repository_snapshot_interval`:

```sql
SET GLOBAL tidb_workload_repository_snapshot_interval = 900; -- set the interval to 15 minutes
```
Note that while the snapshot sampling process runs automatically based on the configured interval, you can also trigger a manual snapshot using the following SQL statement:

```sql
ADMIN CREATE WORKLOAD SNAPSHOT;
```
Manually triggering snapshots does not change the interval or timing of automatic snapshots.
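To confirm that a snapshot completed, you can query the snapshot metadata table. This sketch assumes `BEGIN_TIME` is the name of the start-timestamp column mentioned above:

```sql
-- Show the five most recent snapshots (BEGIN_TIME is an assumed column name).
SELECT *
FROM WORKLOAD_SCHEMA.HIST_SNAPSHOTS
ORDER BY BEGIN_TIME DESC
LIMIT 5;
```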
The time-based sampling process samples data from various non-cumulative metrics tables at intervals ranging from 1 to 600 seconds.
When the time-based sampling process runs, all rows from the source tables are copied to the corresponding target tables with the `HIST_` prefix. The copied data includes the original columns from the source tables plus additional columns for the timestamp and instance ID.
Unlike the snapshot sampling process, the time-based sampling process does not add a row to the `HIST_SNAPSHOTS` table.
Note that the sampled tables return data specific to the TiDB node from which they are queried.
Data is sampled from the following tables:
| Source table | Destination table | Description |
|---|---|---|
| `PROCESSLIST` | `HIST_PROCESSLIST` | Active sessions |
| `DATA_LOCK_WAITS` | `HIST_DATA_LOCK_WAITS` | Data lock waits |
| `TIDB_TRX` | `HIST_TIDB_TRX` | Active transactions |
| `MEMORY_USAGE` | `HIST_MEMORY_USAGE` | Memory usage |
| `DEADLOCKS` | `HIST_DEADLOCKS` | Deadlock information |
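As an illustration of how this data might be used, you could count sampled session rows per instance. The `TS` and `INSTANCE_ID` column names are assumptions; the description above only states that timestamp and instance columns are added:

```sql
-- Count session rows sampled in the last hour, per TiDB instance.
-- TS and INSTANCE_ID are assumed names for the added timestamp and instance columns.
SELECT INSTANCE_ID, COUNT(*) AS sampled_sessions
FROM WORKLOAD_SCHEMA.HIST_PROCESSLIST
WHERE TS > NOW() - INTERVAL 1 HOUR
GROUP BY INSTANCE_ID;
```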
The time-based sampling interval can be controlled with `tidb_workload_repository_active_sampling_interval`:

```sql
SET GLOBAL tidb_workload_repository_active_sampling_interval = 20; -- set the interval to 20 seconds
```
Setting this global variable to `0` disables the time-based sampling process:
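```sql
SET GLOBAL tidb_workload_repository_active_sampling_interval = 0; -- disable time-based sampling
```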
Historical data in the Workload Repository is retained for seven days by default, and older data is purged automatically based on the retention period, using table partitions for efficient data management. The retention period can be controlled with the `tidb_workload_repository_retention_days` variable. For example, to keep data for 30 days, run the following:
```sql
SET GLOBAL tidb_workload_repository_retention_days = 30;
```
A higher value for this variable allows for longer data retention, which might be beneficial for workload analysis, but will increase storage requirements.
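To review the current Workload Repository settings in one place, you can list all related system variables:

```sql
SHOW GLOBAL VARIABLES LIKE 'tidb_workload_repository%';
```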
- Enabling the Workload Repository might have a small performance impact on the system.
- Setting sampling intervals that are too short might increase system overhead.
- Setting retention days to `0` disables automatic purging of old data.
- Start with default settings and adjust based on your monitoring needs.
- Set reasonable retention periods based on your storage capacity.
- Monitor the size of the `WORKLOAD_SCHEMA` database (an example query follows this list).
- Use longer sampling intervals in production environments to minimize overhead.
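A minimal sketch for checking the repository's size, using the standard `information_schema.TABLES` view (note that the reported sizes are estimates in TiDB):

```sql
-- Approximate total size of the workload repository tables, in MiB.
SELECT SUM(DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 AS size_mib
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'WORKLOAD_SCHEMA';
```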