# dashboard/dashboard-monitoring.md
If the TiDB cluster is deployed using TiUP, you can also view the Performance Overview dashboard.
The Performance Overview dashboard orchestrates the metrics of TiDB, PD, and TiKV, and presents each of them in the following sections:
- **Overview**: Database time and SQL execution time summary. By checking different colors in the overview, you can quickly identify the database workload profile and the performance bottleneck.
- **Load profile**: Key metrics and resource usage, including database QPS, connection information, the MySQL command types the application uses to interact with TiDB, database internal TSO and KV request OPS, and resource usage of TiKV and TiDB.
- **Top-down latency breakdown**: Query latency versus connection idle time ratio, query latency breakdown, TSO/KV request latency during execution, and the breakdown of write latency within TiKV.
The following sections illustrate the metrics on the Performance Overview dashboard.
### Database Time by SQL Type
- `database time`: Total database time per second
- `sql_type`: Database time consumed by each type of SQL statement per second
### Database Time by SQL Phase
- `database time`: Total database time per second
- `get token/parse/compile/execute`: Database time consumed in four SQL processing phases
The SQL execution phase is generally in green and other phases are in red. If non-green areas are large, much database time is consumed in phases other than execution, and further cause analysis is required.
### SQL Execute Time Overview
- `execute time`: Database time consumed during SQL execution per second
- `tso_wait`: Concurrent TSO waiting time per second during SQL execution
- `kv request type`: Time waiting for each KV request type per second during SQL execution. The total KV request wait time might exceed SQL execution time, because KV requests are concurrent.
Green metrics stand for common KV write requests (such as prewrite and commit), blue metrics stand for common read requests, and metrics in other colors stand for unexpected situations that need your attention. For example, pessimistic lock KV requests are marked red and TSO waiting is marked dark brown.
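To see why the total KV request wait time can exceed the SQL execution wall-clock time, here is a toy numeric sketch (the values are hypothetical, chosen only for illustration):

```python
# One SQL statement issues two KV requests concurrently; each waits 40 ms,
# and the waits overlap in time.
kv_waits_ms = [40, 40]
wall_clock_execute_ms = 50  # elapsed execution time of the statement

# The panel sums per-request waits, so the total can exceed wall-clock time.
total_kv_wait_ms = sum(kv_waits_ms)
print(total_kv_wait_ms > wall_clock_execute_ms)  # True
```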
Generally, `tso - request` divided by `tso - cmd` is the average size of TSO requests.
### Connection Count
- `total`: Number of connections to all TiDB instances
- `active connections`: Number of active connections to all TiDB instances
- Number of connections to each TiDB instance
### TiDB CPU/Memory
- `CPU-Avg`: Average CPU utilization across all TiDB instances
- `CPU-Delta`: Maximum CPU utilization of all TiDB instances minus minimum CPU utilization of all TiDB instances
- `CPU-Max`: Maximum CPU utilization across all TiDB instances
- `CPU-Quota`: Number of CPU cores that can be used by TiDB
- `Mem-Max`: Maximum memory utilization across all TiDB instances
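As an illustration of how these panel values relate to each other, here is a minimal sketch with hypothetical per-instance samples (the dashboard itself derives them from Prometheus metrics):

```python
# Hypothetical CPU utilization per TiDB instance, in percent
# (100% corresponds to one fully used core).
cpu = {"tidb-0": 520.0, "tidb-1": 610.0, "tidb-2": 480.0}

cpu_avg = sum(cpu.values()) / len(cpu)    # CPU-Avg
cpu_max = max(cpu.values())               # CPU-Max
cpu_delta = cpu_max - min(cpu.values())   # CPU-Delta
cpu_quota = 800.0                         # CPU-Quota: 8 cores (assumed)

print(round(cpu_avg, 1), cpu_max, cpu_delta)  # 536.7 610.0 130.0
```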
### TiKV CPU/Memory
- `CPU-Avg`: Average CPU utilization across all TiKV instances
- `CPU-Delta`: Maximum CPU utilization of all TiKV instances minus minimum CPU utilization of all TiKV instances
- `CPU-Max`: Maximum CPU utilization across all TiKV instances
- `CPU-Quota`: Number of CPU cores that can be used by TiKV
- `Mem-Max`: Maximum memory utilization across all TiKV instances
### PD CPU/Memory
- `CPU-Max`: Maximum CPU utilization across all PD instances
- `CPU-Quota`: Number of CPU cores that can be used by PD
- `Mem-Max`: Maximum memory utilization across all PD instances
### Read Traffic
- `TiDB -> Client`: The outbound traffic statistics from TiDB to the client
- `Rocksdb -> TiKV`: The data flow that TiKV retrieves from RocksDB during read operations within the storage layer
### Write Traffic
- `Client -> TiDB`: The inbound traffic statistics from the client to TiDB
- `TiDB -> TiKV: general`: The rate at which foreground transactions are written from TiDB to TiKV
- `TiDB -> TiKV: internal`: The rate at which internal transactions are written from TiDB to TiKV
- `TiKV -> Rocksdb`: The flow of write operations from TiKV to RocksDB
- `RocksDB Compaction`: The total read and write I/O flow generated by RocksDB compaction operations
### Duration
- `Duration`: Execution time
    - The duration from when TiDB receives a request from the client until TiDB executes the request and returns the result to the client. In general, client requests are sent in the form of SQL statements; however, this duration can include the execution time of commands such as `COM_PING`, `COM_SLEEP`, `COM_STMT_FETCH`, and `COM_SEND_LONG_DATA`.
    - TiDB supports Multi-Query, which means the client can send multiple SQL statements at one time, such as `select 1; select 1; select 1;`. In this case, the total execution time of this query includes the execution time of all SQL statements.
- `avg`: Average time to execute all requests
- `99`: P99 duration to execute all requests
- `avg by type`: Average time to execute all requests in all TiDB instances, collected by type: `SELECT`, `INSERT`, and `UPDATE`
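To illustrate why `99` matters alongside `avg`, here is a small sketch using hypothetical durations and the nearest-rank percentile method:

```python
# 100 hypothetical request durations in seconds: mostly fast, a few slow.
durations = sorted([0.002] * 97 + [0.010, 0.050, 0.200])

avg = sum(durations) / len(durations)            # the "avg" panel value
p99 = durations[int(0.99 * len(durations)) - 1]  # nearest-rank P99

# The slow tail dominates P99 while barely moving the average.
print(round(avg, 5), p99)
```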
### Connection Idle Duration
Connection Idle Duration indicates the duration of a connection being idle.
- `avg-in-txn`: Average connection idle duration when the connection is within a transaction
- `avg-not-in-txn`: Average connection idle duration when the connection is not within a transaction
- `99-in-txn`: P99 connection idle duration when the connection is within a transaction
- `99-not-in-txn`: P99 connection idle duration when the connection is not within a transaction
### Parse Duration, Compile Duration, and Execute Duration
- `Parse Duration`: Time consumed in parsing SQL statements
- `Compile Duration`: Time consumed in compiling the parsed SQL AST to execution plans
- `Execution Duration`: Time consumed in executing execution plans of SQL statements
All these three metrics include the average duration and the 99th percentile duration in all TiDB instances.
Average time consumed in executing gRPC requests in all TiKV instances based on request types.
### PD TSO Wait/RPC Duration
- `wait - avg`: Average time in waiting for PD to return TSO in all TiDB instances
- `rpc - avg`: Average time from sending TSO requests to PD to receiving TSO in all TiDB instances
- `wait - 99`: P99 time in waiting for PD to return TSO in all TiDB instances
- `rpc - 99`: P99 time from sending TSO requests to PD to receiving TSO in all TiDB instances
### Storage Async Write Duration, Store Duration, and Apply Duration
- `Storage Async Write Duration`: Time consumed in asynchronous write
- `Store Duration`: Time consumed in the store loop during asynchronous write
- `Apply Duration`: Time consumed in the apply loop during asynchronous write
All these three metrics include the average duration and P99 duration in all TiKV instances.
Average storage async write duration = Average store duration + Average apply duration
# performance-tuning-methods.md
In this workload, only `ANALYZE` statements are running in the cluster:
- The total number of KV requests per second is 35.5 and the number of Cop requests per second is 9.3.
- Most of the KV processing time is spent on `Cop-internal_stats`, which indicates that the most time-consuming KV request is `Cop` from internal `ANALYZE` operations.
#### CPU and memory usage
In the CPU/Memory panels of TiDB, TiKV, and PD, you can monitor their respective logical CPU usage and memory consumption, such as average CPU, maximum CPU, delta CPU (maximum CPU usage minus minimum CPU usage), CPU quota, and maximum memory usage. Based on these metrics, you can determine the overall resource usage of TiDB, TiKV, and PD.
- Based on the `delta` value, you can determine if CPU usage in TiDB or TiKV is unbalanced. For TiDB, a high `delta` usually means unbalanced application connections among the TiDB instances; for TiKV, a high `delta` usually means there are read/write hot spots in the cluster.
- With an overview of TiDB, TiKV, and PD resource usage, you can quickly determine if there are resource bottlenecks in your cluster and whether TiKV, TiDB, or PD needs scale-out or scale-up.
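The imbalance check described above can be sketched as a simple heuristic. Note the 50% threshold is an assumption for illustration, not a documented rule:

```python
# Flag a component as unbalanced when the delta between the busiest and
# the idlest instance is a large fraction of the average utilization.
def is_unbalanced(per_instance_cpu, threshold=0.5):
    avg = sum(per_instance_cpu) / len(per_instance_cpu)
    delta = max(per_instance_cpu) - min(per_instance_cpu)
    return delta > threshold * avg

# One hot TiKV instance, e.g. a read/write hot spot:
print(is_unbalanced([300.0, 320.0, 900.0]))  # True
print(is_unbalanced([500.0, 510.0, 490.0]))  # False
```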
**Example 1: High TiKV resource usage**
In the following TPC-C workload, each TiDB and TiKV is configured with 16 CPUs. PD is configured with 4 CPUs.
- The average, maximum, and delta CPU usage of TiDB are 761%, 934%, and 322%, respectively. The maximum memory usage is 6.86 GiB.
- The average, maximum, and delta CPU usage of TiKV are 1343%, 1505%, and 283%, respectively. The maximum memory usage is 27.1 GiB.
- The maximum CPU usage of PD is 59.1%. The maximum memory usage is 221 MiB.
Obviously, TiKV consumes more CPU, which is expected because TPC-C is a write-heavy scenario. To improve performance, it is recommended to scale out TiKV.
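A quick check of these numbers against the CPU quotas (16 CPUs = 1600%) shows why TiKV, not TiDB, is the bottleneck here:

```python
# Average CPU usage vs. quota, from the example above.
tidb_util = 761.0 / 1600.0   # TiDB: ~48% of its quota
tikv_util = 1343.0 / 1600.0  # TiKV: ~84% of its quota -> near saturation

print(round(tidb_util, 2), round(tikv_util, 2))  # 0.48 0.84
```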
#### Data traffic
The read and write traffic panels offer insight into traffic patterns within your TiDB cluster, allowing you to comprehensively monitor data flow from clients to the database and between internal components.
- Read traffic
    - `TiDB -> Client`: the outbound traffic statistics from TiDB to the client
    - `Rocksdb -> TiKV`: the data flow that TiKV retrieves from RocksDB during read operations within the storage layer
- Write traffic
    - `Client -> TiDB`: the inbound traffic statistics from the client to TiDB
    - `TiDB -> TiKV: general`: the rate at which foreground transactions are written from TiDB to TiKV
    - `TiDB -> TiKV: internal`: the rate at which internal transactions are written from TiDB to TiKV
    - `TiKV -> Rocksdb`: the flow of write operations from TiKV to RocksDB
    - `RocksDB Compaction`: the total read and write I/O flow generated by RocksDB compaction operations. If `RocksDB Compaction` is significantly higher than `TiKV -> Rocksdb`, and your average row size is larger than 512 bytes, you can enable Titan to reduce the compaction I/O flow, with `min-blob-size` set to `"512B"` or `"1KB"` and `blob-file-compression` set to `"zstd"`:

        ```toml
        [rocksdb.titan]
        enabled = true
        [rocksdb.defaultcf.titan]
        min-blob-size = "1KB"
        blob-file-compression = "zstd"
        ```
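The decision rule in the `RocksDB Compaction` bullet can be sketched as follows. The 512-byte row-size condition comes from the text, while the 2x ratio for "significantly higher" is an assumed threshold:

```python
# Decide whether enabling Titan is worth considering, based on the ratio of
# compaction I/O to the TiKV -> RocksDB write flow and the average row size.
def titan_worth_considering(compaction_mbps, write_flow_mbps, avg_row_bytes,
                            ratio=2.0):
    return avg_row_bytes > 512 and compaction_mbps > ratio * write_flow_mbps

# Example: 6 KB rows with compaction I/O far above the write flow.
print(titan_worth_considering(900.0, 100.0, 6 * 1024))  # True
```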
**Example 1: Read and write traffic in the TPC-C workload**

The following is an example of read and write traffic in the TPC-C workload.
- Read traffic

    - `TiDB -> Client`: 14.2 MB/s
    - `Rocksdb -> TiKV`: 469 MB/s. Note that both read operations (`SELECT` statements) and write operations (`INSERT`, `UPDATE`, and `DELETE` statements) require reading data from RocksDB into TiKV before committing a transaction.
**Example 2: Write traffic before and after Titan is enabled**
The following example shows the performance changes before and after Titan is enabled. For an insert workload with 6 KB records, Titan significantly reduces write traffic and compaction I/O, enhancing overall performance and resource utilization of TiKV.