Commit 2902f46

[BACKPORT 2024.2][#24566] docdb: Throttle log when tablet is stuck in bootstrapping state
Summary: If a tablet is stuck in the bootstrapping state due to bugs like #20977, the tserver logs the following message extensively:

```
I1019 16:05:33.204833 112611 consensus_peers.cc:187] T e28f9138850f4bd29b6ee097d2557e12 P d494d8235d1444e0978ed79d7037ce5c -> Peer fb68a8df689340fabe86c775ba95f289 ([host: "x.x.x.x" port: 9100], []): Found a RPC call in stuck state - timeout: 3.000s, last_rpc_start_time: 125060143.166s, stuck threshold: 10.000s, force recover: 0, call state: OutboundCall(0x0000000db30567a0 -> RPC call yb.consensus.ConsensusService.UpdateConsensus -> { remote: x.x.x.x:9100 idx: 0 protocol: 0x000000000437b7b0 -> tcpc } , state=SENT.): RPC call yb.consensus.ConsensusService.UpdateConsensus -> { remote: x.x.x.x:9100 idx: 0 protocol: 0x000000000437b7b0 -> tcpc } , state=SENT., start_time: 125060143.166s, sent_time: 125060143.166s, callback_time: 0.000s, now: 125487372.619s, connection: 0x000000003010a3d8 -> Connection (0x000000003010a3d8) client x.x.x.x:39958 => x.x.x.x:9100
```

This change throttles the log to once per second. (It is DFATAL now, so it will fail immediately in debug builds, but we still do not want ERROR logs to fill up the disk if this condition occurs on production clusters.)

Original commit: 87afcd8 / D50000

Test Plan: Jenkins

Reviewers: mhaddad, hsunder

Reviewed By: mhaddad

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D50095
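Throttling a log "to once per second" amounts to remembering when the line was last emitted and dropping any call that arrives inside that window. The following is a minimal C++ sketch of that idea, assuming a monotonic clock and concurrent callers; the class `EveryNSecsThrottle` is hypothetical, is not taken from the YugabyteDB source, and may differ from how its throttled-logging macros actually work.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper (not YugabyteDB code): permits at most one log emission
// per `interval`, and is safe to call from multiple threads.
class EveryNSecsThrottle {
 public:
  explicit EveryNSecsThrottle(std::chrono::steady_clock::duration interval)
      : interval_ns_(
            std::chrono::duration_cast<std::chrono::nanoseconds>(interval).count()) {}

  // Returns true if the caller should emit its log line at time `now`.
  bool ShouldLog(std::chrono::steady_clock::time_point now) {
    const int64_t now_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                               now.time_since_epoch()).count();
    int64_t last = last_log_ns_.load(std::memory_order_relaxed);
    // Drop the call if the previous emission is still inside the window.
    if (last != kNever && now_ns - last < interval_ns_) {
      return false;
    }
    // Only one thread can claim this window; losers of the race skip logging.
    return last_log_ns_.compare_exchange_strong(last, now_ns,
                                                std::memory_order_relaxed);
  }

 private:
  // Sentinel meaning "nothing has been logged yet".
  static constexpr int64_t kNever = std::numeric_limits<int64_t>::min();
  const int64_t interval_ns_;
  std::atomic<int64_t> last_log_ns_{kNever};
};
```

A caller would keep one `static EveryNSecsThrottle` per log site and guard the log statement with `ShouldLog(std::chrono::steady_clock::now())`. The compare-exchange ensures that when several threads detect the stuck RPC at the same moment, only one of them writes the line.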
1 parent b8e7a24 commit 2902f46


1 file changed: +1 / -1 lines

src/yb/consensus/consensus_peers.cc

Lines changed: 1 addition & 1 deletion
```diff
@@ -185,7 +185,7 @@ Status Peer::SignalRequest(RequestTriggerMode trigger_mode) {
 auto last_rpc_start_time = last_rpc_start_time_.load(std::memory_order_acquire);
 if (last_rpc_start_time != CoarseTimePoint::min() &&
     now > last_rpc_start_time + stuck_threshold + timeout && !controller_.finished()) {
-  LOG_WITH_PREFIX(INFO) << Format(
+  YB_LOG_WITH_PREFIX_EVERY_N_SECS(INFO, 1) << Format(
       "Found an RPC call in stuck state - timeout: $0, last_rpc_start_time: $1, "
       "stuck threshold: $2, force recover: $3, call state: $4",
       timeout, last_rpc_start_time, stuck_threshold,
```
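The guard around the throttled log treats an RPC as stuck only once it has been outstanding for longer than its timeout plus an extra stuck threshold and has still not finished. The sketch below restates that condition as a standalone predicate. It assumes `std::chrono::steady_clock` in place of YugabyteDB's `CoarseTimePoint`, and `IsRpcStuck` is a hypothetical name, not a function in the codebase.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical predicate mirroring the condition in Peer::SignalRequest.
// std::chrono types stand in for CoarseTimePoint and the timeout durations.
bool IsRpcStuck(Clock::time_point now,
                Clock::time_point last_rpc_start_time,
                Clock::duration timeout,
                Clock::duration stuck_threshold,
                bool rpc_finished) {
  // time_point::min() plays the role of CoarseTimePoint::min():
  // no RPC has been started yet, so nothing can be stuck.
  if (last_rpc_start_time == Clock::time_point::min()) {
    return false;
  }
  // Stuck = still unfinished past start + stuck_threshold + timeout.
  return now > last_rpc_start_time + stuck_threshold + timeout && !rpc_finished;
}
```

With the values from the log above (timeout 3s, stuck threshold 10s), an unfinished call is reported once it has been outstanding for more than 13 seconds. The call in that log had been outstanding for roughly 427,000 seconds, which is why the message repeated so often before throttling.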