Conversation
The inner deletion loop in clear_stale_call_cache used a hardcoded batch size of 10,000. Replace it with AdaptiveBatchSize that starts at 100 and self-tunes based on actual query duration, avoiding both excessive lock contention on large caches and unnecessary round-trips on small ones.
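The adaptive idea can be sketched as follows. This is a minimal illustration, not graph-node's actual `AdaptiveBatchSize`: the field names, the doubling/halving policy, and the 3-minute target are assumptions based on this discussion.

```rust
use std::time::Duration;

// Sketch only: grow the batch while queries finish well under a target
// duration, shrink it when they overrun. Constants are illustrative.
struct AdaptiveBatchSize {
    size: usize,
    target: Duration,
}

impl AdaptiveBatchSize {
    fn with_size(size: usize) -> Self {
        AdaptiveBatchSize {
            size,
            // Assumed target; the review below mentions keeping
            // transactions under 3 minutes.
            target: Duration::from_secs(180),
        }
    }

    // Called after each batch with the observed query duration.
    fn adapt(&mut self, elapsed: Duration) {
        if elapsed < self.target / 2 {
            // Fast query: double the batch to cut round-trips.
            self.size = self.size.saturating_mul(2);
        } else if elapsed > self.target {
            // Slow query: halve the batch to limit lock hold time.
            self.size = (self.size / 2).max(1);
        }
    }
}

fn main() {
    let mut batch = AdaptiveBatchSize::with_size(100);
    batch.adapt(Duration::from_secs(10)); // fast batch: size doubles
    assert_eq!(batch.size, 200);
    batch.adapt(Duration::from_secs(400)); // slow batch: size halves
    assert_eq!(batch.size, 100);
    println!("size after tuning: {}", batch.size);
}
```

Starting small and growing means a small cache pays at most a few extra round-trips, while a large cache quickly converges on the biggest batch that still commits quickly.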
Add catalog::table_has_index() to check for index existence, and use it in clear_stale_call_cache to ensure a btree index on contract_address exists before running deletion queries. The index is created concurrently if missing, avoiding sequential scans on large call_cache tables.
```rust
conn.batch_execute(&format!(
    "create index concurrently if not exists {} \
     on {}(contract_address)",
    Self::CALL_CACHE_CONTRACT_ADDRESS_INDEX,
    table_qname
))
.await?;
```
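For the existence check itself, `catalog::table_has_index()` could be built on a query against `pg_indexes`. The query text and function signature below are a hedged sketch, not graph-node's actual implementation:

```rust
// Sketch of a catalog helper that builds an existence query against
// pg_indexes; schema/table/index names here are illustrative.
fn table_has_index_query(namespace: &str, table: &str, index: &str) -> String {
    format!(
        "select exists(\
           select 1 from pg_indexes \
            where schemaname = '{namespace}' \
              and tablename = '{table}' \
              and indexname = '{index}')"
    )
}

fn main() {
    let q = table_has_index_query(
        "public",
        "call_cache",
        "call_cache_contract_address_idx",
    );
    assert!(q.contains("pg_indexes"));
    assert!(q.contains("call_cache_contract_address_idx"));
    println!("{q}");
}
```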
A failure during `create index concurrently` may leave an invalid index that will not be used, but will still exist as a name on the table. We should probably match on the error here and execute `drop index if exists`.
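The suggested handling could look like the sketch below, with a closure standing in for `conn.batch_execute` so the example runs without a database; all names are illustrative. The point is that Postgres keeps the partially built index (marked INVALID) under the chosen name when a concurrent build fails, so it must be dropped before a retry:

```rust
// Sketch: drop the possibly-invalid index when the concurrent create
// fails, then surface the original error.
fn create_index_or_cleanup(
    exec: &mut dyn FnMut(&str) -> Result<(), String>,
    index: &str,
    table: &str,
) -> Result<(), String> {
    let create = format!(
        "create index concurrently if not exists {index} \
         on {table}(contract_address)"
    );
    if let Err(e) = exec(&create) {
        // Best-effort cleanup so a retry starts from a clean slate.
        let _ = exec(&format!("drop index if exists {index}"));
        return Err(e);
    }
    Ok(())
}

fn main() {
    let mut statements = Vec::new();
    let result = create_index_or_cleanup(
        &mut |sql| {
            statements.push(sql.to_string());
            if sql.starts_with("create") {
                // Simulate a failed concurrent build.
                Err("canceling statement due to lock timeout".to_string())
            } else {
                Ok(())
            }
        },
        "call_cache_idx",
        "public.call_cache",
    );
    assert!(result.is_err());
    assert!(statements[1].starts_with("drop index if exists"));
    println!("issued: {statements:?}");
}
```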
store/postgres/src/chain_store.rs
Outdated
```rust
loop {
    let current_size = batch_size.size;
    let start = Instant::now();
    let deleted_count = sql_query(&delete_cache_query)
```
This is actually an oversight on my part when implementing this (sorry about that 🙏🏻). It would be nice to add logs before/after the delete query to show progress.
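The requested logging could look like this small sketch, where `eprintln!` stands in for graph-node's `Logger` and a fixed array stands in for the per-batch row counts the delete query would return:

```rust
// Sketch: log before and after each delete batch so long-running cache
// cleanup shows progress. Counts are simulated, not from a real query.
fn main() {
    let batches = [100usize, 100, 40, 0];
    let mut total = 0;
    for (i, &deleted) in batches.iter().enumerate() {
        eprintln!("clearing stale call cache: starting batch {i}");
        total += deleted;
        eprintln!(
            "clearing stale call cache: deleted {deleted} rows \
             in batch {i} ({total} total)"
        );
        if deleted == 0 {
            break;
        }
    }
    assert_eq!(total, 240);
    println!("total deleted: {total}");
}
```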
store/postgres/src/chain_store.rs
Outdated
```rust
// self-tune based on query duration.
let contracts_batch_size: i64 = 2000;
let cache_batch_size: usize = 10000;
let mut batch_size = AdaptiveBatchSize::with_size(100);
```
It may be an edge case (depending on the shard/replica setup), but now that batch_size is unbounded, couldn't it grow to a point where it creates very large WAL files per batch, which may cause replication lag on read replicas and affect query performance? (tbf, 10k probably already had significant WAL/lag impact)
I don't think it's a big deal, at least if the adaptive batch size works, since it will try to keep transactions under 3 minutes.
Refactor the code to separate the gymnastics caused by the two separate storage schemes from the overall control loop
Introduce an env variable to control that, and lower the default to 100
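Reading the starting size from the environment with the lowered default could be sketched as below. The variable name `GRAPH_STORE_CALL_CACHE_BATCH_SIZE` is hypothetical, and graph-node normally routes such settings through its env-var handling rather than ad-hoc `env::var` calls:

```rust
// Sketch: parse a batch size from an (assumed) env var, falling back to
// the lowered default of 100 when unset or unparseable.
fn batch_size_from(var: Option<&str>) -> usize {
    var.and_then(|v| v.parse().ok()).unwrap_or(100)
}

fn main() {
    // Hypothetical variable name, for illustration only.
    let from_env = std::env::var("GRAPH_STORE_CALL_CACHE_BATCH_SIZE").ok();
    let size = batch_size_from(from_env.as_deref());
    assert_eq!(batch_size_from(None), 100); // default applies when unset
    assert_eq!(batch_size_from(Some("250")), 250); // override is honored
    println!("initial batch size: {size}");
}
```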
dimitrovmaksim left a comment:
Approving not to block on nits
```rust
    );
    break;
}
if !idx_valid {
```
nit: This would trigger on a missing index as well, so the log may be a little misleading
Nice catch. I'll just not log; it's not that important to do that
If it is not, drop it and recreate it
Force-pushed from e53be77 to 6a388cb.