
bug(server): fix lsn mismatch in replication #4967

Merged

merged 8 commits into main from fix_lsn_bug on May 5, 2025

Conversation

adiholden
Collaborator

The bug:
Entries were expired at the time of the transition from full sync to stable sync, leading to journal changes not being propagated to the replica.
The fix:
Disable expiring items during the transition from full sync to stable sync using
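The guard in the Transaction::Guard diff reviewed further down flips SetExpireAllowed(false) on every shard's DbSlice for the duration of the cutover. A minimal sketch of that idea, assuming a hop-to-every-shard helper (RunOnAllShards and the guard class name are illustrative; the per-shard call mirrors the diff):

// Sketch only, not the final patch: pause expiry on all shards while the
// full-sync -> stable-sync cutover runs. RunOnAllShards is an illustrative
// placeholder for the project's real "run this on each shard thread" helper.
class PauseExpiryGuard {
 public:
  PauseExpiryGuard() { Toggle(false); }
  ~PauseExpiryGuard() { Toggle(true); }

 private:
  static void Toggle(bool allowed) {
    RunOnAllShards([allowed](EngineShard* shard) {
      namespaces->GetDefaultNamespace()
          .GetDbSlice(shard->shard_id())
          .SetExpireAllowed(allowed);
    });
  }
};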

@adiholden adiholden requested review from romange and kostasrim April 20, 2025 20:57
@adiholden adiholden changed the title from "bug server: fix lsn mismatch in replication" to "bug(server): fix lsn mismatch in replication" Apr 20, 2025
romange
romange previously approved these changes Apr 21, 2025
Collaborator

@romange romange left a comment

LGTM!

@adiholden
Collaborator Author

@romange I still see a failure of the lsn check in CI with this fix. I will continue investigating this.

Signed-off-by: adi_holden <adi@dragonflydb.io>
Signed-off-by: adi_holden <adi@dragonflydb.io>
Signed-off-by: adi_holden <adi@dragonflydb.io>
@adiholden
Collaborator Author

I suspect the problem lies in the callback that JournalStreamer registers.
We have this condition:
if (allow_await) {
  ThrottleIfNeeded();
  ...
}

meaning that in a transaction flow, if we need to throttle, we preempt before writing the data to the socket.
But if, by the time we preempt, a heartbeat flow executes (which does not allow await), it will skip this throttle and write the next journal entry to the socket before the entry that ran earlier with a lower lsn.
So by skipping the await here in the heartbeat flow (or any flow that sets allow_await to false), we do not ensure sequential order.
@romange
To fix this, we can call ThrottleIfNeeded after we write to the buffer, wdyt?
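To make the suspected interleaving concrete, an illustrative timeline (comments only, not actual Dragonfly code):

// Illustrative timeline of the suspected reorder:
//
//   Transaction flow, entry lsn = N, allow_await == true:
//     ThrottleIfNeeded() finds the buffer full and preempts the fiber,
//     suspending it before lsn = N is written to the socket.
//
//   Heartbeat flow, entry lsn = N + 1, allow_await == false:
//     skips ThrottleIfNeeded() and writes lsn = N + 1 to the socket right away.
//
//   Transaction flow resumes and writes lsn = N.
//   The replica now observes lsn = N + 1 before lsn = N.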

@romange
Collaborator

romange commented Apr 21, 2025

Let's take it offline - I need to understand more.

Signed-off-by: adi_holden <adi@dragonflydb.io>
@adiholden
Collaborator Author

I got the check failure on master for the lsn order validation. This proves the theory, fyi @romange

Signed-off-by: adi_holden <adi@dragonflydb.io>
@adiholden
Collaborator Author

The test failed again. I suspect that moving the throttle to after the write did not fix the issue, because we can have a few registered callbacks, and a throttle on the first one can still result in journal reorder.
Will need to think of another way to fix this.

@romange
Collaborator

romange commented Apr 23, 2025

What if we do not call ThrottleIfNeeded there? What if, instead of storing callbacks in change_cb_arr_, we store pointers to JournalConsumer interfaces that have:

void Handle(const JournalItem& item);
void ThrottleIfNeeded();

Then JournalSlice::CallOnChange will first go over all the instances and call Handle, and then, in a separate loop, call ThrottleIfNeeded().
In other words, the first loop will be fully atomic and non-preemptable, and the second one is about throttling/flushing.
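A minimal sketch of that shape (JournalConsumer, Handle, ThrottleIfNeeded, CallOnChange, cb_mu_, enable_journal_flush_ and journal_consumers_arr_ come from this thread; the map-of-pointers layout and everything else is assumed, not the merged code):

// Sketch of the proposed two-phase dispatch.
struct JournalConsumer {
  virtual ~JournalConsumer() = default;
  virtual void Handle(const JournalItem& item) = 0;  // must not preempt
  virtual void ThrottleIfNeeded() = 0;               // may preempt / block
};

void JournalSlice::CallOnChange(const JournalItem& item) {
  std::shared_lock lk(cb_mu_);

  // Phase 1: fully atomic and non-preemptable; every consumer records the
  // item, so all of them observe entries in lsn order.
  for (auto& [id, consumer] : journal_consumers_arr_)
    consumer->Handle(item);

  // Phase 2: throttling/flushing, which may preempt; ordering was already
  // fixed in phase 1, so a suspension here can no longer reorder entries.
  if (enable_journal_flush_) {
    for (auto& [id, consumer] : journal_consumers_arr_)
      consumer->ThrottleIfNeeded();
  }
}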

@@ -149,11 +149,19 @@ cv_status Transaction::BatonBarrier::Wait(time_point tp) {

Transaction::Guard::Guard(Transaction* tx) : tx(tx) {
  DCHECK(tx->cid_->opt_mask() & CO::GLOBAL_TRANS);
  tx->Execute([](auto*, auto*) { return OpStatus::OK; }, false);
  auto cb = [&](Transaction* t, EngineShard* shard) {
    namespaces->GetDefaultNamespace().GetDbSlice(shard->shard_id()).SetExpireAllowed(false);
Contributor

What about eviction?

Collaborator Author

We check the same flag in the eviction flow: FreeMemWithEvictionStep checks expire_allowed_.
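Roughly, as an illustrative sketch (DbSliceLike and the free function are hypothetical stand-ins; only the expire_allowed_ flag comes from this thread):

// Sketch only: an eviction step that honors the same flag the guard clears,
// so pausing expiry also pauses eviction during the cutover.
struct DbSliceLike {
  bool expire_allowed_ = true;  // cleared by SetExpireAllowed(false)
};

size_t FreeMemWithEvictionStepSketch(const DbSliceLike& slice, size_t goal_bytes) {
  if (!slice.expire_allowed_)
    return 0;  // nothing evicted while the transition is in progress
  // ... otherwise evict entries until roughly goal_bytes are freed ...
  return goal_bytes;
}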

Signed-off-by: adi_holden <adi@dragonflydb.io>
@@ -157,28 +157,35 @@ void JournalSlice::CallOnChange(const JournalItem& item) {
  // CallOnChange is atomic iff JournalSlice::SetFlushMode(false) is called before.
  std::shared_lock lk(cb_mu_);

  const size_t size = change_cb_arr_.size();
  auto k_v = change_cb_arr_.begin();
  const size_t size = journal_consumers_arr_.size();
Collaborator

The first loop is atomic - let's just iterate over the map.

    ++k_v;
  }
  k_v = journal_consumers_arr_.begin();
  if (enable_journal_flush_) {
    for (size_t i = 0; i < size; ++i) {
Collaborator

Move the size initialization here and add a comment that you capture size because ThrottleIfNeeded can preempt, and meanwhile journal_consumers_arr_ can grow. And btw, calling ThrottleIfNeeded on a new consumer is harmless, so now you can remove the size workaround altogether, imho.

Collaborator

@romange romange left a comment

lgtm!

adiholden added 2 commits May 5, 2025 12:00
Signed-off-by: adi_holden <adi@dragonflydb.io>
@adiholden adiholden requested a review from romange May 5, 2025 09:56
@adiholden adiholden enabled auto-merge (squash) May 5, 2025 10:44
@adiholden adiholden merged commit 6a84ad0 into main May 5, 2025
10 checks passed
@adiholden adiholden deleted the fix_lsn_bug branch May 5, 2025 10:50