Data loss if a lot of data is inserted simultaneously #44

Open

megakoresh opened this issue Feb 6, 2025 · 2 comments

Comments


megakoresh commented Feb 6, 2025

Hello, the sender does not appear to be thread-safe. When sending a lot of data to the table, large swaths of it are randomly lost. I suspect the reason is that the sender position is reset in the compact function without regard for the order in which flush calls execute. So when flush calls are scheduled on the event loop and flush 1 finishes before flush 2 starts, the buffer is reset and the data pending for flush 2 is lost.

With autoflushing this issue is basically unavoidable, because the pending-row counter is incremented in the "at" function, which is also the function that checks it and schedules the flush, while the counter is only reset in the compact function, which runs after the data has been submitted. So if you send a large burst of data through many calls to "at", it is unavoidable that multiple flush calls end up scheduled on the loop at the same time.
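To make the interleaving I suspect more concrete, here is a rough sketch. The buffer, at, flush and compact below are simplified stand-ins for the sender's internals, not the actual library code; it only illustrates how two overlapping flush/compact pairs can silently drop a row:

```ts
let buffer: string[] = [];
const delivered: string[] = [];

function at(row: string): void {
  buffer.push(row);
}

async function flush(): Promise<void> {
  const batch = buffer.slice();                      // snapshot of what is pending right now
  await new Promise<void>((r) => setTimeout(r, 10)); // simulated network write, yields to the event loop
  delivered.push(...batch);
  compact(batch.length);
}

function compact(sent: number): void {
  // Trims `sent` rows off the front of the buffer. With overlapping flushes
  // this count no longer matches what is actually safe to drop.
  buffer.splice(0, sent);
}

async function demo(): Promise<void> {
  at("row 1");
  const first = flush();   // flush #1 snapshots ["row 1"]
  at("row 2");
  const second = flush();  // flush #2 snapshots ["row 1", "row 2"]
  at("row 3");             // appended while both flushes are in flight
  await Promise.all([first, second]);
  console.log(delivered);  // ["row 1", "row 1", "row 2"] (row 1 sent twice)
  console.log(buffer);     // []                          (row 3 silently dropped)
}

void demo();
```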

IMO it would be most sensible to enforce the order of execution of flush calls in the library itself, for example by adding flush calls to an in-memory promise queue instead of just awaiting them as is done now, which does not guarantee the order in which the compact calls run.
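Rough sketch of what I mean, purely as an illustration (the FlushQueue class and enqueue method are hypothetical names, nothing from the library):

```ts
// In-memory promise queue that serializes flush calls.
class FlushQueue {
  private tail: Promise<void> = Promise.resolve();

  // Runs `doFlush` only after every previously enqueued flush has settled,
  // so each compact() finishes before the next flush reads the buffer.
  enqueue(doFlush: () => Promise<void>): Promise<void> {
    const run = this.tail.then(doFlush);
    // Keep the chain alive even if this flush rejects; the caller still
    // observes the failure through the promise returned here.
    this.tail = run.catch(() => {});
    return run;
  }
}

// Stand-in for the sender's own flush-and-compact routine.
async function exampleFlush(): Promise<void> {}

const queue = new FlushQueue();
void queue.enqueue(exampleFlush);
void queue.enqueue(exampleFlush); // guaranteed to start only after the first has finished
```

In the sketch above, replacing the two direct flush() calls with queue.enqueue(flush) serializes the send/compact pairs, and no rows are dropped or duplicated.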

@alzalabany

Same issue here; in Node.js we also see lots of ERR_SOCKET_CLOSED_BEFORE_CONNECTION errors.


semoal commented May 9, 2025

It would be nice if you could give the version I've refactored a try. We had similar problems, we've been using this version for a while, and it is quite a bit more stable.
It is easy to test: check out #42, clone the pull request on your machine, then run pnpm run build and pnpm run link. This generates a 4.0.0 version locally that you can install in your project and test.

I've added a test to replicate the issue you describe and the test passes in my branch.
