Skip to content

gh-106287: do not write objects after an unmarshalling error #132715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

duaneg
Copy link

@duaneg duaneg commented Apr 19, 2025

Writing out an object may involve a slot lookup, which is not safe to do with an exception raised. In debug mode an assertion failure will occur if this happens. To avoid assertion failures, and possibly more subtle problems the assertion is there to prevent, skip attempts to write an object after an error.

Note the unmarshal writing code won't raise an exception itself, however the set writing code calls back into the top-level unmarshal function (see gh-78903), which will check for failures and raises an exception. Once that happens it is not safe to continue writing further objects.

Note also that some data may still be written even after an error occurred and an exception has been raised: code objects write objects and longs, and the latter will be written unconditionally, even if the former fails. This shouldn't cause a problem as it is documented that "garbage data will also be written to the file" in such cases, and the top-level function will handle and report the error appropriately.

Add a test case to cover the various different object types which could trigger such failures.

Note that, contrary to its documentation, the PyMarshal_WriteObjectToFile function does not set an error on failure. Fortunately it is not used internally, except in test code, where it is only used with input that should not fail. I noticed this while checking the code and added a comment, but fixing it should be done separately. If we can be bothered, that is: Guido seemed to notice the same thing and mentioned it in a comment 25 years ago, and no-one seems to have noticed since.

…ling error

Writing out an object may involve a slot lookup, which is not safe to do with
an exception raised. In debug mode an assertion failure will occur if this
happens. To avoid assertion failures, and possibly more subtle problems the
assertion is there to prevent, skip attempts to write an object after an error.

Note the unmarshal writing code won't raise an exception itself, however the
set writing code calls back into the top-level unmarshal function (see
pythongh-78903), which will check for failures and raises an exception. Once that
happens it is not safe to continue writing further objects.

Note also that some data may still be written even after an error occurred and
an exception has been raised: code objects write objects and longs, and the
latter will be written unconditionally, even if the former fails. This
shouldn't cause a problem as it is documented that "garbage data will also be
written to the file" in such cases, and the top-level function will handle and
report the error appropriately.

Add a test case to cover the various different object types which could trigger
such failures.
@bedevere-app
Copy link

bedevere-app bot commented Apr 19, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Python/marshal.c Outdated
@@ -750,6 +753,10 @@ PyMarshal_WriteLongToFile(long x, FILE *fp, int version)
w_flush(&wf);
}

/* TODO: Contrary to its documentation, this function does NOT set an error on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. PySys_Audit or w_init_refs may set an error on failure. The docs say that exceptions are due to marshalling itself.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, those functions set errors as appropriate, apologies for being unclear. Various other error conditions are correctly reported, too. However unmarshallable objects like in this issue are not. AIUI they should raise a ValueError: unmarshallable object (as PyMarshal_WriteObjectToString does), but the error is silently ignored by PyMarshal_WriteObjectToFile.

The problem is the errors are recorded in WFILE::error (e.g. see the end of w_complex_object) and it is only in _PyMarshal_WriteObjectToString that is converted into an exception. This doesn't happen in PyMarshal_WriteObjectToFile.

This can be reproduced with python -c "import _testcapi; _testcapi.pymarshal_write_object_to_file(int, 'test.data', 5)" (with assertions enabled). It asserts there are no exceptions raised after calling the function, but completes successfully. If you replace int with {int} it will abort as expected.

I can reword or just drop the comment, whatever you prefer. Maybe something like, "TODO: this function silently ignores some failures due to unmarshallable objects"? I'd be happy to just fix it, for that matter, but I wasn't sure whether it would be considered worth the churn.

Now I look at that code again, I note the paths that currently do raise exceptions while writing objects are going to have them overwritten by that error handling code anyway. That's a shame, since they give specifics like "bad marshal data (unnormalized long data)" instead of a generic "unmarshallable object" message. Perhaps we shouldn't raise an exception if there's already one raised?

@duaneg
Copy link
Author

duaneg commented Apr 20, 2025

Thanks for the review! I've just dropped the comment for now: it probably isn't worth worrying about, and if it is, well, I'll just fix it instead 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants