Skip to content

Add support for collecting multiple deserialization failures during processing, not just first one #1196

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
odrotbohm opened this issue Apr 14, 2016 · 29 comments
Labels
most-wanted Tag to indicate that there is heavy user +1'ing action

Comments

@odrotbohm
Copy link

Currently, deserialization fails as soon as a property of a class can not be deserialized properly. In case a source JSON document contains multiple errors, this will usually cause the sender to be notified of the first error, it will then retry, and run into the second error.

It would be cool if there was a Jackson mode that collects the exceptions that occur on deserialization and report a compound exception eventually to contain all the problems found so that a more exhaustive report about which fields could not be deserialized can be sent to clients.

@cowtowncoder
Copy link
Member

@olivergierke I am not opposed to functionality like this, but it is often difficult to continue in case where problems occur.

One existing facility that can be useful is DeserializationProblemHandler: currently it can only collect problems of "unknown property X" variety, but that is fairly common problem. But if you can suggest other potentially recoverable issues, handler methods could be added.
Using a handler in this way is useful as it allows state of deserializer to stay consistent, which is not guaranteed (or typically something that happens) when an exception is thrown.

@odrotbohm
Copy link
Author

Let me maybe add some context. Having this class:

class Foo {
  Date first, second;
}

and a source JSON of

{ "first" : "foo", "second" : "bar" }

I'd like to be able to tweak the deserialization so that I get it rejected with both errors. If the first one failing the deserialization entirely, I can only report that failure to clients, which it can then fix but will submit an erroneous payload again, which is what I'd like to avoid.

I totally see that the list of exceptions might not be exhaustive as nested errors might not be discovered, but I guess a best effort collecting exceptions could improve the situation. Maybe DeserializationProblemHandler can be extended to inspect a configuration flag, collect up exceptions and eventually throw a compound exception containing the individual ones?

@cowtowncoder
Copy link
Member

@olivergierke I can see usefulness of this, but to be honest I doubt I will have time or interest in working on this for near future. I can help by adding hooks for more lenient handling of specific individual failure modes, but not more involved logic in trying to tie this into whole end-to-end processing. On short term it would be up to someone else to implement handler, do collection, and produce output at a later point. Such handling could possibly be bundled within logical readValue() equivalent, to effectively work the way you describe: caller calls this method, which adds handler, calls mapper's method, and checks if any problems were reported; and if so, throwing something explicitly.

I'll reword the title a bit, and this issue can remain open for anyone with the itch; and/or adding ideas, suggestons.

@cowtowncoder cowtowncoder changed the title Add support for lenient validation on deserialization Add support for collecting multiple deserialization failures during processing, not just first one Apr 14, 2016
@odrotbohm
Copy link
Author

That's great, thanks. Am I right in understanding that there's no way to achieve similar behavior extending one of the SPIs in place? I was kind of wondering whether there was a way to sort of post-process or alter BeanDeserializer instances and add the handling there. Just a wild guess, I am way not as familiar with the internals as you are, which might render my idea a pretty stupid one in the first place ;).

@cowtowncoder
Copy link
Member

Unfortunately there isn't a good place to hook for post-processing; and specifically not from a place that would have access to context (SerializerProvider / DeserializationContext) where per-call attributes live.
I think a general-purpose mechanism for both directions (ser / deser) would make sense, although not 100% sure how the API should work. Unlike with handlers etc, it would seem like a one-off things to pass: that is, separate instance for every call. Due to combinatorial explosion, it would probably be easiest to add methods in ObjectReader / ObjectWriter, where one more thing to chain would not matter too much.

This would just allow accessing whatever has accumulated elsewhere, and that elsewhere would need to be added. But at least that part could stick resulting exceptions in a Collection-valued Attribute (accessed via DeserializationContext, for deserialization), as that lives through the process.

Since I am slowly working on 2.8 items now, such additions would be easy enough to add.

@cowtowncoder
Copy link
Member

One more thing: on your earlier example, yes, I think handling of InvalidFormatException would make lots of sense (or rather, whoever is to send those, should first check for problem handler).
So that could be second "recoverable" type of binding failure to send to handler.

Also, I think it would then make sense to implement standard DeserializationProblemHandler which would do what we explain above, simple collation of recoverable exceptions via specifically named Attribute. Also, come to think of that... perhaps this handler could be notified when the call ends, for simple post-processing. While instance must be stateless, it could access Attribute and see whether anything was collected; and if it was, throw some kind of DeferredBindingException or something (as distinct from individual exception types, but subtype of JsonMappingException).

From that on, maybe it would not be a bad idea to allow addition of such problem instances to non-recoverable exceptions; that is, if something else fails, it would at least indicate existence of earlier problems.

@ghost
Copy link

ghost commented Apr 20, 2016

+1 To @olivergierke's suggestion. We too are trying to build an ergonomic API where validation errors would be captured across all the input and returned together (e.g. bean validation style). I researched into where this functionality could be extended but also didn't see any means to handle the post-hook. In our particular use case, the validation is being performed as a result of custom deserializers, which currently throw exceptions such as IllegalArgumentException and obviously result in a short-circuiting behavior. I look forward to the progress on #1207.

@cowtowncoder
Copy link
Member

Some more thoughts: I think that instead of making DeserializationProblemHandler focal point, maybe it is actually better to catch issue from DeserializationContext methods used to throw exceptions (many only construct them, but should change to, or overload for, ones that construct and by default throw).
And if DeserializationFeature.COLLECT_EXCEPTIONS (or whatever) is enabled, then instead throwing, just collect Exceptions. Could also have some upper bound, and throw if that exceeded.

But besides collection, there's still the question of how to chain exceptions. I guess one possibility would be to retain the first exception and "append" others onto it, throwing original one at the end (or max exceptions exceeded, if checked), but adding accessor for secondary exceptions.
Alternatively could create a new exception type for case of multiples, although not sure if that is better than just augmenting existing. Either way, should probably somehow also indicate something like "[normal message](and N additional fail messages).

@odrotbohm
Copy link
Author

What about the wrapping exception just carrying a value object that basically carries a List of exception so that they can be iterated over but also allows obtain the first exception that maybe caused the others etc. without having to know about the internal ordering semantics in the list?

@cowtowncoder
Copy link
Member

@olivergierke I think the main challenge with wrapper is that many callers may not be aware of possible chaining. So I think I agree in that it should be very similar to exception we would now throw, that is, the first encountered problem. And so a list or chain of references to following exceptions would make sense. Whether that List should contain the first exception is another minor detail.

@cowtowncoder
Copy link
Member

@olivergierke @mhuffman-r7 I think I am almost/mostly done with #1207 work for 2.8, so that all/most existing exception throwing from jackson-databind should either go via:

  1. New handleXxx() methods in DeserializationContext, or
  2. New reportXxx() methods in DeserializationContext

I am open to addition of more semantic handleXxx() methods (and to some degree reportXxx(), although that's mostly fallback).
Datatype modules need retrofitting as well; and there's the question of what to do with low-level decoding exception as well (I will most likely add at least some form of "EofException"), but the full scope is so big that best I can do is to do incremental improvements that should allow more powerful recovery and/or diagnostics.

I don't plan on building anything special within jackson-databind for 2.8 to collect failures or to recover, but I hope someone experiments with handler approach to see how much it can help.

Now: I am sure that there are many gaps (things where handler isn't called where it should be), or incomplete handling (handler gets called, but return value, if no exception is thrown, isn't used appropriately to actually recover). Reporting those, based on experimentation, would be valuable as I should be able to tackle them on case-by-case basis.

@jukmanty
Copy link

jukmanty commented Feb 3, 2017

+1 for this feature request.

In our case there was need to parse array of objects one by one in a subtree of the message, collect error for each array element mapping exception and not to give up on first exception. We wanted deserialization result to contain message body information, immutable list of valid objects and list of collected errors with enough information to trace the object and the field that was unparseable from source message.

I managed to accomplish this by creating specialized ImmutableList deserializer that catched array members' mapping exceptions and added error list as DeserializationContext attribute. But this required quite low-level parsing such as skipping tokens to the start of next element in this specific array (and not in the sub-arrays it contained). And I needed to add another specialized root object deserializer that returned errors from DeserializationContext and deserialized message as one composite object because context is destroyed after deserialization call.

One thing that I was not able to accomplish is to get parsed field values if there was a mapping exception. I needed to get object identifier field value to collected error message if it was parsed. This was not possible because I used constructor creator parameters and buffer was not available through parser or exception (buffer is local variable in BeanDeserializer _deserializeUsingPropertyBased method).

I quess the main point here is that it would be nice if this feature would contain ability to recover from non-fatal errors and mechanism to define to how recover from those (for example: skip to next array element, collect exception, return valid elements and errors).

@cowtowncoder
Copy link
Member

@jukmanty Jackson 2.9 will not directly implement collection of multiple exceptions, but it does add half a dozen extension points within DeserializationContext, as well as related new methods in DeserializationProblemHandler: latter may be used to recover from some of the problems.
And combination could be used to implement support for recovery and/or collection of problems that may be temporarily skipped (that is, to give bogus data instead of throwing exception).

@ngueno
Copy link

ngueno commented May 17, 2019

Hi there @cowtowncoder, do you have any news about newer versions of Jackson supporting this feature?

@cowtowncoder
Copy link
Member

@ngueno No new features have been added to help with this feature request.

I'll add this on https://github.com/FasterXML/jackson-future-ideas/wiki/Jackson-Work-in-Progress just to keep it in mind. No current plans but since it is highly voted I would like to figure out some improvement to help.

@plokhotnyuk
Copy link

@cowtowncoder I would think more before implementing such features... what if some counterparty will be able to send some malicious input that will explode your error accumulator?

@cowtowncoder
Copy link
Member

@plokhotnyuk feature, if any, would be something for users to enable, not enabled by default.
But I think you are right in that there definitely should be an upper limit for amount of information (number of failures to include information for) to include, after which processing would terminate

It is also not clear whether it would actually work well for common use cases because some types of problems (decoding problems like missing closing double-quote) are difficult to recover from anyway.
It might work of limiting to databind-level issues, and in particular those for with DeserializationProblemHandler is called.

@ngueno
Copy link

ngueno commented May 22, 2019

@cowtowncoder probably this needs to be on another request but it does not hurt to ask.

During the customization using DeserializationProblemHandler I usually access the currently field being serialized using JsonParser.getCurrentName(). Unfortunately I noticed that some methods do not provide access to JsonParser (handleWeirdStringValue and handleWeirdNumberValue for example).

For those cases it is not possible to access the field name which currently contains the unparsable value. I was looking for that to customize also our error messages informing the field name and field value presenting issues

Is it possible to provide an interface to fetch the field name beside the converted value for those cases?

@cowtowncoder
Copy link
Member

@ngueno Key is DeserializationContext which has getParser(). That should usually be correctly assigned, although there is potential concern wrt buffered content. But if you find incorrect location/information it can probably be resolved; and parser should never return null.

@ngueno
Copy link

ngueno commented May 22, 2019

@cowtowncoder thank you for your quickly reply!

I was not aware of getting the JsonParser directly from DeserializationContext. Just tested and at the first glance the field is being returned OK using this approach.

If there is some issue in the future related to buffered content I let you know.

@cowtowncoder
Copy link
Member

@ngueno perfect. And please do report problems you find: things get tricky with buffering and for better or worse fixes to problems are often locals.

Also contributions as unit tests would be really useful to guard against regressions: what I have learned over time is that anything not covered by unit tests is much more fragile than functionality that is covered -- latter rarely gets broken, former every now and then.

@ngueno
Copy link

ngueno commented Jun 27, 2019

@cowtowncoder Hello again!

I found one issue using the DeserializationContext.getParser().getCurrentName() inside the handleWeirdStringValue method.
I am trying to fetch the property name being serialized when the value is not parseable to a given enum, but unfortunately the getCurrentName() returns null at this point.

All the other stuff is working well, as you may see below:

targetType -> class com.ngueno.documents.DocumentTypeEnum
failureMsg -> value not one of declared Enum instance names: [DOCUMENT, CONTRACT, PERMISSION]
valueToConvert -> CONTRACTssss
ctxt.getParser().getCurrentName() -> null

Is there any other way to fetch the current name when parsing an enum?

@GedMarc
Copy link

GedMarc commented Jun 27, 2019

Parser.getText() ?

image

@ngueno
Copy link

ngueno commented Jun 27, 2019

@GedMarc thanks for your reply, unfortunately the Parser.getText() will return the same as the valueToConvert instead of the current Json property name being serialized.

@GedMarc
Copy link

GedMarc commented Jun 27, 2019

Thanks for the confirmation -
Let me take a deeper look and see if I can help -
@cowtowncoder is currently taking a break

@cowtowncoder
Copy link
Member

@ngueno I can't say if this is easy to fix or not before looking into it, and do not have time to do that right now. But in general I suspect this is due to content buffering. I would not count on getText() or getCurrentName() working, but, you might have better luck with

JsonParser.getParsingContext()

which keeps track of location within logical input document (there are couple of accessors).

Sometimes

JsonParser.getCurrentValue()

can be useful as well, and that ties back to actual Object hierarchy being constructed. Probably not useful here tho.

@ssathi
Copy link

ssathi commented Jul 24, 2019

+1 for this feature.

I need it as well.

@cowtowncoder cowtowncoder removed the 2.10 label Apr 7, 2020
@tonyfarney
Copy link

It's a interesting feature. I would like to use it right now.

@cowtowncoder cowtowncoder added the most-wanted Tag to indicate that there is heavy user +1'ing action label Oct 4, 2020
@cowtowncoder
Copy link
Member

Ok, enough +1s etc, marking as "most-wanted". Does not have any immediate effect but may help draw attention to anyone interested in working on it -- and I will consider it as input too of course for future minor versions as I agree that it would be good to have some way to gather multiple issues to report, if possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
most-wanted Tag to indicate that there is heavy user +1'ing action
Projects
None yet
Development

No branches or pull requests

8 participants