
Incorrectly parsing XML with duplicated tag names #11


Closed
nurkiewicz opened this issue Jun 8, 2011 · 4 comments

Comments

@nurkiewicz
Contributor

Trying to parse the following XML document:

<data>
  <r><a>A</a></r>
  <r><b>B</b><c>C</c></r>
</data>

with:

new XmlMapper().readValue(xml, Map.class)

ignores the first "r" node (r -> {a -> A}) and overwrites it with the second one (r -> {b -> B, c -> C}). Instead, it should produce a map with a single key whose value is an array: r -> [{a -> A}, {b -> B, c -> C}]. The problem is here (last line of org.codehaus.jackson.map.deser.MapDeserializer#_readAndBind):

            /* !!! 23-Dec-2008, tatu: should there be an option to verify
             *   that there are no duplicate field names? (and/or what
             *   to do, keep-first or keep-last)
             */
            result.put(key, value);
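The "!!!" comment names the options that were on the table: verify that there are no duplicates, keep the first value, or keep the last one. A rough plain-Java sketch of those three choices (no Jackson involved; DuplicatePolicy, putWithPolicy and bind are hypothetical names, not part of Jackson's API) replays the two "r" entries from the example in document order. Note that none of the policies yields the array the report asks for; today's plain result.put(key, value) corresponds to KEEP_LAST.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the duplicate-key policies mentioned in the
// "!!!" comment. None of these names exist in Jackson.
public class DuplicatePolicyDemo {

    public enum DuplicatePolicy { KEEP_FIRST, KEEP_LAST, FAIL }

    public static void putWithPolicy(Map<String, Object> result, String key, Object value,
                                     DuplicatePolicy policy) {
        if (!result.containsKey(key)) {
            result.put(key, value); // first occurrence: always stored
            return;
        }
        switch (policy) {
            case KEEP_FIRST:
                break; // ignore the later value
            case KEEP_LAST:
                result.put(key, value); // same effect as the current result.put(key, value)
                break;
            case FAIL:
                throw new IllegalStateException("Duplicate field name: " + key);
        }
    }

    // Replays <r><a>A</a></r> followed by <r><b>B</b><c>C</c></r>.
    public static Map<String, Object> bind(DuplicatePolicy policy) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("a", "A");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("b", "B");
        second.put("c", "C");

        Map<String, Object> result = new LinkedHashMap<>();
        putWithPolicy(result, "r", first, policy);
        putWithPolicy(result, "r", second, policy);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bind(DuplicatePolicy.KEEP_LAST));  // {r={b=B, c=C}}
        System.out.println(bind(DuplicatePolicy.KEEP_FIRST)); // {r={a=A}}
    }
}
```

Either way one of the two "r" subtrees is lost (or, with FAIL, the whole document is rejected), which is why the report asks for a List instead.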

This can be worked around by passing a special Map implementation instead of Map.class, but if the duplicated tags appear deeper in the XML document (not at the top level), there is no easy workaround: see org.codehaus.jackson.map.deser.UntypedObjectDeserializer#mapObject, where the LinkedHashMap is created.

Of course, the root cause of this problem is the assumption that JSON objects never contain duplicate properties. In XML, such repeated nodes should be treated as arrays.
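A minimal sketch of the "special Map implementation" workaround, assuming that MapDeserializer instantiates the requested Map class through its no-argument constructor and fills it via put(), as in the _readAndBind snippet. DuplicateKeysToListMap is a hypothetical name. It only affects the map whose class is passed in; nested objects still go through UntypedObjectDeserializer#mapObject and end up in plain LinkedHashMaps, which is exactly the limitation described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;

// Hypothetical Map type for the top-level workaround: when a key is put a second
// time, the values are collected into a List instead of the old one being replaced.
// Nested maps built by UntypedObjectDeserializer#mapObject are NOT affected.
public class DuplicateKeysToListMap extends LinkedHashMap<String, Object> {

    // Marker subclass: distinguishes "list built from repeated keys" from a
    // List that was itself bound as a single value.
    private static class RepeatedValues extends ArrayList<Object> {}

    @Override
    public Object put(String key, Object value) {
        if (!containsKey(key)) {
            return super.put(key, value);       // first occurrence: store as-is
        }
        Object existing = get(key);
        RepeatedValues values;
        if (existing instanceof RepeatedValues) {
            values = (RepeatedValues) existing; // third and later occurrences
        } else {
            values = new RepeatedValues();      // second occurrence: start a List
            values.add(existing);
        }
        values.add(value);
        return super.put(key, values);
    }

    // Replays the two <r> elements from the first example, in document order.
    public static DuplicateKeysToListMap example() {
        LinkedHashMap<String, Object> first = new LinkedHashMap<>();
        first.put("a", "A");
        LinkedHashMap<String, Object> second = new LinkedHashMap<>();
        second.put("b", "B");
        second.put("c", "C");

        DuplicateKeysToListMap data = new DuplicateKeysToListMap();
        data.put("r", first);
        data.put("r", second);
        return data;
    }

    public static void main(String[] args) {
        System.out.println(example()); // {r=[{a=A}, {b=B, c=C}]}
    }
}
```

Under those assumptions it would be used as new XmlMapper().readValue(xml, DuplicateKeysToListMap.class). The RepeatedValues marker keeps a genuine List value from being mistaken for an accumulator of repeated keys.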

@cowtowncoder
Member

Correct, this problem results from the impedance mismatch between XML and JSON.
Another question is whether wrapper elements are supported: I think the structure that works is the one without wrapper elements, and there is a known issue with handling wrapped vs. unwrapped lists.

But Map is a fairly specific type, so I wonder if it might be possible to add a bit more interaction to make it work.
The way Lists are handled does in fact rely on a somewhat specific low-level method, so it might be possible to add something similar for cases where Map-typed content is expected.

@cowtowncoder
Member

I am not sure there is a generic solution to this problem: your solution assumes that we can use a heuristic to combine sub-trees, but such a heuristic is not guaranteed to work for all kinds of structures.

But it could work for a subset of cases, so the question is whether to build something that works with the standard Map deserializer (which is not format-specific), or to add an XML-specific Map deserializer.
The latter might make more sense, given that this is an "impossible" case for JSON (and in fact I would argue that the JSON case should probably throw an exception, so that users do not rely on duplicates being handled).

As to XML: there is the immediate problem that the value of a duplicate property may well be something other than another Map, so it is not clear what the proper way to merge things would be. For example:

<data>
  <r>A</r>
  <r><b>B</b><c>C</c></r>
</data>

would not quite work, as the value for entry "r" would be the String "A". So what should be done with the following entry?
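To make the objection concrete: applying the "repeated element becomes a List" rule to this second document does produce a value for "r", but that List mixes a String and a Map, so code expecting a List of Maps would still break. A small plain-Java sketch (hypothetical names, no Jackson; for simplicity it groups every element name into a List, not only repeated ones):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Jackson code: group element values by name (in
// document order) and check whether every value collected for a name is a Map.
public class MixedDuplicates {

    // For simplicity every name gets a List, even names that occur only once.
    public static Map<String, List<Object>> groupByName(List<Map.Entry<String, Object>> entries) {
        Map<String, List<Object>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : entries) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // True only if a consumer could safely treat the list as a List of Maps.
    public static boolean allMaps(List<Object> values) {
        for (Object v : values) {
            if (!(v instanceof Map)) {
                return false;
            }
        }
        return true;
    }

    // The second document: <r>A</r> followed by <r><b>B</b><c>C</c></r>.
    public static List<Object> secondExample() {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("b", "B");
        nested.put("c", "C");

        List<Map.Entry<String, Object>> entries = new ArrayList<>();
        entries.add(new AbstractMap.SimpleEntry<String, Object>("r", "A"));
        entries.add(new AbstractMap.SimpleEntry<String, Object>("r", nested));
        return groupByName(entries).get("r");
    }

    public static void main(String[] args) {
        List<Object> r = secondExample();
        System.out.println(r);          // [A, {b=B, c=C}]
        System.out.println(allMaps(r)); // false
    }
}
```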

@cowtowncoder
Member

I think the answer here is "works as designed": 'untyped' binding to Maps and Lists will not work correctly without assuming more advanced rules, and I don't want to move in that direction.
Will close the issue as 'won't fix'.

@arakelian

For people who stumble upon this issue, see the Gist provided in #205 for a workaround.
