-
-
Notifications
You must be signed in to change notification settings - Fork 142
Allow exposing CBOR simple values as VALUE_EMBEDDED_OBJECT #590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
f58c19c
69a3099
d62a70b
f02755d
ae2252d
2871cfe
704d7ec
c9cbd52
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -59,7 +59,21 @@ public enum Feature implements FormatFeature | |
* | ||
* @since 2.20 | ||
*/ | ||
HANDLE_UNDEFINED_AS_EMBEDDED_OBJECT(false) | ||
HANDLE_UNDEFINED_AS_EMBEDDED_OBJECT(false), | ||
|
||
|
||
/** | ||
* Feature that determines how a CBOR "simple value" of major type 7 is decoded. | ||
* <p> | ||
* When enabled, the parser returns {@link JsonToken#VALUE_EMBEDDED_OBJECT} with an embedded value | ||
* of type {@link CBORSimpleValue}, allowing the caller to distinguish these values from actual {@link JsonToken#VALUE_NUMBER_INT}. | ||
* <p> | ||
* When disabled (the default, for backwards compatibility), simple values are returned as {@link JsonToken#VALUE_NUMBER_INT}, | ||
* from Jackson 2.10 to 2.19. | ||
* | ||
* @since 2.20 | ||
*/ | ||
HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT(false) | ||
; | ||
|
||
final boolean _defaultState; | ||
|
@@ -206,7 +220,7 @@ public int getFirstTag() { | |
* @since 2.20 | ||
*/ | ||
protected int _formatFeatures; | ||
|
||
/** | ||
* Codec used for data binding when (if) requested. | ||
*/ | ||
|
@@ -540,6 +554,9 @@ public boolean empty() { | |
protected float _numberFloat; | ||
protected double _numberDouble; | ||
|
||
protected CBORSimpleValue _simpleValue; | ||
|
||
|
||
// And then object types | ||
|
||
protected BigInteger _numberBigInt; | ||
|
@@ -823,9 +840,9 @@ public JsonToken nextToken() throws IOException | |
_skipIncomplete(); | ||
} | ||
_tokenInputTotal = _currInputProcessed + _inputPtr; | ||
// also: clear any data retained so far | ||
_numTypesValid = NR_UNKNOWN; | ||
_binaryValue = null; | ||
|
||
// also: clear any data retained so far. | ||
clearRetainedValues(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are a few other places this must be done (everywhere where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought clearing it once at the beginning of nextToken could be enough, given it's used only once for major 7, and there is no partial assignment (always one byte). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wouldn't want to count on state not getting cleared for But if you are confident after looking, that's fine, just LMK. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. on the side; so far I've been exploring the async impl of CBOR; and I'm almost done with my draft, building almost all major types to explore the spec. In next weeks, I will start work on the actual organized PR, However, I'd like to know if you think there are higher priority issues that the community would like to be fixed. I'm enjoying contributing to the project in my free time, so aligning with priorities will make it even more valuable. After the async parser, I believe looking at Avro issues is worth it, as I think it's the most used (just a guess from the number of issues) Would appreciate your thoughts. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Aha, I got it, and well, I'm not 100% confident about the different combinations that might happen that can lead to exposing stale values, so yeah, better not to count on that. I will commit the change to clear it in places where binary values is cleared There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @iifawzi On async impl: that sounds like something that would be well received so +1 for that. Avro does seem to get more attention that I'd have expected so that's a good thing to follow up on too. And maybe logical type support and whatever "new" (... I haven't been following up spec development) features for CBOR would be good as well. And ideally Proto3 would be supported by protobuf module. There's also question of replacing protoc-parser (reader for schema) that is deprecated with library that is to replace it (by same authors). I haven't had time to follow up on this, I think there's an issue filed to request it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks a lot for the insights @cowtowncoder! I’ll keep those points in mind as I move forward after the async impl |
||
|
||
// First: need to keep track of lengths of defined-length Arrays and | ||
// Objects (to materialize END_ARRAY/END_OBJECT as necessary); | ||
|
@@ -1843,6 +1860,9 @@ public Object getEmbeddedObject() throws IOException | |
_finishToken(); | ||
} | ||
if (_currToken == JsonToken.VALUE_EMBEDDED_OBJECT ) { | ||
if (_simpleValue != null) { | ||
return _simpleValue; | ||
} | ||
return _binaryValue; | ||
} | ||
return null; | ||
|
@@ -3712,9 +3732,10 @@ protected JsonToken _decodeUndefinedValue() { | |
* Helper method that deals with details of decoding unallocated "simple values" | ||
* and exposing them as expected token. | ||
* <p> | ||
* As of Jackson 2.12, simple values are exposed as | ||
* {@link JsonToken#VALUE_NUMBER_INT}s, | ||
* but in later versions this is planned to be changed to separate value type. | ||
* Starting with Jackson 2.20, this behavior can be changed by enabling the | ||
* {@link com.fasterxml.jackson.dataformat.cbor.CBORParser.Feature#HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT} | ||
* feature, in which case simple values are returned as {@link JsonToken#VALUE_EMBEDDED_OBJECT} with an | ||
* embedded {@link CBORSimpleValue} instance. | ||
* | ||
* @since 2.12 | ||
*/ | ||
|
@@ -3723,27 +3744,37 @@ public JsonToken _decodeSimpleValue(int lowBits, int ch) throws IOException { | |
_invalidToken(ch); | ||
} | ||
if (lowBits < 24) { | ||
_numberInt = lowBits; | ||
if (Feature.HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT.enabledIn(_formatFeatures)) { | ||
_simpleValue = new CBORSimpleValue(lowBits); | ||
} else { | ||
_numberInt = lowBits; | ||
} | ||
} else { // need another byte | ||
if (_inputPtr >= _inputEnd) { | ||
loadMoreGuaranteed(); | ||
} | ||
_numberInt = _inputBuffer[_inputPtr++] & 0xFF; | ||
|
||
// As per CBOR spec, values below 32 not allowed to avoid | ||
// confusion (as well as guarantee uniqueness of encoding) | ||
if (_numberInt < 32) { | ||
int value = _inputBuffer[_inputPtr++] & 0xFF; | ||
if (value < 32) { | ||
throw _constructError("Invalid second byte for simple value: 0x" | ||
+Integer.toHexString(_numberInt)+" (only values 0x20 - 0xFF allowed)"); | ||
+Integer.toHexString(value)+" (only values 0x20 - 0xFF allowed)"); | ||
} | ||
|
||
if (Feature.HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT.enabledIn(_formatFeatures)) { | ||
_simpleValue = new CBORSimpleValue(value); | ||
} else { | ||
_numberInt = value; | ||
} | ||
} | ||
|
||
// 25-Nov-2020, tatu: Although ideally we should report these | ||
// as `JsonToken.VALUE_EMBEDDED_OBJECT`, due to late addition | ||
// of handling in 2.12, simple value in 2.12 will be reported | ||
// as simple ints. | ||
if (Feature.HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT.enabledIn(_formatFeatures)) { | ||
return JsonToken.VALUE_EMBEDDED_OBJECT; | ||
} | ||
|
||
_numTypesValid = NR_INT; | ||
return (JsonToken.VALUE_NUMBER_INT); | ||
return JsonToken.VALUE_NUMBER_INT; | ||
} | ||
|
||
/* | ||
|
@@ -4100,4 +4131,10 @@ private void createChildObjectContext(final int len) throws IOException { | |
_streamReadContext = _streamReadContext.createChildObjectContext(len); | ||
_streamReadConstraints.validateNestingDepth(_streamReadContext.getNestingDepth()); | ||
} | ||
|
||
private void clearRetainedValues() { | ||
_numTypesValid = NR_UNKNOWN; | ||
_binaryValue = null; | ||
_simpleValue = null; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that I'll probably want change naming here slightly; instead of "handle", either "expose" or maybe "read" or "return". Will go ahead and make that change for both new Feature names.
I think I'll call with
READ_
since that is most commonly used by other Jackson components.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be better; I'm not good at naming these feature flags, though, so thank you! :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np, that's what I am ultimately responsible for :)
(not that it was bad name per se, just trying to keep uniform terminology etc)