Skip to content

Allow exposing CBOR simple values as VALUE_EMBEDDED_OBJECT #590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,21 @@ public enum Feature implements FormatFeature
*
* @since 2.20
*/
HANDLE_UNDEFINED_AS_EMBEDDED_OBJECT(false)
HANDLE_UNDEFINED_AS_EMBEDDED_OBJECT(false),
Copy link
Member

@cowtowncoder cowtowncoder May 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that I'll probably want change naming here slightly; instead of "handle", either "expose" or maybe "read" or "return". Will go ahead and make that change for both new Feature names.

I think I'll call with READ_ since that is most commonly used by other Jackson components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be better; I'm not good at naming these feature flags, though, so thank you! :-D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np, that's what I am ultimately responsible for :)
(not that it was bad name per se, just trying to keep uniform terminology etc)



/**
* Feature that determines how a CBOR "simple value" of major type 7 is decoded.
* <p>
* When enabled, the parser returns {@link JsonToken#VALUE_EMBEDDED_OBJECT} with an embedded value
* of type {@link CBORSimpleValue}, allowing the caller to distinguish these values from actual {@link JsonToken#VALUE_NUMBER_INT}.
* <p>
* When disabled (the default, for backwards compatibility), simple values are returned as {@link JsonToken#VALUE_NUMBER_INT},
* from Jackson 2.10 to 2.19.
*
* @since 2.20
*/
HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT(false)
;

final boolean _defaultState;
Expand Down Expand Up @@ -206,7 +220,7 @@ public int getFirstTag() {
* @since 2.20
*/
protected int _formatFeatures;

/**
* Codec used for data binding when (if) requested.
*/
Expand Down Expand Up @@ -540,6 +554,9 @@ public boolean empty() {
protected float _numberFloat;
protected double _numberDouble;

protected CBORSimpleValue _simpleValue;


// And then object types

protected BigInteger _numberBigInt;
Expand Down Expand Up @@ -823,9 +840,9 @@ public JsonToken nextToken() throws IOException
_skipIncomplete();
}
_tokenInputTotal = _currInputProcessed + _inputPtr;
// also: clear any data retained so far
_numTypesValid = NR_UNKNOWN;
_binaryValue = null;

// also: clear any data retained so far.
clearRetainedValues();
Copy link
Member

@cowtowncoder cowtowncoder May 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few other places this must be done (everywhere where _binaryValue = null; is done)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought clearing it once at the beginning of nextToken could be enough, given it's used only once for major 7, and there is no partial assignment (always one byte).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't want to count on state not getting cleared for nextFieldName() variants, as they won't call nextToken().
It may be that a combination of things could keep things consistent (as in, not exposing stale simple value) but when adding new features invariants might not hold.

But if you are confident after looking, that's fine, just LMK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the side; so far I've been exploring the async impl of CBOR; and I'm almost done with my draft, building almost all major types to explore the spec. In next weeks, I will start work on the actual organized PR, However, I'd like to know if you think there are higher priority issues that the community would like to be fixed. I'm enjoying contributing to the project in my free time, so aligning with priorities will make it even more valuable.

After the async parser, I believe looking at Avro issues is worth it, as I think it's the most used (just a guess from the number of issues)

Would appreciate your thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't want to count on state not getting cleared for nextFieldName() variants, as they won't call nextToken(). It may be that a combination of things could keep things consistent (as in, not exposing stale simple value) but when adding new features invariants might not hold.

But if you are confident after looking, that's fine, just LMK.

Aha, I got it, and well, I'm not 100% confident about the different combinations that might happen that can lead to exposing stale values, so yeah, better not to count on that. I will commit the change to clear it in places where binary values is cleared

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iifawzi On async impl: that sounds like something that would be well received so +1 for that. Avro does seem to get more attention that I'd have expected so that's a good thing to follow up on too. And maybe logical type support and whatever "new" (... I haven't been following up spec development) features for CBOR would be good as well.
(Avro, too, has logical type concept).

And ideally Proto3 would be supported by protobuf module. There's also question of replacing protoc-parser (reader for schema) that is deprecated with library that is to replace it (by same authors). I haven't had time to follow up on this, I think there's an issue filed to request it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the insights @cowtowncoder! I’ll keep those points in mind as I move forward after the async impl


// First: need to keep track of lengths of defined-length Arrays and
// Objects (to materialize END_ARRAY/END_OBJECT as necessary);
Expand Down Expand Up @@ -1843,6 +1860,9 @@ public Object getEmbeddedObject() throws IOException
_finishToken();
}
if (_currToken == JsonToken.VALUE_EMBEDDED_OBJECT ) {
if (_simpleValue != null) {
return _simpleValue;
}
return _binaryValue;
}
return null;
Expand Down Expand Up @@ -3712,9 +3732,10 @@ protected JsonToken _decodeUndefinedValue() {
* Helper method that deals with details of decoding unallocated "simple values"
* and exposing them as expected token.
* <p>
* As of Jackson 2.12, simple values are exposed as
* {@link JsonToken#VALUE_NUMBER_INT}s,
* but in later versions this is planned to be changed to separate value type.
* Starting with Jackson 2.20, this behavior can be changed by enabling the
* {@link com.fasterxml.jackson.dataformat.cbor.CBORParser.Feature#HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT}
* feature, in which case simple values are returned as {@link JsonToken#VALUE_EMBEDDED_OBJECT} with an
* embedded {@link CBORSimpleValue} instance.
*
* @since 2.12
*/
Expand All @@ -3723,27 +3744,37 @@ public JsonToken _decodeSimpleValue(int lowBits, int ch) throws IOException {
_invalidToken(ch);
}
if (lowBits < 24) {
_numberInt = lowBits;
if (Feature.HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT.enabledIn(_formatFeatures)) {
_simpleValue = new CBORSimpleValue(lowBits);
} else {
_numberInt = lowBits;
}
} else { // need another byte
if (_inputPtr >= _inputEnd) {
loadMoreGuaranteed();
}
_numberInt = _inputBuffer[_inputPtr++] & 0xFF;

// As per CBOR spec, values below 32 not allowed to avoid
// confusion (as well as guarantee uniqueness of encoding)
if (_numberInt < 32) {
int value = _inputBuffer[_inputPtr++] & 0xFF;
if (value < 32) {
throw _constructError("Invalid second byte for simple value: 0x"
+Integer.toHexString(_numberInt)+" (only values 0x20 - 0xFF allowed)");
+Integer.toHexString(value)+" (only values 0x20 - 0xFF allowed)");
}

if (Feature.HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT.enabledIn(_formatFeatures)) {
_simpleValue = new CBORSimpleValue(value);
} else {
_numberInt = value;
}
}

// 25-Nov-2020, tatu: Although ideally we should report these
// as `JsonToken.VALUE_EMBEDDED_OBJECT`, due to late addition
// of handling in 2.12, simple value in 2.12 will be reported
// as simple ints.
if (Feature.HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT.enabledIn(_formatFeatures)) {
return JsonToken.VALUE_EMBEDDED_OBJECT;
}

_numTypesValid = NR_INT;
return (JsonToken.VALUE_NUMBER_INT);
return JsonToken.VALUE_NUMBER_INT;
}

/*
Expand Down Expand Up @@ -4100,4 +4131,10 @@ private void createChildObjectContext(final int len) throws IOException {
_streamReadContext = _streamReadContext.createChildObjectContext(len);
_streamReadConstraints.validateNestingDepth(_streamReadContext.getNestingDepth());
}

private void clearRetainedValues() {
_numTypesValid = NR_UNKNOWN;
_binaryValue = null;
_simpleValue = null;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,23 @@ public void testTinySimpleValues() throws Exception
}
}

@Test
public void testTinySimpleValuesAsEmbeddedObjectWhenEnabled() throws Exception
{
CBORFactory f = CBORFactory.builder()
.enable(CBORParser.Feature.HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT)
.build();
// Values 0..19 are unassigned, valid to encounter
for (int v = 0; v <= 19; ++v) {
byte[] doc = new byte[1];
doc[0] = (byte) (CBORConstants.PREFIX_TYPE_MISC + v);
try (CBORParser p = cborParser(f, doc)) {
assertToken(JsonToken.VALUE_EMBEDDED_OBJECT, p.nextToken());
assertEquals(new CBORSimpleValue(v), p.getEmbeddedObject());
}
}
}

@Test
public void testValidByteLengthMinimalValues() throws Exception {
// Values 32..255 are unassigned, valid to encounter
Expand All @@ -44,6 +61,21 @@ public void testValidByteLengthMinimalValues() throws Exception {
}
}

@Test
public void testValidByteLengthMinimalValuesAsEmbeddedObjectWhenEnabled() throws Exception {
// Values 32..255 are unassigned, valid to encounter
CBORFactory f = CBORFactory.builder()
.enable(CBORParser.Feature.HANDLE_SIMPLE_VALUES_AS_EMBEDDED_OBJECT)
.build();
for (int v = 32; v <= 255; ++v) {
byte[] doc = { (byte) (CBORConstants.PREFIX_TYPE_MISC + 24), (byte) v };
try (CBORParser p = cborParser(f, doc)) {
assertToken(JsonToken.VALUE_EMBEDDED_OBJECT, p.nextToken());
assertEquals(new CBORSimpleValue(v), p.getEmbeddedObject());
}
}
}

@Test
public void testInvalidByteLengthMinimalValues() throws Exception {
// Values 0..31 are invalid for variant that takes 2 bytes...
Expand Down