-
Notifications
You must be signed in to change notification settings - Fork 123
Added a regex checker and fixer for offsets without colons for ZDateTime and OSDateTime #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a regex checker and fixer for offsets without colons for ZDateTime and OSDateTime #208
Conversation
…ateTime and OffsetDateTime.
Edit: Fixed with ignoring matches of start of the string. |
datetime/src/main/java/com/fasterxml/jackson/datatype/jsr310/deser/InstantDeserializer.java
Outdated
Show resolved
Hide resolved
datetime/src/main/java/com/fasterxml/jackson/datatype/jsr310/deser/InstantDeserializer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor points otherwise looks good to me.
…and recommented on regex expression usage in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, @cowtowncoder thoughts?
FYI: I naively thought it would be better to apply the regex to the end of the string, and quickly found out: not so simple. I think given the shortness of the string, and the simplicity (clean and simple) of the regex you have, it should be fine, so please disregard (now deleted) comment I had about the regex. |
Well that explains why the comment button refused to work when I tried to reply! And originally i had the same thought as you. I overdid the regex expression before figuring out simple ought to do the trick. There are also variations of the ISO 8601 string which includes TZ on end as mentioned in #131, such as "2000-01-01T12:00+0100[Europe/Paris]". So any at the end of expression would have to check for it to be the end or followed by "[". |
@oeystein Thanks! Good points, the regex can get crazy quick. I think this is getting ready, though @cowtowncoder might have some further comments, in the meantime, one thing you'll need to send in (if not already) the CLA, from: https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf and usually the easiest way is to print, fill & sign, scan/photo, email to info at fasterxml dot com. Thank you once again for the PR! |
@kupci Any time! And i just sent over the contributor agreement so the formalities should be in order now. |
* | ||
* @since 2.13 | ||
*/ | ||
protected static final Pattern ISO8601_COLONLESS_OFFSET_REGEX = Pattern.compile("(?<!^)[+-][0-9]{4}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question here: I understand that first part is to try to avoid false matches (since we cannot assume it's at the end, due to possible timezone suffix), but what does the part in parenthesis mean? :)
Also wonder if it could/should check trailing word boundary (either end-of-string, or [
) -- that would anchor it correctly, I think.
Just curious, not sure if there are performance implications; I know some matching constructors can be more expensive than others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been giving the regex expression some thought over the last few days and i agree its not optimal. The parenthesis just simply uses anchors to ignore checking the first letter in the the string.
We can however likewise simply use match on anchor(end of line) or boundary([) to check for line ends or "[" matches. So realistically i don't believe the following example should not have any major performance implications? It's at least the best i appear to come up with, and the one i believe would be the best.
[+-][0-9]{4}(?=[|$)
Another solution is too only look for matches after T(which is pretty much the only guaranteed thing in ISO 8601 DataTime formats), which should also technically work but not created a regex expression for this yet.
@kupci I believe you also looked a bit into the performance complications of this, any comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like the best to me. Anchoring it to the end of the string, or the bracket, will help performance a bit, compared to the original. Then, the regex engine only looks for the 4 digits, and stops.
Btw, for a fun read on how things can rapidly get out of hand with a regex that looks simple in appearance, this is an interesting read: Runaway Regular Expressions: Catastrophic Backtracking
@oeystein Thank you for contributing this (including tests!) & sending CLA -- only have one question, hoping to merge this soon. |
I posted a code comment on this a couple days ago. When that idea is sorted out everything should be ready to-go. Seems the comment was left hanging! Apologizes not that familiar with github ... i apparently had to submit review for that comment to go trough! |
…e string checked the whole string beyond the start, now it checks at the end or before [.
Committed the requested changes, so if everything seems good now this should be ready to be merged. |
Sorry, extremely heavy week. This is near the top of the pile (https://github.com/FasterXML/jackson-future-ideas/wiki/Jackson-Work-in-Progress) so while delayed, not forgotten, and I intend to get to it this week, hopefully tomorrow. |
WHY
See issue #131 for details
Colonless offsets for ISO 8601s strings a common way of representing DateTimes with offsets. Unfortunately ZonedDateTime.parse(string) and OffsetDateTime.parse(string) does not support this, breaking jackson compatibility.
HOW
Uses a regex to check for offsets without colons, and if found - adds in a colon to make it compatible with ZonedDateTime.parse() and OffsetDateTime.parse()