Skip to content

TINKERPOP-1274: GraphSON 2.0 [revised] #386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 22, 2016

Conversation

newkek
Copy link
Contributor

@newkek newkek commented Aug 19, 2016

For context, a precise description is provided in the PR for the first version of the fix: #351. Please see this #351 PR first for initial context.

This PR provides the initial set of features defined in #351, plus the following:

  • Types for Graph objects.
  • Types for all numeric values.
  • New type IDs format.
  • Avoid serializing empty properties field.

As a reminder the format for types is the following:

  • A value not typed : value
  • A value typed : {"@type":"typeName", "@value":value}

New type IDs format

A type ID is now composed of 2 parts, the "domain" and the type name. A "domain" can be used by any implementor to implement their own data type, avoiding collisions with the existing TinkerPop type IDs. The default domain for Graph object is "gremlin".

Types for all numeric values

A type is now present for every numeric value, the types have been renamed to be more understandable with regards to their memory sizes or kinds. As a reference, here is a description of all currently existing types and their corresponding Java implementation:

  • Java Integer: "gremlin:int32"
  • Java Long: "gremlin:int64"
  • Java Short: "gremlin:int16"
  • Java Float: "gremlin:float"
  • Java Double: "gremlin:double"

Types for all Graph objects.

New typeIDs introduced in #351 (time types, UUIDs, etc..) now follow the type format defined here: "domain:typename".
Types have now been included for Graph-specific objects, here is an exhaustive list of the existing types handled so far and their corresponding IDs:

  • Vertex -> "gremlin:vertex"
  • Edge -> "gremlin:edge"
  • VertexPropery -> "gremlin:vertexproperty"
  • Property -> "gremlin:property"
  • Path -> "gremlin:path"
  • Tree -> "gremlin:tree"
  • Graph -> "gremlin:graph"
  • Metrics -> "gremlin:metrics"
  • TraversalMetrics -> `gremlin:traversalmetrics"

This improvement defines a requirement to the serialization format which is that every type must have a Jackson serializer and deserializer defined on the source ObjectMapper. Previous not defined serializers and deserializers have been added in this PR.

Code-wise, it's pretty much the same than for #351, the big intake in code here is the addition of the deserializers for all Graph objects, a big simplification to the serializers (GraphSONSerializersV2d0), the addition of the "domain" to the type system, and making that new typeID format configurable to users through the TinkerPopJacksonModule.

mvn clean install test suite passes, and it's rebased on top of current master.

@okram
Copy link
Contributor

okram commented Aug 19, 2016

First off. Thank you for doing this. Here are some notes.

  1. The term is "namespace" not "domain." In case that terminology is used in the documentation and not just in the PR notes.
  2. Why do we have @class and @type. Seems we should just pick one. And if the answer is cause @class is for Java object mapping infrastructure, that isn't good and we will also use GraphSON to create Vertex objects in Python and it might use UUID for its ID as well. Thus, @class=UUID is perhaps @type=UUID. That is, if an object is "typed" its @type and the resolver (for that language) will have to map the correct class.
  3. I read your notes, but its still not clear to me. Why do we have @value? I think we only need @type (or @class) and then the rest of the JSON object data is up to the type serializer/deserializer. For instance, in Bytecode (TINKEROP-1278) we have @type=Lambda and then value, language, arguments. That is, there are 3 "values" we need to make the object and thus @value shouldn't be special with a @. Plus, we save a char.
  4. I think we can save some bytes and still be readable if we make our namespace g instead of gremlin. Moreover, we should be consistent in our naming convention and either use some standard JSON naming or default to Java naming -- e.g. g:vertexProperty. Or better yet, if its NOT namespaced, its assumed to be "Gremlin". That is, TinkerPop owns vertex, vertexProperty, edge. You want to do something else -- its myapp:vertex. This also means we own uuid, int64, etc. etc.

Thanks again for the extensive work.

@newkek
Copy link
Contributor Author

newkek commented Aug 19, 2016

Hey @okram ,

1. The term is "namespace" not "domain." In case that terminology is used in the documentation and not just in the PR notes.

Ok, it's some places here and there in the code, I'll correct it.

2. Why do we have @class and @type.

@class shouldn't be anywhere near GraphSON 2.0 anymore, everything is @type, if you've seen it somewhere, that should be corrected. While writing that I realise I might have forgotten to update the "the-graph.asciidoc" doc... so I'll correct it. But in the actual code it shouldn't be there anymore.

3. I read your notes, but its still not clear to me. Why do we have @value? I think we [...]

So, taking a step back and I'm going to use the exampleof ByteCode. As you can see in the GraphSONSerializersV2d0, serializers don't have to care anymore about types, serializers are put into the context of "You're in a place where you can write whatever you want, type is handled somewhere else for you".
For the ByteCode serializer it means it would open a Map, put a field "language" and its value, put a field "arguments" and its value, put a field "value" (or maybe "steps" then?) and its value, close the map. When deserializing, read the map, get the fields values, create the ByteCode object. (you can see for example the Path serializer that would be a good comparison to the ByteCode serializer, and its deserializer)
The complete payload for a typed ByteCode is then:

{"@type":"gremlin:bytecode", "@value":{"arguments": ..., "language": ....., "steps": ....}}

Let's admit for whatever reason the ByteCode serializer in the future needs to change and instead of writing a Map, it needs to start by an Array where the first element of the array is 'something', the rest is the Map described before, and close the array. Doing this is currently possible with the current format {"@type":"typename", "@value":value}, because the serializer is in a place where it can write whatever it wants to.
Now let's consider another format: {"@type":"typename", value}. There, the serializer is put in the context "You are currently in a Map, whatever you write, it must be in the form of a key/value pair". The "potential future" evolution of the ByteCode serializer (writing an array) wouldn't be possible. To handle that, we would have to change the type format for the ByteCode, and put it as for example ["typename", value] also we would update and tell the serializer "Hey actually now, you're in the context of a List". Then perfect, the serializer can write its List. But what we end up with, is an entity for which the type is ["typename", value] and other for which the type format is {"@type":"typename", value}. We could also let the serializers handle the type format and they decide how to put it, so that's fine, but the problem is deserialization. If there are multiple type formats it will be very hard and in efficient for the type system (i'm talking about GraphSONTypeDeserializer for reference) to determine which type format is used at which moment, and the type system (GraphSONTypeDeserializer) has to find that type because it is the one that reads the types in order to call the deserializers. For the sake of consistency and the problems above I still think we'd better stick to the current format.

4. I think we can save some bytes and still be readable if we make our namespace g

Totally fine with g, again for the sake of consistency, I think it'd be valuable to have a 'name space' all the time.

@newkek
Copy link
Contributor Author

newkek commented Aug 19, 2016

NB: the conflicts for merge are caused by the CHANGELOG

@newkek newkek force-pushed the TINKERPOP-1274-rev branch 2 times, most recently from 225f3e6 to 7e5ff1a Compare August 19, 2016 22:42
@newkek
Copy link
Contributor Author

newkek commented Aug 19, 2016

A remark: @okram requested TraversalExplanation to be included in the list of Graph objects. I've noticed that the current TraversalExplanation Graphson serializer serializes a Traversal. In order to follow the requirement of this GraphSON2.0 type system, we would have to write a deserializer for the TraversalExplanation (there isn't currently). However, the current 1.0 serializer serializes the Traversal as the toString() of the Traversal's step list, which means that it is not possible to deserialize to a Traversal with only such info. So I believe it would be more appropriate to wait for TP1278 before implementing the TraversalExplanation ser/de and leave it out of the scope of this PR.

@spmallette
Copy link
Contributor

I think I agree with @newkek to push that off until we can get his massive PR reviewed/merged. Let's see if we can solve that separately.

@dmill-bz
Copy link
Contributor

I'm liking how this has turned out.
I personally don't have much of an opinion in regards to assuming gremlin is the default namespace and getting rid of g: though it does have the merit of being self documenting. I don't know how much of a size difference this would represent.

@spmallette
Copy link
Contributor

is that an official +1 from you @PommeVerte or are you still reviewing ?

@dmill-bz
Copy link
Contributor

Still reviewing I would like to finish going through the code :) will try and do this later in the day. But looking good so far

@newkek newkek force-pushed the TINKERPOP-1274-rev branch from 7e5ff1a to 115eb3c Compare August 20, 2016 12:53
@newkek
Copy link
Contributor Author

newkek commented Aug 20, 2016

I just pushed corrected docs, and renamed everything "domain" to "namespace". Waiting for more consensus to change the default gremlin namespace.

@spmallette
Copy link
Contributor

All tests pass with docker/build.sh -t -n -i - still going through code though.

@spmallette
Copy link
Contributor

VOTE +1

@dmill-bz
Copy link
Contributor

I'm waiting to hear back from marko on the points he brought up to cast my vote :)

@okram
Copy link
Contributor

okram commented Aug 22, 2016

@newkek I understand. I like the @value concept -- its the JSON sub-object for creating the object (defined by @type).

VOTE +1.

@dmill-bz
Copy link
Contributor

Ok cool if we're all ok with the new features then I'm satisfied. Code looked good as well VOTE: +1

@@ -54,6 +54,14 @@
public DefaultTraversalMetrics() {
}

// This is only a convenient constructor needed for GraphSON deserialization.
// TODO: see if that's ok to add that.
public DefaultTraversalMetrics(long totalStepDurationNs, List<MutableMetrics> metricsMap) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure who I should ask about that, I decided to add a public constructor here for convenience during the deserialization, it takes the total duration, and a list of all the Metrics its composed of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that - I'll be making some tweaks once we get it over to master. I'll take a look at those odds and ends.

@asfgit asfgit merged commit 115eb3c into apache:master Aug 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants