dedup() on Path objects

The dedup()
step has some interesting behavior when used on a Path
object and could lead to some unexpected results if not taken into consideration. The following example shows some basic usage:
gremlin> g.V().union(out().path(), out().path())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]
==>[v[6],v[3]]
gremlin> g.V().union(out().path(), out().path()).dedup()
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]
In the prior example, the dedup()
step removes all of the duplicate Path
objects to produce just a unique set. A small change to this traversal however could greatly alter the results for dedup()
. When you label steps in Gremlin with as()
, those labels are referenced in the Path
object and then dedup()
behaves differently as shown in the next example.
gremlin> g.V().union(out().as('x').path(), out().path())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]
==>[v[6],v[3]]
The dedup()
step is doing equality checks on the Path
object which is the Vertex
(in this case) but also the labels. Even though the objects are the same, the Path
objects are technically different. It’s difficult to see that in Gremlin Console, and perhaps other tools because the labels aren’t visible as part of the Path
string representation. They are however accessible on the Path
object itself:
gremlin> path = g.V().union(out().as('x').path(), out().path()).next()
==>v[1]
==>v[3]
gremlin> path.labels()
==>[]
==>[x]
The workaround that will ensure that the dedup()
will not take labels into account is to deconstruct the Path
to a List
with unfold()
which will strip all of the labels:
gremlin> g.V().union(out().as('x').path(), out().path()).map(unfold().fold()).dedup()
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]
When working with Path
objects in Gremlin, it’s important to be aware that dedup()
considers both the objects and any labels attached to the path. This can lead to unexpected results where paths that appear identical in the console are treated as distinct due to differing labels. If the goal is to deduplicate solely on the objects traversed, without regard to labels, deconstruct the Path
to a List
before applying dedup()
. Understanding this nuance can help avoid subtle bugs and ensure your traversals return the results expected.