Gremlin Snippets are typically short and fun dissections of some aspect of the Gremlin language. For a full list of all steps in the Gremlin language see the Reference Documentation of Apache TinkerPop™. This snippet is based on Gremlin 3.7.3. This snippet demonstrates its lesson using the data of the "modern" toy graph. Please consider bringing any discussion or questions about this snippet to Discord or the Gremlin Users Mailing List.



The dedup() step behaves in a way that can produce unexpected results when it is applied to Path objects. The following example shows some basic usage:

gremlin> g.V().union(out().path(), out().path())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]
==>[v[6],v[3]]
gremlin> g.V().union(out().path(), out().path()).dedup()
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]

In the prior example, the dedup() step removes all of the duplicate Path objects, leaving only the unique set. A small change to this traversal, however, can greatly alter the results of dedup(). When you label steps in Gremlin with as(), those labels are recorded in the Path object, and dedup() then behaves differently, as shown in the next example.

gremlin> g.V().union(out().as('x').path(), out().path()).dedup()
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]
==>[v[6],v[3]]

The dedup() step performs equality checks on the Path objects, and Path equality considers not only the objects in the path (vertices, in this case) but also the labels attached to each position. Even though the objects are the same, the Path objects themselves are not equal. That is difficult to see in Gremlin Console, and perhaps in other tools, because the labels aren’t part of the Path string representation. They are, however, accessible on the Path object itself:

gremlin> path = g.V().union(out().as('x').path(), out().path()).next()
==>v[1]
==>v[3]
gremlin> path.labels()
==>[]
==>[x]
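
To make the inequality concrete, the following sketch compares a labeled and an unlabeled path directly. It assumes a Gremlin Console session, where TinkerFactory is already imported. The variable names labeled and plain are illustrative and do not appear in the original snippet.

```groovy
// Assumes Gremlin Console, where TinkerFactory is pre-imported.
graph = TinkerFactory.createModern()
g = graph.traversal()

// Same start vertex and same step; the only difference is the 'x' label.
labeled = g.V(1).out().as('x').path().next()
plain   = g.V(1).out().path().next()

// The traversed objects match...
assert labeled.objects() == plain.objects()
// ...but the labels do not, so the Path objects are not equal.
assert labeled.labels() != plain.labels()
assert labeled != plain
```

Because dedup() relies on this same equality, the two paths are both kept even though they print identically.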

The workaround that ensures dedup() does not take labels into account is to deconstruct each Path into a List with unfold() followed by fold(), which discards all of the labels:

gremlin> g.V().union(out().as('x').path(), out().path()).map(unfold().fold()).dedup()
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]
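
As a quick check on the two behaviors, the sketch below counts results with and without the workaround. The counts follow from the console output shown in this snippet (12 rows versus 6). The final dedup().by(...) form is an assumption that is not taken from the original snippet: it is expected to deduplicate on the unfolded objects while still emitting Path objects, but it should be verified against your TinkerPop version.

```groovy
// Assumes Gremlin Console, where TinkerFactory and anonymous steps such as out() are pre-imported.
graph = TinkerFactory.createModern()
g = graph.traversal()

// Labeled and unlabeled paths are distinct to dedup(), so nothing is removed.
assert g.V().union(out().as('x').path(), out().path()).dedup().count().next() == 12

// Converting each Path to a List drops the labels, so the duplicates collapse.
assert g.V().union(out().as('x').path(), out().path()).map(unfold().fold()).dedup().count().next() == 6

// Assumption (not in the original): deduplicate by the unfolded objects but keep the Path objects.
g.V().union(out().as('x').path(), out().path()).dedup().by(unfold().fold()).toList()
```

The map(unfold().fold()) form returns plain lists, so any downstream step that needs Path-specific methods such as labels() would no longer have them available.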

When working with Path objects in Gremlin, it’s important to be aware that dedup() considers both the objects and any labels attached to the path. This can lead to unexpected results where paths that appear identical in the console are treated as distinct because their labels differ. If the goal is to deduplicate solely on the objects traversed, without regard to labels, deconstruct the Path to a List before applying dedup(). Understanding this nuance can help you avoid subtle bugs and ensure your traversals return the results you expect.