Subgraphing is a common use case when working with graphs. We often want to take some small portion of a graph and then operate only upon it. Gremlin provides the subgraph() step, which makes this operation relatively easy by exposing a way to produce an edge-induced subgraph that is detached from the parent graph.
The subgraph() step can come with some limitations. One limitation of non-JVM based Gremlin Language Variants is their lack of support for the subgraph() step, which is discussed on TINKERPOP-2063. The subgraph() step isn’t supported there because, unlike the JVM, which of course has TinkerGraph, there is no equivalent graph structure in those languages to hold the result. Some graphs also do not support the subgraph() step and therefore fail to produce subgraphs even when using JVM-based languages.
Depending upon the use case and environment, there are a number of workarounds that a GLV user could consider to deal with this issue. One approach, assuming your graph supports the step, would be to use a Gremlin script to perform the subgraph(). Then, in the same script, some options potentially open up for using the resulting TinkerGraph instance (assuming the server you’re communicating with allows it):
- Write out its data to a String of GraphSON, GraphML, or some other format, which you could then process locally on the client.
- Execute multiple Gremlin traversals on that subgraph (even mutations) and return the results as though you had queried the main graph.
These workarounds of course don’t address scenarios where the graph simply doesn’t support subgraph(), and they obviously increase dependence on Gremlin scripts. To see how some alternative approaches might work, consider the following traversal which produces a subgraph of “marko, who he knows over the age of 30, and what software they created”:
If we think about the TinkerGraph a bit in the above example, it is really just a data structure (i.e. a graph data structure) that happens to organize the data we have and allows us to search it in a particular way (i.e. with Gremlin). In fact, when we query it with g.E() we effectively get a collection of Edge objects, where each holds an outgoing and an incoming Vertex object. If we opted to forgo the ability to use Gremlin and to analyze this data as a graph, it really could have been queried without subgraph() at all:
Note that the data for the subgraph is no longer in graph data structure form, but is just a list of edge objects with their associated vertices. This data, captured by replacing subgraph() with store(), is no less a representation of the same subgraph than the previous example; it just lacks the surrounding TinkerGraph container that allows querying it. Without the TinkerGraph, this representation of the subgraph becomes something that can be returned to a non-JVM based Gremlin Language Variant. Of course, it would now be up to you to work with this raw graph data (i.e. an edge list). Perhaps you could massage the data into a native graph framework, push it to a visualization framework, convert it to GraphML for import to a tool, or do whatever else might make sense for your use case.
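As a rough illustration of working with such an edge list on the client, the sketch below assumes the edges have already arrived as plain maps with the hypothetical keys id, label, outV and inV. The key names and the sample data are assumptions made for illustration (they mirror the toy graph used in this recipe) and are not something a server returns by default. It rebuilds a simple adjacency structure from the list:

```python
from collections import defaultdict

# Hypothetical edge list shaped as plain maps (an assumption for illustration).
edges = [
    {"id": 8, "label": "knows", "outV": "marko", "inV": "josh"},
    {"id": 10, "label": "created", "outV": "josh", "inV": "ripple"},
    {"id": 11, "label": "created", "outV": "josh", "inV": "lop"},
]


def to_adjacency(edge_list):
    """Group (label, target) pairs by the outgoing vertex of each edge."""
    adjacency = defaultdict(list)
    for e in edge_list:
        adjacency[e["outV"]].append((e["label"], e["inV"]))
    return dict(adjacency)


adjacency = to_adjacency(edges)
print(adjacency["josh"])  # [('created', 'ripple'), ('created', 'lop')]
```

From a structure like this, the data could be handed to whichever native graph or visualization library suits the client environment.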
It’s worth pointing out that you likely wouldn’t return an actual list of Edge objects, since they will be returned as references only. Some conversion would typically be necessary unless you only concerned yourself with label values on those elements:
The use of store() is a bit of a low-level replacement for subgraph(); the latter “upgrades” the data container holding the edges from a List object to a graph.
In this form of edge list we can see the potential for repetition in the vertex property data (i.e. the “josh” vertex). There are multiple ways in which we might handle this, but one approach would be to simply
store() the edges and vertices independently:
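The same de-duplication can also be sketched on the client side. The example below is only an illustration under stated assumptions: it supposes an edge list in which every edge embeds full maps for its two vertices (the shape, key names and sample values are hypothetical), and it separates that list into a set of unique vertices and a list of slim edges that refer to vertices by id:

```python
# Hypothetical edge list in which each edge embeds its full vertex maps,
# so the "josh" vertex is repeated (the shape is an assumption).
josh = {"id": 4, "label": "person", "name": "josh", "age": 32}
edges = [
    {"id": 8, "label": "knows",
     "outV": {"id": 1, "label": "person", "name": "marko", "age": 29},
     "inV": dict(josh)},
    {"id": 10, "label": "created",
     "outV": dict(josh),
     "inV": {"id": 5, "label": "software", "name": "ripple", "lang": "java"}},
]


def split_elements(edge_list):
    """Return (vertices keyed by id, edges that reference vertices by id)."""
    vertices = {}
    slim_edges = []
    for e in edge_list:
        for side in ("outV", "inV"):
            v = e[side]
            vertices.setdefault(v["id"], v)  # keep the first copy only
        slim_edges.append({"id": e["id"], "label": e["label"],
                           "outV": e["outV"]["id"], "inV": e["inV"]["id"]})
    return vertices, slim_edges


vertices, slim_edges = split_elements(edges)
print(sorted(vertices))  # [1, 4, 5]
```

Keying the vertices by id means the repeated "josh" data is kept once, at the cost of a lookup whenever an edge's endpoints are needed.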
In addition to store(), it may also make sense to try path() to extract a subgraph, as the elements Gremlin traverses will all be present in the path history. The downside is that you’re left to remove duplicates and filter out path elements which may not be applicable to your subgraph. The nice thing about path() for subgraphing is that it won’t really pollute your Gremlin traversal in the way that store() does, as store() needs to appear after every step where you wish to keep an Edge for your subgraph.
Line 9 is a bit of a distraction, as it is something of a hack to split the single list of mixed vertices and edges into separate homogeneous lists of each (there is no better way to do that with Gremlin at this time, at least until TINKERPOP-2234). After that, the code is almost identical to the approach with store(), but without having to maintain the insertion of the store() step everywhere. It’s hard to say which would perform better; it would likely take some testing on specific graph systems to determine which works best. In any case, these alternatives to subgraph() should offer some options to those who need this kind of functionality but are subject to one or more of the limitations that prevent it.
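For the path()-based approach, the removal of duplicates and the split into vertices and edges could also be handled on the client rather than in the traversal. The following is a minimal sketch under assumptions: each path is modeled as a list of plain maps, and the "type" key used to tell vertices from edges is hypothetical, as is the sample data.

```python
# Hypothetical path results: each path is a list of elements, each a plain map
# with an assumed "type" key telling vertices and edges apart.
paths = [
    [{"type": "vertex", "id": 1}, {"type": "edge", "id": 8},
     {"type": "vertex", "id": 4}, {"type": "edge", "id": 10},
     {"type": "vertex", "id": 5}],
    [{"type": "vertex", "id": 1}, {"type": "edge", "id": 8},
     {"type": "vertex", "id": 4}, {"type": "edge", "id": 11},
     {"type": "vertex", "id": 3}],
]


def split_paths(path_list):
    """De-duplicate path elements and split them into vertex and edge ids."""
    seen = set()
    vertex_ids, edge_ids = [], []
    for path in path_list:
        for element in path:
            key = (element["type"], element["id"])
            if key in seen:
                continue  # already collected from an earlier path
            seen.add(key)
            target = vertex_ids if element["type"] == "vertex" else edge_ids
            target.append(element["id"])
    return vertex_ids, edge_ids


vertex_ids, edge_ids = split_paths(paths)
print(vertex_ids)  # [1, 4, 5, 3]
print(edge_ids)    # [8, 10, 11]
```

Elements are keyed by type and id together because vertex and edge ids are not guaranteed to be distinct from each other in every graph system.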