lucene-dev mailing list archives

From "Kevin Watters (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-11384) add support for distributed graph query
Date Thu, 21 Sep 2017 16:31:01 GMT

     [ https://issues.apache.org/jira/browse/SOLR-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Watters updated SOLR-11384:
---------------------------------
    Priority: Minor  (was: Major)

> add support for distributed graph query
> ---------------------------------------
>
>                 Key: SOLR-11384
>                 URL: https://issues.apache.org/jira/browse/SOLR-11384
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Kevin Watters
>            Priority: Minor
>
> Creating this ticket to track the work that I've done on the distributed graph traversal support in Solr.
> The current GraphQuery only works on a single core, which limits where it can be used and adds complexity if you want to scale it.  I believe there's a strong desire to support a fully distributed way of running the graph query.  I'm working on a patch; it's not complete yet, but if anyone would like to have a look at the approach and implementation, I welcome any feedback.
> The flow for the distributed graph query is almost exactly the same as the normal graph query.  The only difference is how it discovers the "frontier query" at each level of the traversal.
> When a distributed graph query request comes in, each shard begins by running the root query to know where to start on its shard.  Each participating shard then discovers its edges for the next hop.  Those edges are then broadcast to all other participating shards.  Each shard then receives all the parts of the frontier query, assembles it, and executes it.
> This process continues on each shard until there are no new edges left or the maxDepth of the traversal has been reached.
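> To make the flow concrete, here is a rough sketch of the per-shard loop.  The class and method names below are illustrative only, not the actual patch code:
>
> import java.util.Set;
>
> // Illustrative sketch of the per-shard traversal loop described above; not the patch code.
> public class DistributedGraphTraversalSketch {
>
>   /** One shard's view of the traversal: repeat until no new edges remain or maxDepth is hit. */
>   static void traverse(Shard shard, Broker broker, String rootQuery, int maxDepth) {
>     String frontier = rootQuery;                       // level 0: the root query
>     for (int depth = 0; depth < maxDepth; depth++) {
>       // edges this shard discovered for the current frontier
>       Set<String> localEdges = shard.collectEdges(frontier);
>       // broadcast local edges and receive the union of edges from all participating shards
>       Set<String> globalEdges = broker.exchange(localEdges);
>       if (globalEdges.isEmpty()) {
>         break;                                         // no new edges anywhere, traversal is done
>       }
>       // the next frontier matches documents whose "from" field contains any of the global edges
>       frontier = shard.buildFrontierQuery(globalEdges);
>       shard.execute(frontier);
>     }
>   }
>
>   // Minimal stand-ins for the real shard/broker machinery.
>   interface Shard {
>     Set<String> collectEdges(String frontierQuery);
>     String buildFrontierQuery(Set<String> edges);
>     void execute(String frontierQuery);
>   }
>
>   interface Broker {
>     Set<String> exchange(Set<String> localEdges);
>   }
> }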
> The approach is to introduce a FrontierBroker that resides as a singleton on each of the Solr nodes in the cluster.  When a graph query is created, it can call getInstance() on it so it can listen for the frontier parts coming in.
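> In rough terms, the singleton shape I have in mind looks something like this (an illustrative sketch, not the patch itself):
>
> import java.util.Map;
> import java.util.Set;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Illustrative singleton that accumulates the frontier parts arriving from remote shards.
> public class FrontierBroker {
>
>   private static final FrontierBroker INSTANCE = new FrontierBroker();
>
>   // queryId -> accumulated frontier parts (edges) for the current hop
>   private final Map<String, Set<String>> frontierParts = new ConcurrentHashMap<>();
>
>   public static FrontierBroker getInstance() {
>     return INSTANCE;
>   }
>
>   /** Called when a remote shard publishes its edges for the given query. */
>   public void receive(String queryId, Set<String> edges) {
>     frontierParts.computeIfAbsent(queryId, k -> ConcurrentHashMap.<String>newKeySet()).addAll(edges);
>   }
>
>   /** Called by the local graph query once all parts for a hop have arrived; clears the entry. */
>   public Set<String> assemble(String queryId) {
>     return frontierParts.remove(queryId);
>   }
> }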
> Initially, I was using an external Kafka broker to handle this, and it did work pretty well.  The new approach is to migrate the FrontierBroker to be a request handler in Solr, and potentially to use the SolrJ client to publish the edges to each node in the cluster.
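> With SolrJ, publishing the edges to another node could look roughly like this; note that "/frontier", the parameter names, and the wire format are all assumptions at this point:
>
> import org.apache.solr.client.solrj.SolrClient;
> import org.apache.solr.client.solrj.SolrRequest;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.request.GenericSolrRequest;
> import org.apache.solr.common.params.ModifiableSolrParams;
>
> // Illustrative only: push this shard's edges to another node's (assumed) frontier handler.
> public class FrontierPublisher {
>
>   public void publish(String nodeBaseUrl, String queryId, String serializedEdges) throws Exception {
>     try (SolrClient client = new HttpSolrClient.Builder(nodeBaseUrl).build()) {
>       ModifiableSolrParams params = new ModifiableSolrParams();
>       params.set("queryId", queryId);        // assumed parameter name
>       params.set("edges", serializedEdges);  // assumed wire format for the edge set
>       // "/frontier" is an assumed handler path, not an existing Solr endpoint
>       client.request(new GenericSolrRequest(SolrRequest.METHOD.POST, "/frontier", params));
>     }
>   }
> }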
> There are a few outstanding design questions.  First, how do we know which shards are participating in the current query request?  Is that easy info to get at?
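> One possible answer, if we can lean on SolrCloud, is to read the active slices from the cluster state, something like the untested sketch below, though it wouldn't account for a request that restricts which shards it targets:
>
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.solr.common.cloud.ClusterState;
> import org.apache.solr.common.cloud.Replica;
> import org.apache.solr.common.cloud.Slice;
> import org.apache.solr.common.cloud.ZkStateReader;
>
> // Untested sketch: list the leader core URLs for every active slice of a collection.
> public class ParticipatingShards {
>
>   public static List<String> leaderUrls(ZkStateReader zkStateReader, String collection) {
>     ClusterState clusterState = zkStateReader.getClusterState();
>     List<String> urls = new ArrayList<>();
>     for (Slice slice : clusterState.getCollection(collection).getActiveSlices()) {
>       Replica leader = slice.getLeader();
>       if (leader != null) {
>         urls.add(leader.getCoreUrl());   // base URL of the shard leader's core
>       }
>     }
>     return urls;
>   }
> }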
> Second, we are currently serializing a query object between the shards; perhaps we should consider a slightly different abstraction and serialize lists of "edge" objects between the nodes.  The point of this would be to batch the exploration/traversal of the current frontier to help avoid large bursts of memory being required.
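> As a rough illustration of what batching edge objects might look like (the types here are hypothetical):
>
> import java.util.ArrayList;
> import java.util.List;
>
> // Hypothetical edge value object plus a fixed-size batcher, to cap per-message memory.
> public class EdgeBatcher {
>
>   public static class Edge {
>     public final String fromValue;
>     public final String toValue;
>
>     public Edge(String fromValue, String toValue) {
>       this.fromValue = fromValue;
>       this.toValue = toValue;
>     }
>   }
>
>   /** Split the discovered edges into batches of at most batchSize for publishing. */
>   public static List<List<Edge>> batches(List<Edge> edges, int batchSize) {
>     List<List<Edge>> out = new ArrayList<>();
>     for (int i = 0; i < edges.size(); i += batchSize) {
>       out.add(new ArrayList<>(edges.subList(i, Math.min(i + batchSize, edges.size()))));
>     }
>     return out;
>   }
> }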
> Third, what sort of caching strategy should be introduced for the frontier queries, if any?  And if we do some caching there, how/when should the entries be expired and auto-warmed?
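> For the caching question, even a small bounded LRU map keyed by the serialized frontier would be a starting point (purely illustrative; expiry and auto-warming are still open):
>
> import java.util.LinkedHashMap;
> import java.util.Map;
>
> // Purely illustrative: a bounded LRU map keyed by the serialized frontier query.
> public class FrontierQueryCache<Q> extends LinkedHashMap<String, Q> {
>
>   private final int maxEntries;
>
>   public FrontierQueryCache(int maxEntries) {
>     super(16, 0.75f, true);   // access-order, so lookups refresh an entry's recency
>     this.maxEntries = maxEntries;
>   }
>
>   @Override
>   protected boolean removeEldestEntry(Map.Entry<String, Q> eldest) {
>     return size() > maxEntries;   // evict the least recently used frontier query
>   }
> }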



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

