lucene-dev mailing list archives

From Per Steffensen <st...@designware.dk>
Subject Exposing Solr routing to SolrJ client
Date Mon, 12 Mar 2012 09:26:07 GMT
Hi

I believe Solr(Cloud) does some internal routing of update requests 
to make sure documents are stored in the correct core/shard, as decided by 
Solr's internal routing algorithm (I believe it basically finds out which 
core is the leader-shard for a given document, using shared information in 
ZK, info about the collection, and hash(document.id)). All nice and cool.
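
To make the idea concrete, here is a minimal sketch of hash-based routing 
as I understand it. It is an illustration only, not Solr's actual code: the 
class name, the method names, the simple polynomial hash and the 
"hash modulo number of shards" mapping are all assumptions, and the list of 
leader URLs stands in for whatever a client would read from ZK.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of hash-based routing; NOT Solr's real algorithm.
final class HashRoutingSketch {

    // Map a document id to a shard index in [0, numShards).
    // Assumption: a simple polynomial hash over the UTF-8 bytes of the id.
    static int shardIndexFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        int hash = 0;
        for (byte b : docId.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        // floorMod keeps the result non-negative even if the hash overflows.
        return Math.floorMod(hash, numShards);
    }

    // Pick the leader URL of the shard that owns the document.
    // Assumption: leaderUrlsByShard holds one leader URL per shard, in shard
    // order, as a client might assemble it from the shared state in ZK.
    static String leaderUrlFor(String docId, List<String> leaderUrlsByShard) {
        int index = shardIndexFor(docId, leaderUrlsByShard.size());
        return leaderUrlsByShard.get(index);
    }
}
```

The point is only that the mapping is a pure function of the document id 
and the shared cluster state, so in principle a client could evaluate it too.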

I also believe realtime-gets are not forwarded internally in Solr 
through this routing algorithm, and that it is therefore "impossible" to 
do realtime-gets from a client, because you don't know which core/shard 
to contact directly, again because you don't know the routing algorithm. 
If I'm wrong, a few directions on how to do realtime-gets from a client 
against a system of Solr servers containing many shards and collections 
would be very helpful. If I'm right, I think it would be very nice if 
the routing algorithm was somehow exposed to the client (in code 
reachable from SolrJ) so that you can do realtime-gets from a 
SolrJ-based client. Whether it is done automatically for you, or 
the client using SolrJ explicitly needs to call some code to get info 
about the core to contact, is not so important for now.
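
As a rough sketch of the explicit variant: once the client knows the base 
URL of the core to contact, it could build the realtime-get request itself. 
The class and method names below are hypothetical, the host and core name 
in the usage note are made up, and I am assuming the realtime-get handler 
is registered at /get and takes the document id in an "id" parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; not part of SolrJ.
final class RealtimeGetUrlSketch {

    // Build a realtime-get URL against one specific core.
    // Assumption: the handler lives at "<core base URL>/get" and reads "id".
    static String realtimeGetUrl(String coreBaseUrl, String docId) {
        // Drop a single trailing slash so we never emit "//get".
        String base = coreBaseUrl.endsWith("/")
                ? coreBaseUrl.substring(0, coreBaseUrl.length() - 1)
                : coreBaseUrl;
        try {
            // URL-encode the id so spaces and reserved characters survive.
            return base + "/get?id=" + URLEncoder.encode(docId, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

For example, realtimeGetUrl("http://host1:8983/solr/collection1_shard2", id) 
would target that one core directly; the missing piece is still how the 
client finds out which core that should be.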

Such a solution would also make it possible to get rid of another 
performance-related "problem": most update requests have to be 
transported between JVMs twice to reach their destination. First from the 
client to some "random" Solr server, and then from this Solr server to 
the Solr server holding the core involved in the update. If routing 
information was available to the client, it could make sure to route its 
updates directly to the core involved in the update (the one currently 
playing the role of leader-shard for the shard to which the routing 
algorithm maps the document).

ElasticSearch has a solution to this problem through the use of a "Node 
Client" (instead of just a "Transport Client"), where a node client is 
basically a real node in the system that just doesn't store documents, but 
which has all the logic and shared information, e.g. the routing 
algorithm, available - 
http://www.elasticsearch.org/guide/reference/java-api/client.html. It 
certainly doesn't have to be like that with Solr clients, but it would be 
nice if the routing logic were somehow available to SolrJ so that it 
can send its updates (and realtime-gets) directly to the correct 
destination.

Hope to get some comments on this issue.

Regards, Per Steffensen


