lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Exposing Solr routing to SolrJ client
Date Mon, 12 Mar 2012 14:24:15 GMT

On Mar 12, 2012, at 9:39 AM, Per Steffensen wrote:

> Mark Miller wrote:
>> Hey Per,
>> 
>> A couple things:
>> 
>> 1. Distributed realtime get is coming - I know Yonik was looking at this recently
>> but got caught up in some other things.
>>   
>> 
> Fantastic! I believe that if the client becomes "routing aware", distributed realtime get is
> only necessary when you send more than one id (using "ids") in a realtime-get request, and even
> then the distribution (to several Solr servers) and the merging of their results could happen
> in the client (or not, if you don't think that is appropriate).
>> 2. There is a SolrJ client that is aware of the cluster state - it's called CloudSolrServer.
>> You give it the ZooKeeper address rather than a node's address. Currently it doesn't send
>> directly to the leader, but this is planned
> Nice! So you plan to solve the "two hop" problem (as ElasticSearch calls it) that I
> mentioned! http://www.elasticsearch.org/guide/reference/java-api/client.html
>>  - it's a little tricky due to lack of access to the schema for hashing, but likely
>> coming soon - there is a JIRA issue for it. Clients in other languages should be able to do
>> the same thing.
>>   
>> 
> But can I already do realtime-get from a SolrJ client, then? You say that CloudSolrServer
> does not yet go directly to the leader, and if I am right that realtime-get (/get)
> requests are not routed to the leader on the server side, then I still won't be able to do
> realtime-get using CloudSolrServer. Am I correct that I can't do it yet, even with CloudSolrServer?

Right, you can't yet even with CloudSolrServer - but I think it will be done soon - certainly
before the 4 release anyway.
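Per's point above, that a routing-aware client could fan a multi-id realtime get out to several shards and merge the per-shard answers itself, can be sketched as follows. This is a hypothetical illustration, not SolrJ code: `RealtimeGetMerger` is an invented name, and a plain String stands in for a returned document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side merge step for a distributed realtime get (not a SolrJ class).
final class RealtimeGetMerger {
    private RealtimeGetMerger() {}

    /**
     * Merges per-shard realtime-get responses into one list that follows the
     * order of the requested ids. Each response maps id -> document (a String
     * here, standing in for a real document object). Ids that no shard
     * returned are skipped.
     */
    static List<String> merge(List<String> requestedIds,
                              List<Map<String, String>> shardResponses) {
        Map<String, String> byId = new HashMap<>();
        for (Map<String, String> response : shardResponses) {
            // Each id is expected to live on exactly one shard, so keys should not collide.
            byId.putAll(response);
        }
        List<String> merged = new ArrayList<>();
        for (String id : requestedIds) {
            String doc = byId.get(id);
            if (doc != null) {
                merged.add(doc);
            }
        }
        return merged;
    }
}
```

For example, requesting `b`, `a`, `x` where one shard returns `a` and another returns `b` yields the documents for `b` then `a`; `x` is simply absent.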

> 
> BTW, congratulations and thanks for the terrific work you guys are doing on Solr(Cloud)!
> I hope to contribute "versioning" (for optimistic locking) and a "unique key" feature
> that makes the operation fail if the document already exists (instead of automatically
> deleting what is already there).
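The two semantics Per wants to contribute can be illustrated with a small in-memory store. This is only a sketch of the intended behavior under stated assumptions, not Solr's API: `VersionedStore` and its methods are hypothetical, versions start at 1, and -1 signals a rejected update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory illustration of optimistic locking and create-only inserts.
final class VersionedStore {
    /** Stored document plus its version number. */
    private static final class Entry {
        final String doc;
        final long version;

        Entry(String doc, long version) {
            this.doc = doc;
            this.version = version;
        }
    }

    private final Map<String, Entry> docs = new HashMap<>();

    /** Create-only insert: returns false instead of overwriting an existing id. */
    synchronized boolean create(String id, String doc) {
        if (docs.containsKey(id)) {
            return false;
        }
        docs.put(id, new Entry(doc, 1L));
        return true;
    }

    /**
     * Optimistic-locking update: succeeds only if the caller's expected version is
     * still the stored one. Returns the new version, or -1 if the id is missing or
     * someone else has updated the document in the meantime.
     */
    synchronized long update(String id, String doc, long expectedVersion) {
        Entry current = docs.get(id);
        if (current == null || current.version != expectedVersion) {
            return -1L;
        }
        long next = current.version + 1;
        docs.put(id, new Entry(doc, next));
        return next;
    }

    /** Current document, or null if absent. */
    synchronized String get(String id) {
        Entry e = docs.get(id);
        return e == null ? null : e.doc;
    }
}
```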
>> - Mark
>> 
>> On Mar 12, 2012, at 5:26 AM, Per Steffensen wrote:
>> 
>>   
>> 
>>> Hi
>>> 
>>> I believe Solr(Cloud) does some internal routing of update requests to make sure
>>> documents are stored in the correct core/shard, as decided by Solr's internal routing
>>> algorithm (I believe it basically finds the leader of the shard for a given document,
>>> using shared information in ZK, info about the collection, and hash(document.id)).
>>> All nice and cool.
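The routing described above (hash the document's unique key, pick the shard whose hash range contains it, send to that shard's leader) might look roughly like this. It is a hypothetical sketch: `String.hashCode()` stands in for whatever hash Solr actually uses, and the list of leader URLs is assumed to have been read from the cluster state in ZooKeeper. A client can only route correctly if it hashes the same field with the same function as the server, which is presumably why schema access matters here.

```java
import java.util.List;

// Hypothetical hash-range router; not Solr's actual implementation.
final class HashRangeRouter {
    // Index i holds the current leader URL of shard i (assumed to come from ZooKeeper).
    private final List<String> shardLeaders;

    HashRangeRouter(List<String> shardLeaders) {
        if (shardLeaders.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardLeaders = List.copyOf(shardLeaders);
    }

    /** Maps a unique key to a shard index in [0, numShards). */
    int shardFor(String uniqueKey) {
        // Stand-in hash; the server's real hash function may differ.
        int hash = uniqueKey.hashCode();
        // Shift the signed 32-bit hash into [0, 2^32) so shard ranges are contiguous.
        long unsigned = (long) hash - Integer.MIN_VALUE;
        // Split [0, 2^32) into numShards near-equal ranges and pick the one containing the hash.
        return (int) ((unsigned * shardLeaders.size()) >>> 32);
    }

    /** Leader URL that should receive requests for this unique key. */
    String leaderFor(String uniqueKey) {
        return shardLeaders.get(shardFor(uniqueKey));
    }
}
```

Contiguous ranges (rather than `hash % numShards`) are one common design choice because a range can later be split without rehashing every other shard; whether Solr does exactly this is not stated in the thread.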
>>> 
>>> I also believe realtime-gets are not forwarded internally in Solr through this
>>> routing algorithm, and that it is therefore "impossible" to do realtime-gets from a client,
>>> because you don't know which core/shard to contact directly, again because you don't know the
>>> routing algorithm. If I'm wrong, a few directions on how to do realtime-gets from a client
>>> against a Solr system with many servers, shards and collections would be very helpful.
>>> If I'm right, I think it would be very nice if the routing algorithm were somehow exposed
>>> to the client (in code reachable from SolrJ) so that you could do realtime-gets from
>>> a SolrJ-based client. Whether that happens automatically, or the client using SolrJ
>>> explicitly has to call some code to find out which core to contact, is not so important
>>> for now.
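Once a client knows which core holds a document, the realtime get itself is a request to that core's /get handler with an `id` parameter (or a comma-separated `ids` parameter). A minimal sketch, assuming the core base URL (e.g. a hypothetical `http://host:8983/solr/collection1`) has already been resolved by routing; `RealtimeGetUrls` is an invented helper name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper that builds realtime-get URLs for one specific core.
final class RealtimeGetUrls {
    private RealtimeGetUrls() {}

    /** Realtime get for a single id against the given core. */
    static String forId(String coreBaseUrl, String id) {
        return coreBaseUrl + "/get?id=" + URLEncoder.encode(id, StandardCharsets.UTF_8);
    }

    /** Realtime get for several ids that all live on the same core. */
    static String forIds(String coreBaseUrl, List<String> ids) {
        // The comma separator is URL-encoded too; it decodes back to "," on the server.
        return coreBaseUrl + "/get?ids="
                + URLEncoder.encode(String.join(",", ids), StandardCharsets.UTF_8);
    }
}
```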
>>> 
>>> Such a solution would also make it possible to get rid of another performance-related
>>> "problem": most update requests have to be transported between JVMs twice to reach
>>> their destination, first from the client to some "random" Solr server, and then from that
>>> Solr server to the Solr server holding the core involved in the update. If routing
>>> information were available to the client, it could send its updates directly to the core
>>> involved (the one currently acting as leader for the shard to which the routing algorithm
>>> maps the document).
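The one-hop alternative amounts to grouping updates by destination leader on the client, so each batch goes straight to the node that will index it instead of through a forwarding node. A sketch, where `leaderForId` is a hypothetical routing function (for example, one built from the cluster state) and `DirectUpdateBatcher` is an invented name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side batching of updates per shard leader.
final class DirectUpdateBatcher {
    private DirectUpdateBatcher() {}

    /**
     * Groups document ids by the leader that should receive them. Sending each
     * group directly to its leader takes one hop instead of two. Leaders appear
     * in the order their first document was seen.
     */
    static Map<String, List<String>> batchByLeader(List<String> docIds,
                                                   Function<String, String> leaderForId) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String id : docIds) {
            batches.computeIfAbsent(leaderForId.apply(id), leader -> new ArrayList<>()).add(id);
        }
        return batches;
    }
}
```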
>>> 
>>> ElasticSearch solves this problem with a "Node Client" (as opposed to just a
>>> "Transport Client"): a node client is basically a real node in the system that just
>>> doesn't store documents, but has all the logic and shared information, such as the
>>> routing algorithm, available -
>>> http://www.elasticsearch.org/guide/reference/java-api/client.html
>>> It certainly doesn't have to be like that with Solr clients, but it would be
>>> nice if the routing logic were somehow available to SolrJ so that it could send its
>>> updates (and realtime-gets) directly to the correct destination.
>>> 
>>> Hope to get some comments on this issue.
>>> 
>>> Regards, Per Steffensen
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: 
>>> dev-unsubscribe@lucene.apache.org
>>> 
>>> For additional commands, e-mail: 
>>> dev-help@lucene.apache.org
>> 
>> - Mark Miller
>> lucidimagination.com
> 

- Mark Miller
lucidimagination.com