Subject: Re: Exposing Solr routing to SolrJ client
From: Mark Miller <markrmiller@gmail.com>
In-Reply-To: <4F5DFCA2.5010401@designware.dk>
Date: Mon, 12 Mar 2012 10:24:15 -0400
Message-Id: <D19B5A4F-C45E-4FCE-8919-61B8FBF4B1E3@gmail.com>
References: <4F5DC12F.5080007@designware.dk>
 <59CD3C15-B76B-4C16-A270-9CBC68FF8A02@gmail.com>
 <4F5DFCA2.5010401@designware.dk>
To: dev@lucene.apache.org


On Mar 12, 2012, at 9:39 AM, Per Steffensen wrote:

> Mark Miller wrote:
>> Hey Per,
>>
>> A couple things:
>>
>> 1. Distributed realtime get is coming - I know Yonik was looking at
>> this recently but got caught up in some other things.
>>
> Fantastic! I believe that if the client becomes "routing aware",
> distributed realtime get is only necessary when you send more than one
> id (using "ids") in your realtime-get request, and even then the
> distribution (to several Solr servers, and merging of the results from
> them) could happen in the client (or not, if you don't think that is
> appropriate).
>> 2. There is a SolrJ client that is aware of the cluster state - it's
>> called CloudSolrServer. You give it the ZooKeeper address rather than a
>> node's address. Currently it doesn't send directly to the leader, but
>> this is planned
> Nice! So you plan to solve the "two hop" problem (as ElasticSearch
> calls it) that I mentioned!
> http://www.elasticsearch.org/guide/reference/java-api/client.html
>>  - it's a little tricky due to lack of access to the Schema for
>> hashing, but likely coming soon - there is a JIRA issue for it. Clients
>> in other languages should be able to do the same thing.
>>
> But can I do realtime gets from a SolrJ client already, then? You say
> that CloudSolrServer does not go directly to the leader yet, and if I am
> correct that realtime-get (/get) requests are not routed to the leader
> on the server side, then I will still not be able to do realtime gets
> using CloudSolrServer. Am I correct that I can't do it yet, even using
> CloudSolrServer?

Right, you can't yet, even with CloudSolrServer - but I think it will be
done soon, certainly before the Solr 4 release.
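Per describes the server-side routing as finding the shard leader from shared ZooKeeper state plus hash(document.id), and Mark notes that CloudSolrServer does not yet send requests straight to the leader. For illustration only, here is a minimal sketch of what a "routing aware" client could do. Everything in it is an assumption: `ClientSideRouter` is a hypothetical class (not the SolrJ API), `String.hashCode()` stands in for whatever hash Solr really uses, the 32-bit hash space is assumed to be split into equal contiguous ranges (one per shard), and the leader URLs are assumed to be already known (a real client would read them from the cluster state in ZooKeeper). `leaderFor` gives the single destination for an update, avoiding the second hop, and `realtimeGetUrls` groups a multi-id realtime get into one `/get?ids=...` request per leader.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client-side router; NOT Solr's routing code or the SolrJ API.
// Assumptions: the 32-bit hash space is split into equal contiguous ranges,
// one per shard, and the client already knows each shard's leader URL
// (in SolrCloud that information lives in the ZooKeeper cluster state).
class ClientSideRouter {
    private final List<String> leaderUrls; // element i = leader of shard i

    ClientSideRouter(List<String> leaderUrls) {
        if (leaderUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.leaderUrls = List.copyOf(leaderUrls);
    }

    // Maps a 32-bit hash onto one of numShards equal, contiguous ranges.
    static int shardIndex(int hash, int numShards) {
        long unsigned = (long) hash - Integer.MIN_VALUE; // now 0 .. 2^32 - 1
        return (int) ((unsigned * numShards) >>> 32);
    }

    // Where an update (or single-id realtime get) for this id should go.
    // String.hashCode() is a stand-in for the hash the server really uses.
    String leaderFor(String id) {
        return leaderUrls.get(shardIndex(id.hashCode(), leaderUrls.size()));
    }

    // Groups ids by leader (first-seen order) so each leader gets one request.
    Map<String, List<String>> groupByLeader(List<String> ids) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String id : ids) {
            groups.computeIfAbsent(leaderFor(id), k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    // One realtime-get URL per leader: <leader>/get?ids=id1,id2,...
    // Each id is URL-encoded; the commas separating ids are left as-is.
    List<String> realtimeGetUrls(List<String> ids) {
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groupByLeader(ids).entrySet()) {
            String joined = e.getValue().stream()
                    .map(id -> URLEncoder.encode(id, StandardCharsets.UTF_8))
                    .collect(Collectors.joining(","));
            urls.add(e.getKey() + "/get?ids=" + joined);
        }
        return urls;
    }
}
```

The caller would send each URL to its leader and merge the returned documents, which is the client-side distribution Per describes. Such a scheme only works if the client hashes exactly the same value, with the same function and the same ranges, as the server; doing that without access to the schema is the difficulty Mark mentions.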

>
> BTW, congratulations and thanks for the terrific work you guys are
> doing on Solr(Cloud)! I hope to get to contribute "versioning" (for
> optimistic locking) and a "unique key" feature that makes the operation
> fail if the document already exists (instead of automatically replacing
> what is already there).
>> - Mark
>>
>> On Mar 12, 2012, at 5:26 AM, Per Steffensen wrote:
>>
>>> Hi
>>>
>>> I believe Solr(Cloud) does some internal routing of update requests
>>> to make sure documents are stored in the correct core/shard, as decided
>>> by Solr's internal routing algorithm (I believe it basically finds the
>>> shard leader for a given document, using shared information in ZK,
>>> info about the collection, and hash(document.id)). All nice and cool.
>>>
>>> I also believe realtime gets are not forwarded internally in Solr
>>> through this routing algorithm, and that it is therefore "impossible"
>>> to do realtime gets from a client, because you don't know which
>>> core/shard to contact directly, again because you don't know the
>>> routing algorithm. If I'm wrong, a few directions on how to do realtime
>>> gets from a client against a Solr cluster containing many shards and
>>> collections would be very helpful. If I'm right, I think it would be
>>> very nice if the routing algorithm were somehow exposed to the client
>>> (in code reachable from SolrJ), so that you can do realtime gets from a
>>> SolrJ-based client. Whether this happens automatically or the client
>>> using SolrJ has to explicitly call some code to find out which core to
>>> contact is not so important for now.
>>>
>>> Such a solution would also get rid of another performance-related
>>> "problem": most update requests have to be transported between JVMs
>>> twice to reach their destination, first from the client to some
>>> "random" Solr server, and then from that Solr server to the Solr server
>>> holding the core involved in the update. If routing information were
>>> available to the client, it could route its updates directly to the
>>> core involved in the update (the one currently acting as leader for the
>>> shard to which the routing algorithm maps the document).
>>>
>>> ElasticSearch solves this problem with a "Node Client" (instead of
>>> just a "Transport Client"). A node client is basically a real node in
>>> the system that just doesn't store documents, but has all the logic and
>>> shared information, e.g. the routing algorithm, available:
>>> http://www.elasticsearch.org/guide/reference/java-api/client.html
>>> Solr clients certainly don't have to work like that, but it would be
>>> nice if the routing logic were somehow available to SolrJ, so that it
>>> can send its updates (and realtime gets) directly to the correct
>>> destination.
>>>
>>> Hope to get some comments on this issue.
>>>
>>> Regards, Per Steffensen
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>> - Mark Miller
>> lucidimagination.com
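Per mentions two features he hopes to contribute: "versioning" for optimistic locking, and a "unique key" mode where adding a document fails if it already exists instead of replacing it. The intended semantics can be illustrated with a minimal single-process, in-memory sketch. `VersionedStore`, its methods and `ConflictException` are hypothetical names chosen for illustration; this is not Solr's API and not how Solr would necessarily implement either feature. Every successful write assigns a new version; `update` succeeds only if the caller passes the version it last read, and `create` refuses to touch an existing document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store illustrating the semantics only; not Solr code.
class VersionedStore {
    // Stored value plus the version assigned on its last successful write.
    record Doc(String body, long version) {}

    // Thrown when a conditional write's precondition does not hold.
    static class ConflictException extends RuntimeException {
        ConflictException(String msg) { super(msg); }
    }

    private final Map<String, Doc> docs = new HashMap<>();
    private long nextVersion = 1;

    // "Unique key" insert: fails instead of silently replacing an existing doc.
    synchronized long create(String id, String body) {
        if (docs.containsKey(id)) {
            throw new ConflictException("document already exists: " + id);
        }
        return put(id, body);
    }

    // Optimistic locking: the write succeeds only if the caller saw the latest version.
    synchronized long update(String id, String body, long expectedVersion) {
        Doc current = docs.get(id);
        if (current == null) {
            throw new ConflictException("no such document: " + id);
        }
        if (current.version() != expectedVersion) {
            throw new ConflictException("version conflict on " + id + ": expected "
                    + expectedVersion + ", found " + current.version());
        }
        return put(id, body);
    }

    synchronized Doc get(String id) {
        return docs.get(id);
    }

    // Stores the body under a fresh version and returns that version.
    private long put(String id, String body) {
        long v = nextVersion++;
        docs.put(id, new Doc(body, v));
        return v;
    }
}
```

If two clients read the same version and both try to write, the second write is rejected rather than silently overwriting the first, and that client re-reads and retries. In a sharded cluster the version check would presumably have to run on the leader of the document's shard, which ties these features back to the routing question above.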

- Mark Miller
lucidimagination.com