Subject: Re: Exposing Solr routing to SolrJ client
From: Mark Miller
Date: Mon, 12 Mar 2012 10:24:15 -0400
To: dev@lucene.apache.org
In-Reply-To: <4F5DFCA2.5010401@designware.dk>
References: <4F5DC12F.5080007@designware.dk> <59CD3C15-B76B-4C16-A270-9CBC68FF8A02@gmail.com> <4F5DFCA2.5010401@designware.dk>

On Mar 12, 2012, at 9:39 AM, Per Steffensen wrote:

> Mark Miller wrote:
>> Hey Per,
>>
>> A couple things:
>>
>> 1. Distributed realtime get is coming - I know Yonik was looking at this recently but got caught up in some other things.
>>
> Fantastic! I believe that, if the client becomes "routing aware", it is only necessary when you are sending more than one id (using "ids") in your realtime-get request, and even then the distribution (to several Solr servers, and the merging of results from those) could happen in the client (or not, if you don't think that is appropriate).
>> 2. There is a SolrJ client that is aware of the cluster state - it's called CloudSolrServer. You give it the ZooKeeper address rather than a node's address. Currently it doesn't send directly to the leader, but this is planned
> Nice! So you plan to solve the "two hop" problem (as ElasticSearch calls it) that I was mentioning! http://www.elasticsearch.org/guide/reference/java-api/client.html
>> - it's a little tricky due to lack of access to the Schema for hashing, but likely coming soon - there is a JIRA issue for it. Clients in other languages should be able to do the same thing.
>>
> But can I do realtime-get from a SolrJ client already, then?
> You say that CloudSolrServer does not go directly to the leader yet, and if I am correct when I claim that realtime-get (/get) requests are not routed on the server side to the leader, then I will still not be able to do realtime-get using CloudSolrServer. Am I correct that I can't do it yet, even using CloudSolrServer?

Right, you can't yet, even with CloudSolrServer - but I think it will be done soon - certainly before the 4 release anyway.

> BTW, congratulations and thanks for the terrific work you guys are doing on Solr(Cloud)! I hope to get to contribute "versioning" (for optimistic locking) and a "unique key" feature that allows the operation to fail if the document already exists (instead of just automatically deleting what is already there).
>> - Mark
>>
>> On Mar 12, 2012, at 5:26 AM, Per Steffensen wrote:
>>
>>> Hi
>>>
>>> I believe Solr(Cloud) is doing some internal routing of update requests to make sure documents are stored in the correct core/shard, as decided by Solr's internal routing algorithm (I believe it basically finds out who is the leader-shard for a given document, using shared information in ZK, info about the collection, and hash(document.id)). All nice and cool.
>>>
>>> I also believe realtime-gets are not forwarded internally in Solr through this routing algorithm, and that it is therefore "impossible" to do realtime-gets from a client, because you don't know which core/shard to contact directly, again because you don't know the routing algorithm. If I'm wrong, it would be very helpful to get a few directions on how to do realtime-gets from a client to a Solr server system containing many shards and collections.
>>> If I'm right, I think it would be very nice if the routing algorithm were somehow exposed to the client (in code reachable from SolrJ) so that you can do realtime-gets from a SolrJ-based client. Whether it is done automatically for you, or whether the client using SolrJ explicitly needs to call some code to get info about the core to contact, is not so important for now.
>>>
>>> Such a solution would also make it possible to get rid of another performance-related "problem": most update requests have to be transported between JVMs twice to reach their destination, first from the client to some "random" Solr server, and then from this Solr server to the Solr server holding the core involved in the update. If routing information were available to the client, it could make sure to route its updates directly to the core involved in the update (the one currently playing the role of leader-shard for the shard to which the routing algorithm maps the document).
>>>
>>> ElasticSearch solves this problem with its "Node Client" (instead of just the "Transport Client"), where a node client is basically a real node in the system that just doesn't store documents, but which has all the logic and shared information, e.g. the routing algorithm, available:
>>> http://www.elasticsearch.org/guide/reference/java-api/client.html
>>> It certainly doesn't have to be like that with Solr clients, but it would be nice if routing logic were somehow available to SolrJ so that it can send its updates (and realtime-gets) directly to the correct destination.
>>>
>>> Hope to get some comments on this issue.
>>>
>>> Regards, Per Steffensen
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>
>> - Mark Miller
>> lucidimagination.com
>>

- Mark Miller
lucidimagination.com
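
For illustration only, the client-side routing idea discussed in this thread (hash(document.id) picks a shard, and a multi-id realtime get is split into one request per shard) could look roughly like the sketch below. Everything in it is an assumption: the class and method names are hypothetical and are not part of SolrJ, and String.hashCode() is only a stand-in, since the thread does not describe the hash Solr actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from Solr or SolrJ.
final class RoutingSketch {

    // Map a document id to a shard index in [0, numShards).
    // String.hashCode() is a placeholder for whatever hash the server really uses.
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Group ids by shard so that a multi-id realtime get becomes one request per shard.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numShards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (String id : docIds) {
            byShard.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        // Print which ids would be sent to which shard in a 3-shard collection.
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4");
        System.out.println(groupByShard(ids, 3));
    }
}
```

A real client would additionally need the shard-to-leader mapping from the shared cluster state in ZK, and would merge the per-shard responses before returning them to the caller; both are left out of this sketch.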