Subject: Re: Distributed Indexing
From: Soheb Mahmood
To: dev@lucene.apache.org
Date: Tue, 01 Feb 2011 00:47:31 +0000
Message-ID: <1296521251.8344.1.camel@soheb-1201N>
In-Reply-To: <1296345418.18365.1417931133@webmail.messagingengine.com>

(I'm sending this on behalf of William, a guy on our team working on
ShardDistributionPolicy):

Hi Guys

I've had a go at creating the ShardDistributionPolicy interface and a
few implementations. I've created a patch
(https://issues.apache.org/jira/browse/SOLR-2341); let me know what
needs doing.

Currently I assume that the documents passed to the policy will be
represented by some kind of identifier, and that one needs only to
match the ID with a shard. This is better (I think) than reading the
document from the POST and figuring out some kind of unique identifier.

A question we've had about this is who decides what policy to use, and
where do they specify it? I'm inclined to think that the user (the
person POSTing data) does not mind what policy is used, but the
administrator might. This leads me to think that the policy should be
set in the Solr config file.
My colleagues disagree that the user will not mind, and would rather
see the policy specified in the URL. We've noticed that request
handlers can be specified in both, so should we adopt this idea instead
(as a kind of compromise :) )?

All the best
William

> On Sat, Jan 29, 2011 at 11:56 PM, Upayavira wrote:
> > Lance,
> >
> > Firstly, we're proposing a ShardDistributionPolicy interface for which
> > there is a default (mod of the doc ID) but other implementations are
> > possible. Another easy implementation would be a randomised or round
> > robin one.
> >
> > As to threading, the first task would be to put all of the source
> > documents into "buckets", one bucket per shard, using the above
> > ShardDistributionPolicy to assign documents to buckets/shards. Then all
> > of the documents in a "bucket" could be sent to the relevant shard for
> > indexing (which would be nothing more than a normal HTTP post (or solrj
> > call?)).
> >
> > As to whether this would be single threaded or multithreaded, I would
> > guess we would aim to do it the same as the distributed search code
> > (which I have not yet reviewed). However, it could presumably be
> > single-threaded, but use asynchronous HTTP.
> >
> > Regards, Upayavira
> >
> > On Sat, 29 Jan 2011 15:09 -0800, "Lance Norskog" wrote:
> >> I would suggest that a DistributedRequestUpdateHandler run
> >> single-threaded, doing only one document at a time. If I want more
> >> than one, I run it twice or N times with my own program.
> >>
> >> Also, this should have a policy object which decides exactly how
> >> documents are distributed. There are different techniques for
> >> different use cases.
> >>
> >> Lance
> >>
> >> On Sat, Jan 29, 2011 at 12:34 PM, Soheb Mahmood wrote:
> >> > Hello Yonik,
> >> >
> >> > On Thu, 2011-01-27 at 08:01 -0500, Yonik Seeley wrote:
> >> >> Making it easy for clients I think is key...
> >> >> one should be able to
> >> >> update any node in the solr cluster and have solr take care of the
> >> >> hard part about updating all relevant shards. This will most likely
> >> >> involve an update processor. This approach allows all existing update
> >> >> methods (including things like CSV file upload) to still work
> >> >> correctly.
> >> >>
> >> >> Also post.jar is really just for testing... a command-line replacement
> >> >> for "curl" for those who may not have it. It's not really a
> >> >> recommended way for updating Solr servers in production.
> >> >
> >> > OK, I've abandoned the post.jar tool idea in favour of a
> >> > DistributedUpdateRequestProcessor class (I've been looking into other
> >> > classes like UpdateRequestProcessor, RunUpdateRequestProcessor,
> >> > SignatureUpdateProcessorFactory, and SolrQueryRequest to see how they
> >> > are used/what data they store - hence why I've taken some time to
> >> > respond).
> >> >
> >> > My big question now is that is it necessary to have a Factory class for
> >> > DistributedUpdateRequestProcessor? I've seen this lots of times, as in
> >> > RunUpdateProcessorFactory (where the factory class was only a few lines
> >> > of code) to SignatureUpdateProcessorFactory? At first I was thinking it
> >> > would be a good design idea to include it in (in a generic sense), but
> >> > then I thought harder and I thought that the
> >> > DistributedUpdateRequestHander would only be running once, taking in all
> >> > the requests, so it seems sort of pointless to write one in.
> >> >
> >> > That is my "burning" question for now. I have got a few more questions,
> >> > but I'm sure that when I look further into the code, I'll either have
> >> > more or all of my questions are answered.
> >> >
> >> > Many thanks!
> >> >
> >> > Soheb Mahmood
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> > For additional commands, e-mail: dev-help@lucene.apache.org
> >>
> >> --
> >> Lance Norskog
> >> goksron@gmail.com
> >
> > ---
> > Enterprise Search Consultant at Sourcesense UK,
> > Making Sense of Open Source
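
For anyone following the thread, here is a minimal sketch of the ideas discussed above: a policy that maps a document ID to a shard, the "mod of the doc ID" default, a round-robin alternative, and the bucketing step that groups documents per shard before sending. This is not the code in the SOLR-2341 patch. The class name ShardPolicySketch, the method names shardFor and bucket, and the use of String IDs are all assumptions made for illustration, and "mod of the doc ID" is interpreted here as the ID's hash code modulo the shard count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardPolicySketch {

    /** Hypothetical interface: maps a document ID to a shard index in [0, numShards). */
    public interface ShardDistributionPolicy {
        int shardFor(String docId, int numShards);
    }

    /** Assumed reading of the "mod of the doc ID" default: hash code modulo shard count. */
    public static class ModuloPolicy implements ShardDistributionPolicy {
        @Override
        public int shardFor(String docId, int numShards) {
            // floorMod keeps the result non-negative even for negative hash codes.
            return Math.floorMod(docId.hashCode(), numShards);
        }
    }

    /** Round-robin alternative: ignores the ID and cycles through the shards. */
    public static class RoundRobinPolicy implements ShardDistributionPolicy {
        private int next = 0;

        @Override
        public synchronized int shardFor(String docId, int numShards) {
            int shard = next % numShards;
            next = (next + 1) % numShards;
            return shard;
        }
    }

    /** Groups document IDs into one bucket per shard, preserving input order. */
    public static Map<Integer, List<String>> bucket(List<String> docIds,
                                                    ShardDistributionPolicy policy,
                                                    int numShards) {
        Map<Integer, List<String>> buckets = new LinkedHashMap<>();
        for (int shard = 0; shard < numShards; shard++) {
            buckets.put(shard, new ArrayList<>());
        }
        for (String id : docIds) {
            buckets.get(policy.shardFor(id, numShards)).add(id);
        }
        return buckets;
    }
}
```

Each bucket could then be sent to its shard as an ordinary update request. Note that the modulo policy is deterministic per ID, so the same document always goes to the same shard, whereas round robin is not, which matters if documents are later updated or deleted by ID.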