From: Otis Gospodnetic
Date: Wed, 6 Jul 2011 20:49:39 -0700 (PDT)
To: "java-user@lucene.apache.org"
Subject: Re: distributing the indexing process
Message-ID: <1310010579.54254.YahooMailNeo@web130101.mail.mud.yahoo.com>

We've used Hadoop MapReduce with Solr to parallelize indexing for a customer, and that brought their multi-hour indexing process down to a couple of minutes. There is/was also a Lucene-level contrib in Hadoop that uses MapReduce to parallelize indexing.

Otis

----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message -----
> From: Guru Chandar
> To: java-user@lucene.apache.org
> Cc:
> Sent: Thursday, June 30, 2011 5:12 AM
> Subject: distributing the indexing process
>
> If we have to index a lot of documents, is there a way to divide the
> documents into multiple sets and index them on multiple machines in
> parallel, and then merge the resulting indexes back into a single
> machine? If yes, will the result be logically equivalent to indexing all
> the documents on a single machine?
>
> Thanks,
>
> -gc
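
To make the split / index-in-parallel / merge pattern from this thread concrete, here is a minimal sketch. It is not Lucene, Solr, or Hadoop code: it uses a toy in-memory inverted index, and a local thread pool stands in for the separate machines. All function names (build_index, partition, merge_indexes, parallel_index) and the sample documents are hypothetical, chosen only for illustration. In Lucene itself the merge step would typically go through IndexWriter.addIndexes, which this sketch does not exercise.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(docs):
    """Build a toy inverted index: term -> sorted list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def partition(docs, n):
    """Split the documents into n shards, round-robin by position."""
    shards = [{} for _ in range(n)]
    for i, (doc_id, text) in enumerate(docs.items()):
        shards[i % n][doc_id] = text
    return shards


def merge_indexes(indexes):
    """Union the posting lists of several partial indexes."""
    merged = defaultdict(set)
    for index in indexes:
        for term, ids in index.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def parallel_index(docs, workers=3):
    """Index each shard in its own worker, then merge the partial indexes."""
    shards = partition(docs, workers)
    # Assumption: a thread pool stands in for separate indexing machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_indexes = list(pool.map(build_index, shards))
    return merge_indexes(partial_indexes)


# Hypothetical sample documents, keyed by an externally assigned id.
DOCS = {
    1: "lucene indexes text",
    2: "hadoop runs mapreduce jobs",
    3: "solr builds on lucene",
    4: "mapreduce can parallelize indexing",
}

if __name__ == "__main__":
    # For this toy model the merged index matches the single-pass index.
    print(parallel_index(DOCS) == build_index(DOCS))
```

In this toy model the merged result is identical to indexing everything in one pass, because documents carry externally assigned ids and a posting list is just a set union. Whether a merged Lucene index is equivalent for a given application (internal document numbering, for instance, is not an external id) should be checked against the Lucene version in use.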