From: Danil ŢORIN
Date: Thu, 30 Jun 2011 12:34:02 +0300
Subject: Re: distributing the indexing process
To: java-user@lucene.apache.org

It depends.

If all documents are distinct then, yeah, go for it.

If you have multiple versions of the same document in your data and you only want to index the latest version, then you need a clever way to split the data so that all versions of a document are indexed on the same host and you don't end up with duplicates later.

But my biggest concern is: if your index is so big that you need to build it on different hosts, are you sure you want it combined into a single index? Maybe it's a good idea to partition it?

On Thu, Jun 30, 2011 at 12:12, Guru Chandar wrote:
> If we have to index a lot of documents, is there a way to divide the
> documents into multiple sets and index them on multiple machines in
> parallel, and then merge the resulting indexes back into a single
> machine? If yes, will the result be logically equivalent to indexing all
> the documents on a single machine?
>
> Thanks,
>
> -gc
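
A minimal sketch of the "all versions on the same host" idea from the reply above, assuming every document carries a stable string ID and a numeric version. The class VersionRouter and its methods hostFor and latestVersions are hypothetical names for illustration and are not part of Lucene; the sketch only shows how the input could be split before any indexing happens.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Lucene API: decides which indexing host
// receives a document, based only on the document's stable ID.
final class VersionRouter {

    // The same docId always maps to the same host, so every version of a
    // document lands on one machine and can be de-duplicated there.
    static int hostFor(String docId, int hostCount) {
        if (hostCount <= 0) {
            throw new IllegalArgumentException("hostCount must be positive");
        }
        // floorMod keeps the result in [0, hostCount) even for negative hashes.
        return Math.floorMod(docId.hashCode(), hostCount);
    }

    // On a single host: keep only the highest version seen for each docId.
    // Each entry is {docId, version}; the version is assumed to parse as a long.
    static Map<String, Long> latestVersions(List<String[]> docs) {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (String[] doc : docs) {
            latest.merge(doc[0], Long.parseLong(doc[1]), Long::max);
        }
        return latest;
    }

    public static void main(String[] args) {
        // Made-up sample data: two versions of "report" and one of "memo".
        List<String[]> docs = Arrays.asList(
                new String[] {"report", "1"},
                new String[] {"memo", "1"},
                new String[] {"report", "2"});
        int hostCount = 4; // assumed number of indexing machines
        for (String[] doc : docs) {
            System.out.println(doc[0] + " v" + doc[1]
                    + " -> host " + hostFor(doc[0], hostCount));
        }
        System.out.println("latest versions: " + latestVersions(docs));
    }
}
```

Because the host is derived from the ID alone, both versions of "report" are routed to the same machine, which can then index only the latest one. Note that changing hostCount changes the mapping, so the number of hosts would have to stay fixed for the duration of one indexing run.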