From: Erick Erickson
To: java-user@lucene.apache.org
Date: Mon, 15 Mar 2010 12:44:53 -0400
Subject: Re: Batch Indexing - best practice?

What's a document? What's indexing?

Here's what I'd do as a very first step. Time the actual indexing and
report it out. By that I mean: how long does IndexWriter.addDocument()
take? If you first get the document from wherever, then add all the
fields, and then add the document, I'd time adding the fields too.

The point is to separate the Lucene stuff from whatever else you do
before trying to fix anything. The first point of the link Ian provided
has the easily-overlooked phrase "and the slowness is indeed inside
Lucene"...

Best
Erick

On Mon, Mar 15, 2010 at 11:02 AM, Murdoch, Paul wrote:

> Thanks. I'll try lowering the merge factor and see if speed increases.
> The indexing is threaded....similar to the utility class in Listing 10.1
> from Lucene in Action. Search speed is great once the index is
> built....close to real time. So my main problem is getting the indexing
> speed fixed. I do use the StandardAnalyzer for most of my fields. What
> type of performance level should I be trying to hit for indexing
> (docs/sec)...just to give me an idea of what to shoot for?
>
> Paul
>
> -----Original Message-----
> From: java-user-return-45433-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> [mailto:java-user-return-45433-PAUL.B.MURDOCH=saic.com@lucene.apache.org]
> On Behalf Of Mark Miller
> Sent: Monday, March 15, 2010 10:48 AM
> To: java-user@lucene.apache.org
> Subject: Re: Batch Indexing - best practice?
>
> On 03/15/2010 10:41 AM, Murdoch, Paul wrote:
> > Hi,
> >
> > I'm using Lucene 2.9.2. Currently, when creating my index, I'm calling
> > indexWriter.addDocument(doc) for each Document I want to index. The
> > Documents aren't large and I'm averaging indexing about 500 documents
> > every 90 seconds. I'd like to try and speed this up....unless 90
> > seconds for 500 Documents is reasonable. I have the merge factor set to
> > 1000. Do you have any suggestions for batch indexing? Is there
> > something like indexWriter.addDocuments(Document[] docs) in the API?
> >
> > Thanks.
> >
> > Paul
> >
>
> You should lower that merge factor - that's *really* high.
>
> You shouldn't really need much more than 50 or so ... and for search
> speed you're going to want fewer segments anyway -
> if you're just going to end up optimizing at the end, there is no reason
> for such a large merge factor - you will pay for most of what
> you saved when you optimize.
>
> That is very slow by the way. Should be much faster - especially if you
> are using multiple threads.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
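
To make the "time it first" suggestion concrete, here is a minimal sketch of
how the two phases could be measured separately. It is not from the thread
and deliberately does not depend on Lucene: IndexTimingSketch, DocSource,
DocSink and Timings are hypothetical names invented for illustration. In a
real run the sink would presumably wrap indexWriter.addDocument(doc) (and
handle its IOException), and the source would be whatever fetches a record
and adds the fields. It is single-threaded, so a threaded indexer would need
one Timings per thread or some other form of synchronization.

```java
import java.util.Arrays;
import java.util.Iterator;

public class IndexTimingSketch {

    /** Hypothetical stand-in for fetching a record and building its fields. */
    public interface DocSource<D> {
        D next(); // returns null when there is nothing left to index
    }

    /** Hypothetical stand-in for the call being measured, e.g. IndexWriter.addDocument(). */
    public interface DocSink<D> {
        void add(D doc);
    }

    /** Accumulated wall-clock time per phase, in nanoseconds. */
    public static final class Timings {
        public long docs;
        public long buildNanos; // time spent producing documents (non-Lucene work)
        public long addNanos;   // time spent inside the sink (the Lucene call)

        public String report() {
            return docs + " docs, build " + (buildNanos / 1e6) + " ms, add "
                    + (addNanos / 1e6) + " ms";
        }
    }

    /** Drains the source into the sink, timing the two phases separately. */
    public static <D> Timings run(DocSource<D> source, DocSink<D> sink) {
        Timings t = new Timings();
        while (true) {
            long start = System.nanoTime();
            D doc = source.next();
            long built = System.nanoTime();
            if (doc == null) {
                break; // source exhausted; the final empty fetch is not counted
            }
            sink.add(doc);
            long added = System.nanoTime();
            t.buildNanos += built - start;
            t.addNanos += added - built;
            t.docs++;
        }
        return t;
    }

    public static void main(String[] args) {
        // In-memory stand-ins so the sketch runs without Lucene on the classpath.
        final Iterator<String> rows =
                Arrays.asList("doc one", "doc two", "doc three").iterator();
        final StringBuilder fakeIndex = new StringBuilder();
        Timings t = IndexTimingSketch.<String>run(
                () -> rows.hasNext() ? rows.next() : null,
                doc -> fakeIndex.append(doc).append('\n'));
        System.out.println(t.report());
    }
}
```

If the "add" number turns out to be small next to the "build" number, the
slowness is outside Lucene and changing the merge factor is unlikely to help
much.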