Date: Tue, 10 Feb 2004 13:29:47 +0100
From: Leo Galambos
To: Lucene Users List
Subject: Re: Index advice...
Otis Gospodnetic wrote:

>Without seeing more information/code, I can't tell which part of your
>system slows down with time, but I can tell you that Lucene's 'add'
>does not slow over time (i.e. as the index gets larger). Therefore, I
>would look elsewhere for causes of the slowdown.
>

Otis, could you point me to some proof that the time of an "insert"
operation does not depend on the index size? I think the amortized time
of an "insert" is O(log(docsIndexed/mergeFac)), so I do not see how it
could be O(1).

Thank you.
Leo

AFAIK, the problem with PDF files may lie in the PDF parser (I have
already run into this with PDFBox).

>The easiest thing to do is add logging to suspicious portions of the
>code. That will narrow the scope of the code you need to analyze.
>
>Otis
>
>
>--- kevin@ckhill.com wrote:
>
>>Hey Lucene-users,
>>
>>I'm setting up a Lucene index on 5G of PDF files (full-text search).
>>I've been really happy with Lucene so far but I'm curious what tips and
>>strategies I can use to optimize my performance at this large size.
>>
>>So far I am using pretty much all of the defaults (I'm new to
>>Lucene).
>>
>>I am using PDFBox to add the documents to the index.
>>I can usually add about 800 or so PDF files and then the add loop:
>>
>>for ( int i = 0; i < fileNames.length; i++ ) {
>>    Document doc = IndexFile.index(baseDirectory + documentRoot + fileNames[i]);
>>    writer.addDocument(doc);
>>}
>>
>>
>>really starts to slow down. Doesn't seem to be memory related.
>>Thoughts anyone?
>>
>>Thanks in advance,
>>CK Hill
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
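Leo's O(log(docsIndexed/mergeFac)) estimate comes from the fact that segments are merged in levels, so each document is rewritten once per level. The sketch below is a toy model under stated assumptions, not Lucene's IndexWriter code: it assumes every added document becomes a one-document segment, and that whenever the `mergeFactor` newest segments have equal size they are merged into one, costing one copy per document. The class `MergeCostSketch` and method `totalCopies` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeCostSketch {

    /**
     * Total document copies performed by merges after adding {@code docs}
     * documents one at a time. Toy model (assumption, not Lucene's code):
     * each new document is a 1-document segment, and whenever the newest
     * {@code mergeFactor} segments share the same size they are merged into
     * one segment, costing one copy per document in the merged result.
     */
    static long totalCopies(int docs, int mergeFactor) {
        List<Long> segments = new ArrayList<>(); // segment sizes, newest last
        long copies = 0;
        for (int d = 0; d < docs; d++) {
            segments.add(1L);
            // One add can cascade: ten size-1 segments become one size-10
            // segment, which may complete a group of ten size-10 segments.
            while (segments.size() >= mergeFactor) {
                int n = segments.size();
                long size = segments.get(n - 1);
                boolean sameSize = true;
                for (int i = n - mergeFactor; i < n; i++) {
                    if (segments.get(i).longValue() != size) {
                        sameSize = false;
                        break;
                    }
                }
                if (!sameSize) {
                    break;
                }
                segments.subList(n - mergeFactor, n).clear();
                long merged = size * mergeFactor;
                copies += merged;
                segments.add(merged);
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;
        for (int docs = 1_000; docs <= 1_000_000; docs *= 10) {
            long copies = totalCopies(docs, mergeFactor);
            System.out.printf("docs=%d copies=%d copies/doc=%.1f%n",
                    docs, copies, (double) copies / docs);
        }
    }
}
```

For document counts that are powers of `mergeFactor`, copies per document come out as 3.0, 4.0, 5.0 and 6.0 for 10^3 through 10^6 documents: the amortized cost of an insert grows with the logarithm of the index size instead of staying constant, which is the point Leo raises. The constant offset relative to his formula depends on how the first (in-memory) level is counted, which this model simplifies.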
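Otis suggests adding logging around the suspicious code, and Leo suspects the PDF parser rather than the index. A minimal sketch of that idea, assuming the loop can be split into a parse step and an add step, times the two phases separately and reports per batch. `AddLoopTiming`, `PhaseTimes` and `timeAddLoop` are hypothetical names; `parse` and `add` are plain `java.util.function` hooks, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class AddLoopTiming {

    /** Cumulative nanoseconds spent in each phase of the add loop. */
    static final class PhaseTimes {
        long parseNanos;
        long addNanos;
    }

    /**
     * Runs parse-then-add over every file, timing the two phases separately
     * and printing one line per batch of {@code reportEvery} files.
     * {@code parse} stands in for the PDF-to-Document step and {@code add}
     * for the index write (hypothetical hooks supplied by the caller).
     */
    static <D> PhaseTimes timeAddLoop(List<String> files,
                                      Function<String, D> parse,
                                      Consumer<D> add,
                                      int reportEvery) {
        PhaseTimes total = new PhaseTimes();
        long batchParse = 0;
        long batchAdd = 0;
        for (int i = 0; i < files.size(); i++) {
            long t0 = System.nanoTime();
            D doc = parse.apply(files.get(i));
            long t1 = System.nanoTime();
            add.accept(doc);
            long t2 = System.nanoTime();

            batchParse += t1 - t0;
            batchAdd += t2 - t1;
            // Report at the end of each batch and after the last file.
            if ((i + 1) % reportEvery == 0 || i == files.size() - 1) {
                System.out.printf("files %d: parse %d ms, add %d ms (this batch)%n",
                        i + 1, batchParse / 1_000_000, batchAdd / 1_000_000);
                total.parseNanos += batchParse;
                total.addNanos += batchAdd;
                batchParse = 0;
                batchAdd = 0;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            files.add("doc" + i + ".pdf");
        }
        List<Integer> fakeIndex = new ArrayList<>();
        // Stand-ins: "parsing" measures the file name, "adding" appends to a list.
        PhaseTimes t = timeAddLoop(files, String::length, fakeIndex::add, 100);
        System.out.printf("total: parse %d ms, add %d ms, %d documents%n",
                t.parseNanos / 1_000_000, t.addNanos / 1_000_000, fakeIndex.size());
    }
}
```

In the loop from the original message, `parse` would wrap the `IndexFile.index(...)` call and `add` would wrap `writer.addDocument(doc)`; because `addDocument` throws `IOException`, that lambda would have to catch it and rethrow it unchecked. If per-batch parse time grows while add time stays flat, the parser is the more likely culprit.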