lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Index advice...
Date Tue, 10 Feb 2004 12:55:07 GMT

--- Leo Galambos <Leo.G@seznam.cz> wrote:
> Otis Gospodnetic wrote:
> 
> >Without seeing more information/code, I can't tell which part of your
> >system slows down with time, but I can tell you that Lucene's 'add'
> >does not slow over time (i.e. as the index gets larger).  Therefore, I
> >would look elsewhere for causes of the slowdown.
> >  
> >
> 
> Otis, can you point me to some proof that the time of the "insert"
> operation does not depend on the index size, please? The amortized time
> of "insert" is O(log(docsIndexed/mergeFac)), I think.

This would imply that Lucene gets slower as it adds more documents to
the index.  Have you observed this behaviour?  I haven't.

> Thus I do not know how it could be O(1).

~ O(1) is what I have observed through experiments with indexing of
several million documents.
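
The two views can be compared with a toy model. The sketch below is not
Lucene code. It assumes a simplified logarithmic merge scheme: every added
document starts as a one-document segment, and whenever mergeFactor segments
of the same size exist they are merged into one. It counts how many times
documents are copied by merges. The names MergeCostSketch, totalDocsCopied
and averageCopiesPerDoc are made up for illustration.

```java
// Toy model of a logarithmic merge policy (an assumption, not Lucene's actual code).
class MergeCostSketch {

    /** Total number of document copies done by merges while adding numDocs documents. */
    public static long totalDocsCopied(int numDocs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        // segmentsAtLevel[k] = number of segments currently holding mergeFactor^k documents
        int[] segmentsAtLevel = new int[64];
        long copied = 0;
        for (int d = 0; d < numDocs; d++) {
            segmentsAtLevel[0]++;              // a new document is its own small segment
            long mergedSize = 1;
            // Cascade: a full level is merged into one segment on the next level.
            for (int level = 0; segmentsAtLevel[level] == mergeFactor; level++) {
                segmentsAtLevel[level] = 0;
                mergedSize *= mergeFactor;
                copied += mergedSize;          // a merge rewrites every document it contains
                segmentsAtLevel[level + 1]++;
            }
        }
        return copied;
    }

    /** Amortized merge work per added document, measured in document copies. */
    public static double averageCopiesPerDoc(int numDocs, int mergeFactor) {
        return (double) totalDocsCopied(numDocs, mergeFactor) / numDocs;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000, 10_000_000}) {
            System.out.printf("docs=%d avg copies per doc=%.2f%n",
                    n, averageCopiesPerDoc(n, 10));
        }
    }
}
```

Under this model with mergeFactor 10, the average is 6 copies per document at
one million documents and 7 at ten million. Growth that slow would be hard to
tell apart from constant time in an experiment, which may explain why both the
O(log) estimate and the ~ O(1) observation look plausible.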

Otis


> AFAIK the slowdown with PDF files may come from the PDF parser (I have
> already encountered this with PDFBox).
> 
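
One way to check whether the parser or the index add is the slow part is to
time the two steps separately. The following is a minimal sketch, assuming
Java 8 or later. StageTimer and its parameters are hypothetical names; the
parse and add arguments stand in for the IndexFile.index and
writer.addDocument calls from the loop quoted below, and the real calls may
throw checked exceptions that would need wrapping.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: times the "parse" and "add" stages of an indexing loop separately.
class StageTimer {
    long totalParseNanos;   // cumulative time spent turning files into documents
    long totalAddNanos;     // cumulative time spent adding documents to the index

    <D> void indexAll(String[] fileNames, Function<String, D> parse,
                      Consumer<D> add, int reportEvery) {
        if (reportEvery < 1) {
            throw new IllegalArgumentException("reportEvery must be >= 1");
        }
        long batchParse = 0;
        long batchAdd = 0;
        for (int i = 0; i < fileNames.length; i++) {
            long t0 = System.nanoTime();
            D doc = parse.apply(fileNames[i]);   // stand-in for the PDF parsing step
            long t1 = System.nanoTime();
            add.accept(doc);                     // stand-in for the index add step
            long t2 = System.nanoTime();

            batchParse += t1 - t0;
            batchAdd += t2 - t1;
            totalParseNanos += t1 - t0;
            totalAddNanos += t2 - t1;

            // Report per batch so a slowdown over time shows up in the log.
            if ((i + 1) % reportEvery == 0) {
                System.out.printf("files %d: parse %d ms, add %d ms in this batch%n",
                        i + 1, batchParse / 1_000_000, batchAdd / 1_000_000);
                batchParse = 0;
                batchAdd = 0;
            }
        }
    }
}
```

If the per-batch parse time grows while the add time stays flat, the parser is
the place to look; if the add time grows, the index settings are.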
> >The easiest thing to do is add logging to suspicious portions of the
> >code.  That will narrow the scope of the code you need to analyze.
> >
> >Otis
> >
> >
> >--- kevin@ckhill.com wrote:
> >  
> >
> >>Hey Lucene-users,
> >>
> >>I'm setting up a Lucene index on 5G of PDF files (full-text search).
> >>I've been really happy with Lucene so far but I'm curious what tips and
> >>strategies I can use to optimize my performance at this large size.
> >>
> >>So far I am using pretty much all of the defaults (I'm new to Lucene).
> >>
> >>I am using PDFBox to add the documents to the index.
> >>I can usually add about 800 or so PDF files and then the add loop:
> >>
> >>for ( int i = 0; i < fileNames.length; i++ ) {
> >>	Document doc = IndexFile.index(baseDirectory + documentRoot + fileNames[i]);
> >>	writer.addDocument(doc);
> >>}
> >>
> >>
> >>really starts to slow down.  Doesn't seem to be memory related.
> >>Thoughts anyone?
> >>
> >>Thanks in advance,
> >>CK Hill
> >>
> >>
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>

