lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Scott ganyo <sc...@ganyo.com>
Subject Re: Index advice...
Date Tue, 10 Feb 2004 13:03:11 GMT
I have.  While document.add() itself doesn't increase over time, the 
merge does.  Ways of partially overcoming this include increasing the 
mergeFactor (but this will increase the number of file handles used), 
or building blocks of the index in memory and then merging them to 
disk.  This has been discussed before, so you should be able to find 
additional information on this fairly easily.

Scott

On Feb 10, 2004, at 7:55 AM, Otis Gospodnetic wrote:

>
> --- Leo Galambos <Leo.G@seznam.cz> wrote:
>> Otis Gospodnetic napsal(a):
>>
>>> Without seeing more information/code, I can't tell which part of
>> your
>>> system slows down with time, but I can tell you that Lucene's 'add'
>>> does not slow over time (i.e. as the index gets larger).  Therefore,
>> I
>>> would look elsewhere for causes of the slowdown.
>>>
>>>
>>
>> Otis, can you point me to some proofs that time of "insert" operation
>>
>> does not depend on the index size, please? Amortized time of "insert"
>> is O(log(docsIndexed/mergeFac)), I think.
>
> This would imply that Lucene gets slower as it adds more documents to
> the index.  Have you observed this behaviour?  I haven't.
>
>> Thus I do not know how it could be O(1).
>
> ~ O(1) is what I have observed through experiments with indexing of
> several million documents.
>
> Otis
>
>
>> AFAIK the issue with PDF files can be based on the PDF parser (I
>> already
>> encountered this with PDFbox).
>>
>>> The easiest thing to do is add logging to suspicious portions of the
>>> code.  That will narrow the scope of the code you need to analyze.
>>>
>>> Otis
>>>
>>>
>>> --- kevin@ckhill.com wrote:
>>>
>>>
>>>> Hey Lucene-users,
>>>>
>>>> I'm setting up a Lucene index on 5G of PDF files (full-text
>> search).
>>>> I've
>>>> been really happy with Lucene so far but I'm curious what tips and
>>>> strategies
>>>> I can use to optimize my performance at this large size.
>>>>
>>>> So far I am using pretty much all of the defaults (I'm new to
>>>> Lucene).
>>>>
>>>> I am using PDFBox to add the documents to the index.
>>>> I can usually add about 800 or so PDF files and then the add loop:
>>>>
>>>> for ( int i = 0; i < fileNames.length; i++ ) {
>>>> 	Document doc =
>> IndexFile.index(baseDirectory+documentRoot+"fileNames
>>>> [i]);
>>>> 	writer.addDocument(doc);
>>>> }
>>>>
>>>>
>>>> really starts to slow down.  Doesn't seem to be memory related.
>>>> Thoughts anyone?
>>>>
>>>> Thanks in advance,
>>>> CK Hill
>>>>
>>>>
>>>>
>>
>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>> For additional commands, e-mail:
>> lucene-user-help@jakarta.apache.org
>>>>
>>>>
>>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

Mime
View raw message