lucene-java-user mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: Setting the maximum number of documents in a lucene segment
Date Sat, 26 May 2007 20:49:39 GMT
Hello Otis,
> 
> Hello Ard,
> 
> What you are after is a higher mergeFactor and probably also 
> a higher maxBufferedDocs.  Is indexing performance the concern?

No, that is not what I am after, and mergeFactor does not really solve my issue. My issue
is very similar to the thread "maxDocs and Arrays" (which I only read later): http://www.gossamer-threads.com/lists/lucene/java-user/49285


I also want to keep some derived data from lucene in in-memory arrays, to enable faceted
authorized navigation in a jackrabbit (jcr) repository. I have tested with millions of "derived
data documents" in an array and can compute faceted authorized navigation very efficiently. But, of course,
as the lucene index changes, I need to update my derived data. When adding a document to lucene,
I can normally just append an item to my derived data array, unless:

1) lucene did a merge, and
2) after the merge writer.docCount() != writerDoccountBeforeUpdate + 1 (this means the merge
involved a segment containing at least one deleted doc, which reduces docCount)

If 1 and 2 are both true, I need to recreate my derived data array, because the array positions
no longer coincide with lucene's document numbers. Therefore, I want to minimize merges (recreating
the array is expensive). As you say, this can be done by setting a large mergeFactor
(for example with the compound file format enabled, to bring the number of files back down) and a large
maxBufferedDocs. But increasing the default number of documents in the "smallest" segments
from 10 to, say, 100 would also help me.
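
To make the append-or-rebuild rule above concrete, here is a minimal sketch in plain Java. It is
only an illustration under assumptions: DerivedDataArray, Rebuilder, afterAdd and the other names
are hypothetical and not part of lucene or jackrabbit, the derived data is reduced to one String
per document, and the two counts passed in are assumed to be writer.docCount() read before and
after the add. The rebuild step is a stand-in for re-reading all live documents from the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory mirror of derived data, one entry per lucene document number. */
class DerivedDataArray {

    /** Stand-in for the expensive path: re-derive the data for every live document. */
    interface Rebuilder {
        List<String> loadAll();
    }

    private final List<String> values = new ArrayList<>();
    private int rebuilds = 0;

    /**
     * Call after one document has been added to the index.
     * docCountBefore and docCountAfter are assumed to be writer.docCount()
     * read just before and just after the add.
     */
    void afterAdd(int docCountBefore, int docCountAfter, String newValue, Rebuilder rebuilder) {
        if (docCountAfter == docCountBefore + 1) {
            // No deleted docs were dropped: the new document takes the next
            // position, so appending keeps the array aligned.
            values.add(newValue);
        } else {
            // A merge removed deleted docs and the positions shifted:
            // the array has to be recreated from the index.
            values.clear();
            values.addAll(rebuilder.loadAll());
            rebuilds++;
        }
    }

    /** Copy of the current derived data, in document-number order. */
    List<String> snapshot() {
        return new ArrayList<>(values);
    }

    /** Number of times the expensive rebuild path was taken. */
    int rebuildCount() {
        return rebuilds;
    }

    public static void main(String[] args) {
        DerivedDataArray cache = new DerivedDataArray();
        // Pretend "b" was deleted earlier, so a rebuild sees only a, c and d.
        Rebuilder rebuilder = () -> Arrays.asList("a", "c", "d");

        cache.afterAdd(0, 1, "a", rebuilder); // append
        cache.afterAdd(1, 2, "b", rebuilder); // append
        cache.afterAdd(2, 3, "c", rebuilder); // append
        // Adding "d" triggers a merge that drops the deleted "b":
        // the count stays at 3 instead of going to 4, so rebuild.
        cache.afterAdd(3, 3, "d", rebuilder);

        System.out.println(cache.snapshot() + " rebuilds=" + cache.rebuildCount());
    }
}
```

In this sketch the cost of a rebuild is paid only when the count check fails, which is why fewer
merges (or fewer merges that touch segments with deleted docs) mean fewer rebuilds.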

Then again, I am not sure whether I am doing something that could be achieved more effectively or simply.

thanks in advance for any pointers,

Regards Ard Schrijvers


> Don't go crazy with setting a super high (e.g. 100+) 
> mergeFactor, unless you really have the number of open files 
> on your server(s) set to a solid/high number. maxBufferedDocs 
> can be set to a much higher number, typically, depending on 
> the size of the documents you are trying to index and the 
> amount of heap the JVM has to work with.  There is also a new 
> API for explicit flushes of in-memory documents while 
> indexing to control memory consumption.
> 
> Otis
> --
> Lucene Consulting -- http://lucene-consulting.com/
> 
> 
> ----- Original Message ----
> From: Ard Schrijvers <a.schrijvers@hippo.nl>
> To: java-user@lucene.apache.org
> Sent: Friday, May 25, 2007 8:40:26 AM
> Subject: RE: Setting the maximum number of documents in a 
> lucene segment
> 
> 
> > 
> > Hello,
> > 
> > I am trying to change the maximum number of documents in a 
> > lucene segment. By default it seems to be 10.
> 
> Correction: 10 for the smallest (just created) segments of 
> course, because obviously merged segments are likely to 
> contain many more documents
> 
> > When I have a 
> > mergeFactor of say 10, then on average, after every 100 added 
> > documents lucene is merging segments.
> > 
> > I want each segment to contain more than the default 10 
> > documents, because I need to minimize merging.
> > 
> > Is there a way to achieve this? 
> > writer.setMaxBufferedDocs(largeValue) does not do the trick 
> > (I think because in my case the writer is flushed and 
> > closed after a few updates)
> > 
> > Does anyone know whether it is possible to make the default 
> > number of documents a segment can contain larger?
> > 
> > Thanks in advance, 
> > 
> > Ard Schrijvers
> > 
> > 
> > -- 
> > 
> > Hippo
> > Oosteinde 11
> > 1017WT Amsterdam
> > The Netherlands
> > Tel  +31 (0)20 5224466
> > -------------------------------------------------------------
> > a.schrijvers@hippo.nl / http://www.hippo.nl
> > -------------------------------------------------------------- 
> > 
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> > 