lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: best choice for ramBufferSizeMB
Date Sun, 18 May 2014 14:56:38 GMT
One thing to keep in mind is that a larger RAM buffer means less
merging later, so even though the immediate observation is no speedup
in indexing rate (and, likely, some slowdown), you are lowering the
future merge cost, which is beneficial (it's a zero sum game!).
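
To make that trade-off concrete, here is a rough back-of-the-envelope model in Java. It is only a sketch under simplifying assumptions that are not Lucene's actual accounting: every flush is assumed to write one segment holding about one RAM buffer's worth of data, and merging is assumed to combine `mergeFactor` equal-sized segments at a time. The class name, method names, and the 4096 MB figure are made up for illustration.

```java
// Rough model only: real segment sizes and merge selection in Lucene differ.
final class RamBufferModel {

    /** Number of segments flushed if each flush holds about bufferMb of data. */
    static long flushedSegments(long totalIndexMb, long bufferMb) {
        if (bufferMb <= 0) {
            throw new IllegalArgumentException("bufferMb must be positive");
        }
        return (totalIndexMb + bufferMb - 1) / bufferMb; // ceiling division
    }

    /** Approximate number of merge rounds needed to fold the flushed segments together. */
    static int mergeLevels(long segments, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        int levels = 0;
        while (segments >= mergeFactor) {
            segments /= mergeFactor; // mergeFactor segments collapse into one
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        long totalMb = 4096;  // hypothetical amount of data passing through the buffer
        int mergeFactor = 10; // assumed merge factor
        for (long bufferMb : new long[] {16, 32, 128}) {
            long segments = flushedSegments(totalMb, bufferMb);
            System.out.println(bufferMb + " MB buffer: " + segments
                    + " flushed segments, about " + mergeLevels(segments, mergeFactor)
                    + " merge level(s)");
        }
    }
}
```

In this model, 4096 MB flushed at 16 MB per segment gives 256 segments and two merge levels, while a 128 MB buffer gives 32 segments and one level. The indexing rate may look the same, but less data is rewritten by merges afterwards.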

Yes, stored fields as well as term vectors are written as-you-go; IW
(IndexWriter) buffers in RAM only the postings, norms, live docs / deleted
terms and queries, and doc values.

Mike McCandless

http://blog.mikemccandless.com


On Thu, May 15, 2014 at 4:23 AM, Gudrun Siedersleben
<Siedersleben@mpdl.mpg.de> wrote:
> Thanks for your answer.
>
> At the moment we use a single thread for indexing. Working with several threads is
> a possibility we should try. Testing with different values for ramBufferSizeMB between
> 16 MB and 256 MB showed no improvement above 128 MB, as you already mentioned.
>
> While observing the file system where the index files are written during indexing,
> we were surprised to see that the .fdt file is continually flushed to disk, even before
> ramBufferSizeMB is reached. Is this correct? Also, nothing about this flushing appears in
> the IndexWriter's PrintStream output.
>
> Gudrun
>
>
> -----Original Message-----
> From: Shai Erera [mailto:serera@gmail.com]
> Sent: Wednesday, 14 May 2014 16:49
> To: java-user@lucene.apache.org
> Subject: Re: best choice for ramBufferSizeMB
>
> Well, first make sure that you set ramBufferSizeMB to well below the max Java heap size;
> otherwise you could run into OOMs.
>
> While a larger RAM buffer may speed up indexing (since it flushes to disk less often),
> it's not the only factor that affects indexing speed.
>
> For instance, if a big portion of your indexing work is reading the files from a slow
> storage device (maybe an NFS share, remote HTTP, etc.), then that could easily overshadow
> any benefit of using a large RAM buffer.
>
> Also, do you index with a single thread or with multiple threads? Lucene supports
> multi-threaded indexing, and it's recommended whenever you can do it, e.g. when you run
> on sufficiently strong hardware (4+ cores...).
>
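
A minimal sketch of the multi-threaded approach mentioned above, in plain Java. Lucene's IndexWriter is documented as thread-safe, so one writer can be shared by several threads; to keep the example self-contained, the writer is replaced here by a hypothetical `DocSink` interface, and the documents are plain strings. All names in the block are illustrative assumptions, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelIndexingSketch {

    /** Hypothetical stand-in for a shared, thread-safe writer (e.g. a call to addDocument). */
    interface DocSink {
        void add(String doc) throws Exception;
    }

    /** Feeds all docs to the sink from a fixed pool of threads; returns how many were added. */
    static int indexAll(List<String> docs, DocSink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger added = new AtomicInteger();
        for (String doc : docs) {
            pool.execute(() -> {
                try {
                    sink.add(doc);
                    added.incrementAndGet();
                } catch (Exception e) {
                    // A real indexer would log or collect the failure.
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return added.get();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger seen = new AtomicInteger();
        int threads = Runtime.getRuntime().availableProcessors();
        int added = indexAll(docs, doc -> seen.incrementAndGet(), threads);
        System.out.println(added + " documents added using " + threads + " threads");
    }
}
```

Whether this helps depends on where the time goes: if reading the source files is the bottleneck, extra indexing threads alone may change little.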
> Another thing: in the past I noticed that very large RAM buffers did not improve indexing
> at all. If your underlying IO system is slow (e.g. indexing to an NFS share, a distributed
> file system, etc.), the cost of flushing a big RAM buffer becomes significant, more than
> the cost of indexing in RAM. For example, I did not observe improvements when using
> ramBufferSizeMB=512 vs 128. Also, a big RAM buffer uses more space on the heap and makes
> the GC's job harder. So I think a RAM buffer that is too big may actually slow things
> down rather than speed them up.
>
> Indexing speed is affected by multiple parameters; the RAM buffer is only one of them...
>
> Shai
>
>
> On Wed, May 14, 2014 at 4:33 PM, Gudrun Siedersleben <Siedersleben@mpdl.mpg.de> wrote:
>
>> Hi all,
>>
>> we want to speed up building our Lucene index. We set ramBufferSizeMB
>> to several values between 32 and 128 MB, but that does not make any
>> difference in the time used for reindexing. We did not set maxBufferedDocs, ...,
>> which could conflict.
>> We start the JVM with the following JAVA_OPTS:
>>
>> -Xms128m -Xmx512m -XX:MaxPermSize=256m
>>
>> What is the recommended value for ramBufferSizeMB depending on
>> JAVA_OPTS and perhaps other lucene parameters set? We use Lucene 3.6.0.
>>
>> Best regards
>>
>> Gudrun
>>
>>
>>
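
On the relation between JAVA_OPTS and ramBufferSizeMB raised in the original question, a small sanity check can be sketched in plain Java. The 25% ceiling used here is an assumed rule of thumb chosen purely for illustration; nobody in this thread recommends a specific fraction, and the class and method names are hypothetical.

```java
final class HeapBudgetCheck {

    // Assumed rule of thumb for illustration only, not a Lucene recommendation.
    static final double MAX_HEAP_FRACTION = 0.25;

    /** True if the proposed RAM buffer is at most maxFraction of the maximum heap. */
    static boolean fitsInHeap(double ramBufferMb, long maxHeapBytes, double maxFraction) {
        double maxHeapMb = maxHeapBytes / (1024.0 * 1024.0);
        return ramBufferMb <= maxHeapMb * maxFraction;
    }

    public static void main(String[] args) {
        // maxMemory() roughly reflects -Xmx; the reported value can differ slightly.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        for (double bufferMb : new double[] {32, 64, 128, 256}) {
            System.out.println("ramBufferSizeMB=" + bufferMb + " within budget: "
                    + fitsInHeap(bufferMb, maxHeapBytes, MAX_HEAP_FRACTION));
        }
    }
}
```

With the -Xmx512m setting quoted above, a 128 MB buffer sits right at one quarter of the heap under this assumed budget, and 256 MB is well over it.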
