lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Structure of .tii-file
Date Tue, 27 Jul 2010 18:20:05 GMT
On Tue, Jul 27, 2010 at 12:06 PM, Alexander vom Berg <mail@avomberg.de> wrote:

>> However, this may not be a fair performance test since with 'cp'
>> presumably the IO system may have optimized how the files are
>> allocated to blocks on disk. Ie, you'll get a different allocation
>> than had Lucene directly opened these files and written them itself on
>> the 2nd file system.  You could test both approaches and see if
>> there's a difference!
>
> Do you mean problems with fragmentation here? Or what exactly is the
> difference after I copy the index (faster because it's defragmented?)?

Yes, I meant fragmentation.  E.g., the cp command probably tells the OS
up front how large the target file will be, so the IO system could
[conceivably] do a better job (less fragmentation) of assigning blocks
to the file.  I'm not sure that this happens in practice, though, and
it's surely filesystem-dependent.

> What happens if I use the copy-Method from
> org.apache.lucene.store.Directory?

Well... that should be closer to having written the index directly.
But I'm not sure it will be the same.  E.g., when Lucene is writing
the index, it writes bytes to multiple files (.frq, .prx, .tis, .tii)
"concurrently", and at a somewhat slow-ish rate, while a copy writes
one file at a time at a very high rate.  So I'm not sure whether this
difference will alter block assignment.

Whether you use CFS (the compound file format) is also a factor:
Lucene tries to notify the OS of the final size when building the CFS
(which should/may help reduce fragmentation).
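
To make "notifying the OS of the final size" concrete, here is a
minimal sketch in plain Java, not Lucene's own code.  It assumes the
final length is known before writing, and it uses
RandomAccessFile.setLength to set that length before any bytes are
written.  The class and method names (PreallocateSketch,
writeWithReservedLength) are hypothetical, made up for illustration.
Whether the filesystem really reserves contiguous blocks in response
(rather than just creating a sparse file) is filesystem-dependent, so
treat this as a hint at best.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class PreallocateSketch {

    // Hypothetical helper: set the file to its final length first,
    // then write the data starting at offset 0.
    static void writeWithReservedLength(Path target, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
            // Announce the final size up front. Assumption: the filesystem
            // may use this to choose blocks; many will only record the size.
            raf.setLength(data.length);
            raf.seek(0);
            raf.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("prealloc", ".bin");
        try {
            byte[] data = new byte[1 << 20]; // 1 MiB of zeros as stand-in content
            writeWithReservedLength(tmp, data);
            System.out.println("bytes on disk (logical size): " + Files.size(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

To compare against the non-hinted case, write the same bytes without
the setLength call and inspect the resulting layout with a
filesystem-specific tool.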

Mike


