lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: Structure of .tii-file
Date Tue, 27 Jul 2010 12:38:06 GMT
On Tue, Jul 27, 2010 at 7:58 AM, Alexander vom Berg <> wrote:
> Hello Mike,
> thanks for your answer!
> I am currently working with Lucene 3.0.1, and apart from the .tii file all
> the other file descriptions are comprehensible.
> The idea behind the tii/tis file structure is to retrieve the correct
> terms faster.
> First I look up the in-memory index (tii file) and take the nearest hit. With
> this information I can skip to the correct position in the tis file and scan
> up to my final hit. I don't exactly understand how this skipping is
> realized.
> Do I have a direct pointer to the position on the hard drive? Or how do I
> find the term without too much file access? :D

Yes: you seek the tis file handle to the file pointer stored with the
nearest index term, then you call .next() until the term matches.  Maybe
you stop there, e.g. if you're just looking for the docFreq of that term.
Or, if you then need to iterate the docs/positions, that term entry gives
you the long file pointers into the frq and prx files, which you must seek
to and decode.
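To make the two-step lookup concrete, here is a minimal, self-contained Java sketch of the idea. It is not Lucene's code: the class TermDictSketch, its constructor and seek() are hypothetical names, terms are kept uncompressed in a String array, and an array position stands in for the long file pointer into the .tis file. The sparse in-memory index keeps every indexInterval-th term (Lucene's IndexWriter defaults to an interval of 128), so a lookup is a binary search in memory followed by a short forward scan of at most indexInterval entries.

```java
/**
 * Hypothetical, simplified model of a two-level term dictionary.
 * "tis" holds every term in sorted order; "tii" holds every
 * indexInterval-th term plus its position in tis. Real Lucene stores
 * prefix-compressed terms and byte file pointers instead.
 */
public class TermDictSketch {
    private final String[] tis;          // stand-in for the .tis file
    private final String[] indexTerms;   // sampled terms (stand-in for .tii)
    private final int[] indexPointers;   // "file pointers" into tis

    public TermDictSketch(String[] sortedTerms, int indexInterval) {
        if (indexInterval <= 0) throw new IllegalArgumentException("indexInterval must be > 0");
        this.tis = sortedTerms;
        int n = (sortedTerms.length + indexInterval - 1) / indexInterval;
        indexTerms = new String[n];
        indexPointers = new int[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * indexInterval];
            indexPointers[i] = i * indexInterval;
        }
    }

    /** Returns the position of target in tis, or -1 if it is absent. */
    public int seek(String target) {
        // Step 1: binary-search the in-memory index for the last sampled term <= target.
        int lo = 0, hi = indexTerms.length - 1, floor = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexTerms[mid].compareTo(target) <= 0) {
                floor = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (floor < 0) return -1; // target sorts before the first term

        // Step 2: "seek" to that pointer, then scan forward (like .next())
        // until the term matches or we pass where it would have been.
        for (int pos = indexPointers[floor]; pos < tis.length; pos++) {
            int cmp = tis[pos].compareTo(target);
            if (cmp == 0) return pos;
            if (cmp > 0) break;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date", "fig",
                          "grape", "kiwi", "lemon", "mango"};
        TermDictSketch dict = new TermDictSketch(terms, 4);
        System.out.println(dict.seek("grape"));   // 5
        System.out.println(dict.seek("coconut")); // -1
    }
}
```

With an interval of 4, the index holds "apple", "fig" and "mango"; looking up "grape" lands on "fig" in memory and then scans only two entries. In the real files, the entry you stop at would also carry the docFreq and the frq/prx pointers mentioned above.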

Btw, what is it that you are doing?  You seem to be re-inventing
Lucene :)  You could simply use Lucene's low level APIs to do this...

> My intention behind this is that I want to run some performance tests on a
> created index with different block sizes of the hard drive.
> Can I just copy this created index to another drive (with a different
> block size), or do I have to generate the whole index again?


You mean the block size of the underlying filesystem?  If so, then
copying will be fine, in that the resulting index will function.
However, this may not be a fair performance test, since with 'cp'
the IO system may have optimized how the files are allocated to
blocks on disk. I.e., you'll get a different allocation than if
Lucene had opened these files and written them itself on the 2nd
file system.  You could test both approaches and see if there's a
difference!

