lucene-java-user mailing list archives

From Alexander vom Berg <m...@avomberg.de>
Subject Re: Structure of .tii-file
Date Tue, 27 Jul 2010 16:06:26 GMT
Hello Mike,

On 27.07.2010 14:38, Michael McCandless wrote:
> On Tue, Jul 27, 2010 at 7:58 AM, Alexander vom Berg<mail@avomberg.de>  wrote:
>    
>> Hello Mike,
>>
>> thanks for your answer!
>> I am currently working with Lucene 3.0.1, and except for the .tii file all
>> the other descriptions are comprehensible.
>> The idea behind the tii/tis file structure is to retrieve the correct
>> terms faster.
>> First I look up the term in memory (tii file) and take the nearest hit. With
>> this information I can skip to the correct position in the tis file and scan
>> up to my final hit. I don't exactly understand how this skipping is
>> realized.
>> Do I have a direct pointer to the position on the hard drive? Or how do I
>> find the term without too much file access? :D
>>      
> Yes, you have to seek the tis file handle, then you do .next() until
> the term matches.  Maybe you stop there, eg if you're just looking for
> say the docFreq of that term.  Or, if you then need to iterate the
> docs/positions, from that term entry you have the long file pointers
> of frq and prx files, which you must seek to and decode.
>
> Btw, what is it that you are doing?  You seem to be re-inventing
> Lucene :)  You could simply use Lucene's low level APIs to do this...
>
>    

This was meant more as a question about whether my assumptions about how
Lucene works are correct. :) Sorry for being unclear.
I don't want to implement it myself!
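
To make sure I have understood the lookup described above, here is a minimal
sketch in plain Java. It is not Lucene code and does not use Lucene's API: the
class TermIndexSketch, its fields, and the interval of 4 are all made up for
illustration. A sorted array stands in for the tis file, a sparse array holding
every n-th term stands in for the in-memory tii data, and an array position
plays the role of the file pointer.

```java
/** Toy model of a two-level term dictionary (hypothetical, not Lucene's classes). */
class TermIndexSketch {
    // Stand-in for the .tis file: all terms in sorted order.
    private final String[] terms;
    // Stand-in for the in-memory .tii data: every indexInterval-th term.
    private final String[] indexTerms;
    // Position in `terms` of each index term (plays the role of a file pointer).
    private final int[] indexPointers;
    // Number of terms skipped over by the last lookup's forward scan.
    int lastScanSteps;

    TermIndexSketch(String[] sortedTerms, int indexInterval) {
        this.terms = sortedTerms;
        int n = (sortedTerms.length + indexInterval - 1) / indexInterval;
        indexTerms = new String[n];
        indexPointers = new int[n];
        for (int i = 0; i < n; i++) {
            indexPointers[i] = i * indexInterval;
            indexTerms[i] = sortedTerms[i * indexInterval];
        }
    }

    /** Returns the position of target in the full term list, or -1 if absent. */
    int lookup(String target) {
        lastScanSteps = 0;
        // Step 1: binary search the in-memory index for the greatest
        // index term that is <= target.
        int lo = 0, hi = indexTerms.length - 1, slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexTerms[mid].compareTo(target) <= 0) {
                slot = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (slot < 0) {
            return -1; // target sorts before the first term
        }
        // Step 2: "seek" to the recorded position.
        // Step 3: scan forward until the term matches or we have passed it.
        for (int pos = indexPointers[slot]; pos < terms.length; pos++) {
            int cmp = terms[pos].compareTo(target);
            if (cmp == 0) {
                return pos;
            }
            if (cmp > 0) {
                return -1;
            }
            lastScanSteps++;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] sorted = {"apple", "banana", "cherry", "date",
                           "fig", "grape", "kiwi", "lemon"};
        TermIndexSketch dict = new TermIndexSketch(sorted, 4);
        System.out.println(dict.lookup("grape"));
        System.out.println(dict.lookup("elderberry"));
    }
}
```

If this matches how the real thing works, the forward scan never has to pass
more than one interval's worth of terms, so a lookup costs one seek plus a
short sequential read.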

>> My intention behind this is that I want to run some performance tests on a
>> created index with different block sizes of the hard drive.
>> Can I just copy this created index to another drive (with a different
>> block size), or do I have to generate the whole index again?
>>      
> Ahhh.
>
> You mean the block size of the underlying filesystem?  If so, then
> copying will be fine in that the resulting index will function
> correctly.
>
> However, this may not be a fair performance test since with 'cp'
> presumably the IO system may have optimized how the files are
> allocated to blocks on disk. Ie, you'll get a different allocation
> than had Lucene directly opened these files and written them itself on
> the 2nd file system.  You could test both approaches and see if
> there's a difference!
>
>    

Do you mean problems with fragmentation here? Or what exactly is the
difference after I copy the index (is it faster because it's defragmented)?
What happens if I use the copy method from
org.apache.lucene.store.Directory?

> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>    

Best regards
Alex


