lucene-dev mailing list archives

From jian chen <chenjian1...@gmail.com>
Subject skipInterval
Date Sun, 16 Oct 2005 01:36:48 GMT
Hi, All,

I was reading some research papers on fast inverted index lookups.
The classical approach to skipping dictates that a skip pointer should be
placed every sqrt(df) document pointers, where df is the document frequency
of the term.
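
To make the sqrt(df) rule concrete, here is a minimal sketch of one simple cost model that leads to it. It assumes a single-level skip list in which a lookup follows at most df/k skip pointers and then scans at most k postings linearly. The class and method names (SkipCost, worstCaseCost, sqrtInterval) are hypothetical and are not part of Lucene.

```java
class SkipCost {
    // Assumed cost model (not taken from Lucene): a lookup follows up to
    // df/k skip pointers, then scans up to k postings inside the last block.
    static double worstCaseCost(long df, long k) {
        return (double) df / k + k;
    }

    // df/k + k is minimised at k = sqrt(df); clamp to at least 1 so that
    // very rare terms still get a valid interval.
    static long sqrtInterval(long df) {
        return Math.max(1L, Math.round(Math.sqrt((double) df)));
    }

    public static void main(String[] args) {
        long df = 1_000_000L;
        long k = sqrtInterval(df);
        System.out.println("sqrt interval      = " + k);
        System.out.println("cost at k = 16     = " + worstCaseCost(df, 16));
        System.out.println("cost at k = sqrt   = " + worstCaseCost(df, k));
    }
}
```

Under this model, a term with df = 1,000,000 gets an interval of 1000 and a worst-case cost of 2000 steps, against 62516 steps with a fixed interval of 16. Real lookups rarely hit the worst case, so these figures only bound the difference.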

I looked at the current Lucene implementation. The skipInterval is
hardcoded as follows in the TermInfosWriter.java class:
int skipInterval = 16;

Therefore, I have two questions:

1) Would it be a good idea, and feasible, to use sqrt(df) as the
skipInterval rather than hardcoding it?

2) When merging segments, for every term, the skip table is buffered first
in the RAMOutputStream and then written to the output stream. If there are
a lot of documents for a term, this seems to consume a lot of memory, right?
If we instead use sqrt(df) as the skipInterval, the memory consumed will
be a lot less, since the skip table would then hold only about sqrt(df)
entries rather than df/16.
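
The buffering concern in question 2 can be illustrated with a rough count of skip entries per term. This is a minimal sketch, assuming one skip entry is written per full interval of postings (df / interval entries). The names SkipTableSize, skipEntries, and sqrtEntries are hypothetical, and the size of each entry in bytes is deliberately left out.

```java
class SkipTableSize {
    // Assumption: one skip entry per full interval of postings.
    static long skipEntries(long df, long interval) {
        return df / interval;
    }

    // Entry count when the interval is chosen as sqrt(df), at least 1.
    static long sqrtEntries(long df) {
        long interval = Math.max(1L, Math.round(Math.sqrt((double) df)));
        return skipEntries(df, interval);
    }

    public static void main(String[] args) {
        long[] docFreqs = {10_000L, 1_000_000L};
        for (long df : docFreqs) {
            System.out.println("df = " + df
                    + ", fixed interval 16: " + skipEntries(df, 16)
                    + " entries, sqrt(df) interval: " + sqrtEntries(df)
                    + " entries");
        }
    }
}
```

For df = 1,000,000 this gives 62500 buffered entries with the fixed interval and 1000 with the sqrt(df) interval, so the buffer grows with the square root of df rather than linearly.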

Hope someone can shed more light on this. Thanks in advance,

Jian
