lucene-dev mailing list archives

From "Grant Ingersoll" <GSIng...@syr.edu>
Subject Re: Term highlighting and Term vector patch
Date Fri, 01 Oct 2004 18:53:51 GMT
Sounds reasonable.  Is there anything you need from me, then, or do you
have what you need?
>>>>

Hi Grant,

As promised, I am currently looking through your patch, so please be patient
for a few more days. I stumbled over something in the current implementation
that took me some hours to understand and test. In the txd-file you store field
numbers, using difference encoding (storing the differences between successive
field numbers, not their absolute values) and variable-length integers. The
problem is that FieldInfos does not necessarily store fields in alphabetical
order. No order is guaranteed at all, and the order can change from segment to
segment, as can the field numbers themselves. This means that the field numbers
you are writing into the txd-file are not necessarily in increasing order, so
the difference encoding can produce negative entries. According to their
specification (e.g. IndexInput.readVInt()), variable-length integers only work
for positive numbers. All this was difficult to test, ... ,
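
To illustrate the point about ordering, here is a minimal sketch of difference
encoding applied to field numbers. The class name, method name and sample field
numbers are hypothetical and are not taken from the patch; the sketch only shows
that a sequence which is not in increasing order yields negative differences.

```java
import java.util.Arrays;

class FieldNumberDeltas {
    // Difference encoding: each entry is the current value minus the previous
    // one (the first entry is taken relative to 0).
    static int[] deltas(int[] fieldNumbers) {
        int[] out = new int[fieldNumbers.length];
        int prev = 0;
        for (int i = 0; i < fieldNumbers.length; i++) {
            out[i] = fieldNumbers[i] - prev;
            prev = fieldNumbers[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] increasing = {1, 3, 4, 7};   // hypothetical field numbers, sorted
        int[] arbitrary  = {3, 1, 7, 4};   // same numbers in an arbitrary order
        System.out.println(Arrays.toString(deltas(increasing))); // [1, 2, 1, 3]
        System.out.println(Arrays.toString(deltas(arbitrary)));  // [3, -2, 6, -3]
    }
}
```

With the increasing sequence every difference is positive; with the arbitrary
order two of the four entries are negative.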

The result is: it really is as described above, but luckily, variable-length
integers also work for negative numbers. So term vectors work as they should.
However, I will change from difference encoding for the field numbers to normal
encoding. I think one usually does not have more than 256 different fields, so
difference encoding is not necessary. Furthermore, a negative number always
takes 5 bytes as a variable-length integer, so difference encoding actually
needs more space than normal encoding here. Note that difference encoding for
positions of course remains unchanged, since it is definitely very effective
there.
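
The size argument can be checked with a small sketch of a variable-length
integer codec: seven payload bits per byte, low-order bits first, with the high
bit of each byte marking that more bytes follow. This is an assumption about the
format written from its usual description, not a copy of the Lucene source; the
class and method names are illustrative only. Under this scheme a negative int
has all 32 bits significant, so it comes out at five bytes, and it still
round-trips correctly.

```java
import java.io.ByteArrayOutputStream;

class VIntSketch {
    // Encode an int as 7-bit groups, low-order group first. The high bit of
    // each byte is set when another byte follows.
    static byte[] writeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7; // unsigned shift, so negative values terminate too
        }
        out.write(i);
        return out.toByteArray();
    }

    // Decode a value produced by writeVInt.
    static int readVInt(byte[] bytes) {
        int pos = 0;
        byte b = bytes[pos++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    public static void main(String[] args) {
        for (int v : new int[]{5, 127, 128, 255, -2}) {
            byte[] enc = writeVInt(v);
            System.out.println(v + " -> " + enc.length
                    + " byte(s), decodes to " + readVInt(enc));
        }
    }
}
```

In this sketch the values 5 and 127 take one byte, 128 and 255 take two, and -2
takes five, which is why a negative difference costs more than storing a small
field number directly.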

Christoph

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org 
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org 

