lucene-dev mailing list archives

From Chuck Williams <ch...@manawiz.com>
Subject Strange behavior of positionIncrementGap
Date Fri, 11 Aug 2006 18:32:55 GMT
Hi All,

There is a strange treatment of positionIncrementGap in
DocumentWriter.invertDocument().  The gap is inserted between
consecutive values of a field, except that no gap is inserted before a
value if the values preceding it have not yet generated any token.

For example, if a field F has values A, B and C, the following cases
arise:
  1.  A and B both generate no tokens ==> no positionIncrementGaps are
      generated
  2.  A has no tokens but B does ==> only the gap between B and C is
      generated
  3.  A has tokens but B and C do not ==> both gaps, between A and B and
      between B and C, are generated

So, empty values are treated anomalously.  They are ignored for gap
purposes at the beginning of a field's value list, but counted if they
occur later in the value list.
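
To make the three cases concrete, here is a minimal sketch that models
the behavior described above.  It is not the Lucene source: the class
GapSketch, the method gapsInserted and the "tokens so far > 0" test are
illustrative assumptions chosen to reproduce the observed results.
Each input entry is the number of tokens a value generates, and the
result lists the 0-based indices of the values that get a gap inserted
in front of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the observed gap behavior; not Lucene code.
final class GapSketch {

    // tokensPerValue[i] = number of tokens generated by the i-th value.
    // Returns the indices of values that are preceded by a gap.
    static List<Integer> gapsInserted(int[] tokensPerValue) {
        List<Integer> gaps = new ArrayList<>();
        int length = 0; // tokens generated so far for this field
        for (int i = 0; i < tokensPerValue.length; i++) {
            // Assumed condition: a gap is added only once some earlier
            // value has produced at least one token.
            if (length > 0) {
                gaps.add(i);
            }
            length += tokensPerValue[i];
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Values A, B, C; C is given one token in cases 1 and 2.
        System.out.println(gapsInserted(new int[] {0, 0, 1})); // case 1
        System.out.println(gapsInserted(new int[] {0, 1, 1})); // case 2
        System.out.println(gapsInserted(new int[] {1, 0, 0})); // case 3
    }
}
```

Under this model, case 1 yields no gaps, case 2 yields a gap only before
C, and case 3 yields gaps before both B and C, even though B and C are
empty.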

This issue caused a subtle bug in my bulk update operation: to modify
values and update the postings, it must reanalyze them with precisely
the same positions that were used when they were originally indexed.
So, I had to reproduce this previously unnoticed strange behavior.

I could post a patch to fix this, but am concerned it might introduce
upward incompatibilities in various implementations and applications
that are dependent on Lucene index format.  If that is not a concern in
this case, please let me know and I'll post a patch.  I at least wanted
to report it.

Chuck

