lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2529) always apply position increment gap between values
Date Sat, 02 Oct 2010 04:28:34 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-2529:
---------------------------------

    Attachment: LUCENE-2529_skip_posIncr_for_1st_token.patch

Always adding the position increment is good but insufficient to solve my problem.

A new patch rectifies the follow-up situation that I inadvertently reported on LUCENE-2668
and should have reported here.  The gist is that DocInverterPerField _conditionally_ decrements
the position and then always increments it.  This is problematic when trying to keep
positions aligned across several multi-valued fields (using an analyzer setting posIncr
to 0) when the first value generates no tokens (either blank or stop words).  Mike McCandless
pointed out that the unfortunate existing logic was there to prevent the position from
becoming -1, which doesn't work with payloads -- LUCENE-1542.
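
To make the misalignment concrete, here is a rough, self-contained model of the bookkeeping described above.  It is not the Lucene source: the class name, the method, and the int[][] representation (one inner array of position increments per field value) are made up for illustration, and the gap is assumed to be applied between every pair of values.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the existing position bookkeeping; NOT the actual Lucene code. */
public class ExistingPositionModel {

    /**
     * posIncrs[i] holds the position increments of the tokens produced for value i
     * (an empty array means the value produced no tokens).
     * Returns the position at which each token would be indexed.
     */
    public static List<Integer> positions(int[][] posIncrs, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < posIncrs.length; i++) {
            if (i > 0) {
                position += gap;              // assumed: gap always applied between values
            }
            for (int posIncr : posIncrs[i]) {
                position += posIncr;          // the token's own increment
                if (position > 0) {
                    position--;               // conditional pre-decrement
                }
                out.add(position);            // position the token is indexed at
                position++;                   // unconditional post-increment
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Field A: first value yields no tokens, second value yields one token.
        System.out.println(positions(new int[][] {{}, {1}}, 0));   // prints [0]
        // Field B: both values yield one token each.
        System.out.println(positions(new int[][] {{1}, {1}}, 0));  // prints [0, 1]
    }
}
```

In this model the second value's token lands at position 0 in field A but at position 1 in field B, so the two fields no longer line up value for value.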

My new patch here has neither a pre-decrement nor a post-increment, and thus I find the
code easier to follow.  It ignores the provided position increment of the first token (typically
1), avoiding the need to shift positions back and forth.  There is one oddity included here:
I always add 1 to the position increment _gap_ (i.e. between values).  With this oddity
included, all the tests pass (except for the test for this very issue, which I correct in
this patch)  --yay!  Without this oddity, a handful of tests fail that depend on the first
token adding one to the position.  My +1 up at the value loop can be seen as actually enforcing
that the first token's position is 1, and also adding a +1 for when there is no token for
a value (critical for aligning multiple fields).  Perhaps this +1 should happen at a different
line to be less confusing, but the end result should be the same.
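
For readers who want something to step through, the following is one possible reading of the scheme just described, written as a standalone model.  It is an assumption-laden sketch, not the attached patch: the class and method names are hypothetical, "first token" is interpreted as the first token of each value, and the gap (plus 1) is assumed to be applied before every value except the first.

```java
import java.util.ArrayList;
import java.util.List;

/** One interpretation of the patched bookkeeping; NOT the contents of the actual patch. */
public class PatchedPositionModel {

    /**
     * posIncrs[i] holds the position increments of the tokens produced for value i
     * (an empty array means the value produced no tokens).
     * Returns the position at which each token would be indexed.
     */
    public static List<Integer> positions(int[][] posIncrs, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < posIncrs.length; i++) {
            if (i > 0) {
                position += gap + 1;          // the "oddity": gap plus one between values
            }
            for (int t = 0; t < posIncrs[i].length; t++) {
                if (t > 0) {
                    position += posIncrs[i][t];  // first token's increment is ignored
                }
                out.add(position);            // no pre-decrement, no post-increment
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Field A: first value yields no tokens, second value yields one token.
        System.out.println(positions(new int[][] {{}, {1}}, 0));   // prints [1]
        // Field B: both values yield one token each.
        System.out.println(positions(new int[][] {{1}, {1}}, 0));  // prints [0, 1]
    }
}
```

Under these assumptions the second value's token sits at position 1 in both fields, whether or not the first value produced any tokens, which is the alignment property the +1 is meant to provide.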

I expect this is very confusing for many people, especially if you're not knee-deep in this
subject as I am presently.  Mike, hopefully you understand what I'm up to here.  The
tests pass, remember.

> always apply position increment gap between values
> --------------------------------------------------
>
>                 Key: LUCENE-2529
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2529
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.9.3, 3.0.2, 3.1, 4.0
>         Environment: (I don't know which version to say this affects since it's some
>                      quasi trunk release and the new versioning scheme confuses me.)
>            Reporter: David Smiley
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2529_always_apply_position_increment_gap_between_values.patch,
>                      LUCENE-2529_skip_posIncr_for_1st_token.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'm doing some fancy stuff with span queries that is very sensitive to term positions.
> I discovered that the position increment gap on indexing is only applied between values
> when there are existing terms indexed for the document.  I suspect this logic wasn't
> deliberate; it's just how it's always been for no particular reason.  I think it should
> always apply the gap between fields.  Reference DocInverterPerField.java line 82:
>     if (fieldState.length > 0)
>       fieldState.position += docState.analyzer.getPositionIncrementGap(fieldInfo.name);
> This is checking fieldState.length.  I think the condition should simply be:  if (i > 0).
> I don't think this change will affect anyone at all but it will certainly help me.  Presently,
> I can either change this line in Lucene, or I can put in a hack so that the first value
> for the document is some dummy value, which is wasteful.
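
The difference between the two conditions in the issue description can be seen in a small standalone model.  This is only a sketch: the class, method, and parameter names are hypothetical, and a plain token counter stands in for fieldState.length.

```java
import java.util.Arrays;

/** Toy comparison of the two gap conditions; NOT the actual Lucene code. */
public class GapConditionSketch {

    /**
     * tokensPerValue[i] is the number of tokens produced for value i.
     * Returns the total gap accumulated by the time each value starts.
     * useValueIndex = false models "if (fieldState.length > 0)",
     * useValueIndex = true  models "if (i > 0)".
     */
    public static int[] gapBeforeValue(int[] tokensPerValue, int gap, boolean useValueIndex) {
        int[] result = new int[tokensPerValue.length];
        int length = 0;   // tokens seen so far; stand-in for fieldState.length
        int total = 0;    // gap accumulated so far
        for (int i = 0; i < tokensPerValue.length; i++) {
            boolean apply = useValueIndex ? (i > 0) : (length > 0);
            if (apply) {
                total += gap;
            }
            result[i] = total;
            length += tokensPerValue[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] tokens = {0, 2, 1};   // the first value produces no tokens
        // Length-based check: no gap is added until some token has been indexed.
        System.out.println(Arrays.toString(gapBeforeValue(tokens, 100, false)));  // prints [0, 0, 100]
        // Index-based check: a gap is added before every value after the first.
        System.out.println(Arrays.toString(gapBeforeValue(tokens, 100, true)));   // prints [0, 100, 200]
    }
}
```

With the length-based check, an empty first value silently swallows the gap before the second value; the index-based check applies it regardless.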

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

