lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1253) LengthFilter may generate a TokenStream where first token has positionIncrement==0
Date Sun, 30 Mar 2008 21:02:25 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583486#action_12583486 ]

Hoss Man commented on LUCENE-1253:
----------------------------------

the more general question is: should LengthFilter have an option to (or by default) change
the position of the Tokens it lets through so that it stays relative to the positions of the
tokens it strips out?

ie given a stream of tokens expressed as <term,positionIncrement> ...

  <a,1> <b,1> <c,1> <ddddd,0> <e,0> <f,2> <ggggg,0> <hhhhhh,1>

should the resulting stream after using a LengthFilter with min=3 be...

  <ddddd,0> <ggggg,0> <hhhhhh,1>

...(which i believe is the current behavior) or should it be...

   <ddddd,3> <ggggg,2> <hhhhhh,1>

FWIW: StopFilter seems to have code to handle this (but I haven't tested that it works correctly)
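
To make the two alternatives concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the class LengthFilterSketch, the Tok type, the sampleStream() helper and the preservePositions flag are all hypothetical names made up for illustration. The idea is just to carry the increments of the stripped tokens forward onto the next token that is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; these are NOT Lucene classes.
class LengthFilterSketch {

    /** Hypothetical stand-in for a token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return "<" + term + "," + posInc + ">";
        }
    }

    /** The example stream from this comment. */
    static List<Tok> sampleStream() {
        return Arrays.asList(
            new Tok("a", 1), new Tok("b", 1), new Tok("c", 1),
            new Tok("ddddd", 0), new Tok("e", 0), new Tok("f", 2),
            new Tok("ggggg", 0), new Tok("hhhhhh", 1));
    }

    /**
     * Keeps tokens whose term has at least min characters. When
     * preservePositions is true (an assumed option), the increments of the
     * dropped tokens are added to the next kept token.
     */
    static List<Tok> filter(List<Tok> in, int min, boolean preservePositions) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // increments of tokens dropped since the last kept token
        for (Tok t : in) {
            if (t.term.length() >= min) {
                int inc = preservePositions ? t.posInc + skipped : t.posInc;
                out.add(new Tok(t.term, inc));
                skipped = 0;
            } else {
                skipped += t.posInc;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Increments passed through untouched
        System.out.println(filter(sampleStream(), 3, false));
        // Increments adjusted for the stripped tokens
        System.out.println(filter(sampleStream(), 3, true));
    }
}
```

With preservePositions set, the first token can only end up with an increment of 0 if every token before it also had an increment of 0.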

The question of whether or not it's legal for the first token of a stream to have a positionIncrement
of 0 is being discussed on the list. Most likely, if anything needs to be changed, that would be
done in IndexWriter/DocumentsWriter.

> LengthFilter may generate a TokenStream where first token has positionIncrement==0
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1253
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1253
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.3.1
>            Reporter: Walter Ferrara
>            Priority: Minor
>
> See for reference:
> http://www.nabble.com/WordDelimiterFilter%2BLenghtFilter-results-in-termPosition%3D%3D-1-td16306788.html
> and http://www.nabble.com/Lucene---Java-f24284.html
> It seems that LengthFilter (at least) can produce a stream in which the first Token has a
> positionIncrement of 0, which causes CheckIndex and Luke's "Reconstruct&Edit" function to
> throw an exception.
> Should something be done to avoid this situation, or could the error be ignored (by allowing
> a Term with a position of -1 and relaxing the CheckIndex checks)?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



