lucene-java-user mailing list archives

From Alan Burlison <alan.burli...@gmail.com>
Subject Re: Position increment clarification?
Date Sun, 15 Sep 2013 12:04:34 GMT
On 15/09/13 12:21, Uwe Schindler wrote:

> Using multiple fields is the preferred approach! Internally, in the
> index this is the same as a single field with some gaps in the
> positions.
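The "multiple fields vs. one field with gaps" point can be made concrete with a simplified model in plain Java. The class FieldPositions and its positions method are hypothetical, not Lucene code. The model assumes every token has the default position increment of 1, and that the gap is added before an instance only if terms have already been added to that field, as the Analyzer javadoc quoted further down describes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model (not Lucene code) of how positions are
 * assigned when the same field name is added to a document several times.
 * Each instance is analyzed on its own; the indexing side adds the
 * position increment gap before an instance if terms were already added.
 */
public class FieldPositions {

    /** Returns the position of every token across all instances of one field. */
    static List<Integer> positions(List<List<String>> instances, int gap) {
        List<Integer> result = new ArrayList<>();
        int next = 0; // position the next token gets with an increment of 1
        for (List<String> tokens : instances) {
            if (!result.isEmpty()) {
                next += gap; // gap only between instances, never inside one
            }
            for (String ignored : tokens) {
                result.add(next); // assumes the default increment of 1
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> title = List.of(List.of("quick", "fox"), List.of("lazy", "dog"));
        System.out.println(positions(title, 0));   // [0, 1, 2, 3]
        System.out.println(positions(title, 100)); // [0, 1, 102, 103]
    }
}
```

With a gap of 0 the two instances behave exactly like one field; a nonzero gap simply leaves a hole in the positions.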

Right, thanks.

> All Tokenizers in Lucene *set* the position increment
> accordingly, but filters are not required to read it (unless they
> change it somehow). The attribute is used solely by the IndexWriter when
> creating the index. To insert manual gaps without multiple fields you
> have to write your own TokenFilter or use the deprecated PositionFilter.
> But this is in general more work, and much more complicated and
> harder to understand than adding the same field multiple times.
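A rough sketch, in plain Java rather than against the Lucene TokenFilter API, of what a hand-written gap-inserting filter would have to do inside a single field. The marker token, the Token record and the insertGaps/toPositions helpers are all hypothetical; a real Lucene filter would typically adjust the PositionIncrementAttribute inside incrementToken() instead of rewriting a list.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (plain Java, not the Lucene API) of a gap-inserting
 * filter: it drops a boundary marker token and adds the gap to the
 * position increment of the next real token.
 */
public class GapFilterSketch {

    record Token(String term, int posInc) {}

    static List<Token> insertGaps(List<Token> input, String marker, int gap) {
        List<Token> out = new ArrayList<>();
        int pending = 0; // extra increment owed to the next real token
        for (Token t : input) {
            if (t.term().equals(marker)) {
                pending += gap; // marker is not indexed; its own increment is discarded
                continue;
            }
            out.add(new Token(t.term(), t.posInc() + pending));
            pending = 0;
        }
        return out;
    }

    /** Converts increments to positions (a first token with increment 1 lands at 0). */
    static List<Integer> toPositions(List<Token> tokens) {
        List<Integer> positions = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc();
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Token> stream = List.of(
            new Token("quick", 1), new Token("fox", 1),
            new Token("$SEP$", 1),
            new Token("lazy", 1), new Token("dog", 1));
        System.out.println(toPositions(insertGaps(stream, "$SEP$", 100))); // [0, 1, 102, 103]
    }
}
```

The result matches adding two field instances with a gap of 100, but the tokenizer now has to emit and recognize a marker, which is part of why the multiple-field approach is simpler.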

That confirms what I'd thought based on a wander through the source. I'd 
read Lucene in Action and just got myself confused about what the best 
approach was.

> The position increment gap is only respected by IndexWriter when
> indexing; TokenStreams don't see it (because every field instance
> gets its own TokenStream).

Yes, that makes sense.

> The default position increment gap of all Analyzers has a sensible
> value to prevent PhraseQueries from matching across two field instances.
> This is the main reason the gap exists: to prevent position-sensitive
> queries from matching across fields.

Are you sure? I see this in Analyzer.java:

* Invoked before indexing a IndexableField instance if
* terms have already been added to that field.  This allows custom
* analyzers to place an automatic position increment gap between
* IndexbleField instances using the same field name.  The default value
* position increment gap is 0.  With a 0 position increment gap and
* the typical default token position increment of 1, all terms in a field,
* including across IndexableField instances, are in successive positions,
* allowing exact PhraseQuery matches, for instance, across IndexableField
* instance boundaries.

and I can't find where any of the other analyzers override the 
getPositionIncrementGap method.
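The effect under discussion can be illustrated with a simplified exact-phrase check over per-term positions. This is a hypothetical model (the class PhraseGapDemo is made up), not Lucene's PhraseQuery implementation. The positions are what two instances "quick fox" and "lazy dog" would get with a gap of 0 versus an assumed gap of 100.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of an exact (slop 0) phrase match over
 * the positions recorded for one field. Not Lucene's PhraseQuery code.
 */
public class PhraseGapDemo {

    /** True if phrase.get(i) occurs at start + i for some start; phrase must be non-empty. */
    static boolean exactPhraseMatch(Map<String, List<Integer>> postings, List<String> phrase) {
        for (int start : postings.getOrDefault(phrase.get(0), List.of())) {
            boolean ok = true;
            for (int i = 1; i < phrase.size(); i++) {
                List<Integer> ps = postings.getOrDefault(phrase.get(i), List.of());
                if (!ps.contains(start + i)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two instances of one field: "quick fox" and "lazy dog".
        Map<String, List<Integer>> gap0 = Map.of(
            "quick", List.of(0), "fox", List.of(1), "lazy", List.of(2), "dog", List.of(3));
        Map<String, List<Integer>> gap100 = Map.of(
            "quick", List.of(0), "fox", List.of(1), "lazy", List.of(102), "dog", List.of(103));
        List<String> phrase = List.of("fox", "lazy"); // spans the instance boundary
        System.out.println(exactPhraseMatch(gap0, phrase));   // true
        System.out.println(exactPhraseMatch(gap100, phrase)); // false
    }
}
```

With the javadoc's default gap of 0 the phrase matches across the boundary; only a nonzero gap (from an overridden getPositionIncrementGap) blocks it.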

I've been using Luke to examine the generated index, but I haven't been
able to find a way to display the positions for each instance of a
duplicated field, so I wasn't quite sure whether what I was doing was
actually working.

-- 
Alan Burlison
--

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

