lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Using Lucene for searching tokens, not storing them.
Date Sun, 16 Apr 2006 23:03:30 GMT
On Sunday 16 April 2006 19:18, karl wettin wrote:
> 
> 15 apr 2006 kl. 21.32 skrev Paul Elschot:
> >>
> >> implements TermPositions {
> >>          public int nextPosition() throws IOException {
> >
> > This enumerates all positions of the Term in the document
> > as returned by the Tokenizer used by the Analyzer
> 
> Aha. And I didn't see the TermPositionVector until now.
> 
> This leads me to a new question. How is multiple fields with the same  
> name treated? Are the positions concated or in a "z-axis"? I see  
> SpanQuery-troubles with both.
> 
> Concated renders SpanFirst unusable on fields n > 0
> 	[hello,0] [world,1] [foo,2] [bar,3]
> 
> "Z-axis" mess up SpanNear, as "hello bar" is correct.
> 	[hello,0] [world,1]
> 	[foo,0] [bar,1]
> 
> Hmm.. (with double semantics, as this would mean I can't use the term  
> positions to train my hidden markov models).

Sorry, no new dimension. The token position just increases at each new
field with the same name. But multiple stored fields with the same field name
can be retrieved iirc.
It is possible to index a larger position gap between two such fields to avoid
query distance matching over the gaps.
Extra dimensions can be had by indexing term tags (as terms) at the same
positions as their corresponding terms.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message