lucene-dev mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Proposal: Full support for multi-word synonyms at query time
Date Fri, 10 Aug 2012 21:21:15 GMT
I just noticed this in SynonymFilter in trunk:

// TODO: we should set PositionLengthAttr too...

It looks like the code does in fact set the PositionLengthAttribute, so 
maybe it is just a dead TODO.

And I see the following comment, which I had seen before and which was the 
basis for my assertion that arbitrary graphs were not supported:

* <p><b>NOTE</b>: when a match occurs, the output tokens
* associated with the matching rule are "stacked" on top of
* the input stream (if the rule had
* <code>keepOrig=true</code>) and also on top of another
* matched rule's output tokens.  This is not a correct
* solution, as really the output should be an arbitrary
* graph/lattice.  For example, with the above match, you
* would expect an exact <code>PhraseQuery</code> <code>"y b
* c"</code> to match the parsed tokens, but it will fail to
* do so.  This limitation is necessary because Lucene's
* TokenStream (and index) cannot yet represent an arbitrary
* graph.</p>

Granted, some of that is specific to index-time support for synonyms, which 
I am avoiding, but it is a source of some confusion. If full graphs are 
somehow supported at query time (or in the TokenStream in general), that 
should be stated more clearly.
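
To make the distinction concrete, here is a minimal sketch in plain Java (no 
Lucene dependency) of how a position increment plus a position length can 
describe a graph rather than a flat stream. The Tok and Edge classes, the 
rule "dns" -> "domain name service", and the token values are all 
hypothetical, chosen only for illustration; this is not what SynonymFilter 
actually emits.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: Tok and Edge are hypothetical stand-ins,
// not Lucene classes.
class TokenGraphSketch {

    // A token with the two values discussed above.
    static final class Tok {
        final String term; final int posInc; final int posLen;
        Tok(String term, int posInc, int posLen) {
            this.term = term; this.posInc = posInc; this.posLen = posLen;
        }
    }

    // An arc in the token graph: `term` leads from node `from` to node `to`.
    static final class Edge {
        final String term; final int from; final int to;
        Edge(String term, int from, int to) {
            this.term = term; this.from = from; this.to = to;
        }
    }

    // Position starts at -1 so that a first increment of 1 lands on node 0.
    // With honorPosLen=false every token spans exactly one position.
    static List<Edge> toEdges(List<Tok> tokens, boolean honorPosLen) {
        List<Edge> edges = new ArrayList<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;
            int len = honorPosLen ? t.posLen : 1;
            edges.add(new Edge(t.term, pos, pos + len));
        }
        return edges;
    }

    // Enumerate every term sequence from node 0 to the last node.
    static List<String> paths(List<Edge> edges) {
        int end = 0;
        for (Edge e : edges) end = Math.max(end, e.to);
        List<String> out = new ArrayList<>();
        walk(edges, 0, end, "", out);
        return out;
    }

    private static void walk(List<Edge> edges, int node, int end,
                             String prefix, List<String> out) {
        if (node == end) { out.add(prefix.trim()); return; }
        for (Edge e : edges) {
            if (e.from == node) walk(edges, e.to, end, prefix + " " + e.term, out);
        }
    }

    // Hypothetical input "dns server" with "dns" expanded and the original kept.
    static List<Tok> sample() {
        List<Tok> toks = new ArrayList<>();
        toks.add(new Tok("dns", 1, 3));     // original token spans three positions
        toks.add(new Tok("domain", 0, 1));  // starts at the same position as "dns"
        toks.add(new Tok("name", 1, 1));
        toks.add(new Tok("service", 1, 1));
        toks.add(new Tok("server", 1, 1));
        return toks;
    }

    public static void main(String[] args) {
        System.out.println(paths(toEdges(sample(), true)));
        System.out.println(paths(toEdges(sample(), false)));
    }
}
```

With position lengths honored, the paths are "dns server" and "domain name 
service server". If every token is treated as length 1, the first path 
becomes "dns name service server", which is the kind of false phrase a 
purely stacked stream allows.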

-- Jack Krupansky

-----Original Message----- 
From: Robert Muir
Sent: Friday, August 10, 2012 1:44 PM
To: dev@lucene.apache.org
Subject: Re: Proposal: Full support for multi-word synonyms at query time

On Fri, Aug 10, 2012 at 1:36 PM, Jack Krupansky <jack@basetechnology.com> 
wrote:
> One of the ongoing potholes of Solr and Lucene is lack of full support for
> multi-word synonyms at query time. The root of the problem is twofold:
> individual terms are presented for analysis which precludes recognition of
> multi-term synonyms, and the output stream from the analysis process is a
> single, linear stream without regard to any graph/lattice structure for
> multiple synonyms.

But this is not true. PositionLengthAttribute was already added, which
makes it a graph.

-- 
lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org 