lucene-dev mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Proposal: Full support for multi-word synonyms at query time
Date Sat, 11 Aug 2012 03:36:30 GMT
I would do the query parser part first, without the graph part. This
would allow two words without quotes to match a two-word synonym. This
would be a great improvement on the current behavior. Suggested
behavior:

one two three
- "one two", "two three" and "one two three" will be checked against synonyms
one two "three"
- "one two" can be a synonym
one two OR three
- "one two" can be a synonym
one OR two OR three
- no multi-word synonyms
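The suggested rules can be sketched as a small pre-pass over the raw query string, run before analysis. This is a minimal sketch under assumptions the proposal does not spell out: a quoted phrase or the `OR` operator ends a run of bare words, only contiguous spans of two or more bare words in a run become candidates, and other query syntax (AND, NOT, +/-, field:term) is not handled. `SynonymCandidates`, `candidates` and `flush` are hypothetical names, not part of any Lucene or Solr query parser API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the spans of a raw query string that would be
// checked against a multi-word synonym dictionary under the rules above.
final class SynonymCandidates {
    // A token is either a double-quoted phrase or a run of non-space characters.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\S+");

    static List<String> candidates(String query) {
        List<String> result = new ArrayList<>();
        List<String> run = new ArrayList<>(); // current run of adjacent bare words
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String tok = m.group();
            if (tok.startsWith("\"") || tok.equals("OR")) {
                // Assumed rule: a quoted phrase or OR ends the current run.
                flush(run, result);
            } else {
                run.add(tok);
            }
        }
        flush(run, result);
        return result;
    }

    // Emits every contiguous span of two or more words (shortest first), then clears the run.
    private static void flush(List<String> run, List<String> out) {
        for (int len = 2; len <= run.size(); len++) {
            for (int start = 0; start + len <= run.size(); start++) {
                out.add(String.join(" ", run.subList(start, start + len)));
            }
        }
        run.clear();
    }

    public static void main(String[] args) {
        System.out.println(candidates("one two three"));       // [one two, two three, one two three]
        System.out.println(candidates("one two \"three\""));   // [one two]
        System.out.println(candidates("one two OR three"));    // [one two]
        System.out.println(candidates("one OR two OR three")); // []
    }
}
```

Keeping this grouping separate from analysis means each candidate span could be looked up in a synonym dictionary as a unit; how matching spans are then turned into query clauses is left open here, as it is in the proposal.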

This would be clear, intuitive behavior. I'm sure there are other use
cases where it may not make sense, but these are the common use cases.

On Fri, Aug 10, 2012 at 2:21 PM, Jack Krupansky <jack@basetechnology.com> wrote:
> I just noticed this in SynonymFilter in trunk:
>
> // TODO: we should set PositionLengthAttr too...
>
> It looks like the code does in fact set the PositionLengthAttribute, so
> maybe it is just a dead TODO.
>
> And I see the following comment (which I had seen before and which was the basis
> for my assertion that arbitrary graphs were not supported):
>
> * <p><b>NOTE</b>: when a match occurs, the output tokens
> * associated with the matching rule are "stacked" on top of
> * the input stream (if the rule had
> * <code>keepOrig=true</code>) and also on top of another
> * matched rule's output tokens.  This is not a correct
> * solution, as really the output should be an arbitrary
> * graph/lattice.  For example, with the above match, you
> * would expect an exact <code>PhraseQuery</code> <code>"y b
> * c"</code> to match the parsed tokens, but it will fail to
> * do so.  This limitation is necessary because Lucene's
> * TokenStream (and index) cannot yet represent an arbitrary
> * graph.</p>
>
> Granted, some of that is specific to index-time support for synonyms, which
> I am avoiding, but it is a source of some confusion. If full graphs are
> somehow supported at query time (or in the TokenStream in general), that
> should be stated more clearly.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Robert Muir
> Sent: Friday, August 10, 2012 1:44 PM
> To: dev@lucene.apache.org
> Subject: Re: Proposal: Full support for multi-word synonyms at query time
>
>
> On Fri, Aug 10, 2012 at 1:36 PM, Jack Krupansky <jack@basetechnology.com>
> wrote:
>>
>> One of the ongoing potholes of Solr and Lucene is the lack of full support for
>> multi-word synonyms at query time. The root of the problem is twofold:
>> individual terms are presented for analysis, which precludes recognition of
>> multi-term synonyms, and the output stream from the analysis process is a
>> single, linear stream without regard to any graph/lattice structure for
>> multiple synonyms.
>
>
> But this is not true. PositionLengthAttribute was already added, which
> makes it a graph.
>
> --
> lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
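Robert's point that PositionLengthAttribute turns the token stream into a graph can be illustrated with a toy model (Java 16+ for records). In this sketch each token carries a term, a position increment and a position length, standing in for the values of CharTermAttribute, PositionIncrementAttribute and PositionLengthAttribute; a token spans from node `pos` to node `pos + posLen`, and enumerating the paths from the first node to the last recovers the alternative readings. `TokenGraph`, `Token`, `Edge` and `paths` are hypothetical names, and the stream for a rule mapping "wi fi" to "wifi" is an assumed example, not actual SynonymFilter output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a token stream read as a graph.
final class TokenGraph {
    // Stand-in for a token's term, position increment and position length (assumed >= 1).
    record Token(String term, int posInc, int posLen) {}

    // An arc in the token graph, from node 'from' to node 'to'.
    record Edge(int from, int to, String term) {}

    // Each token starts at its position and ends posLen positions later.
    static List<Edge> toEdges(List<Token> tokens) {
        List<Edge> edges = new ArrayList<>();
        int pos = -1; // positions start at -1 and advance by each token's increment
        for (Token t : tokens) {
            pos += t.posInc();
            edges.add(new Edge(pos, pos + t.posLen(), t.term()));
        }
        return edges;
    }

    // Enumerates every path from node 0 to the last node, one string per path.
    static List<String> paths(List<Token> tokens) {
        List<Edge> edges = toEdges(tokens);
        int end = 0;
        for (Edge e : edges) {
            end = Math.max(end, e.to());
        }
        List<String> out = new ArrayList<>();
        walk(edges, 0, end, new ArrayList<>(), out);
        return out;
    }

    // Depth-first search; a path is recorded only when it reaches the final node.
    private static void walk(List<Edge> edges, int node, int end,
                             List<String> prefix, List<String> out) {
        if (node == end) {
            out.add(String.join(" ", prefix));
            return;
        }
        for (Edge e : edges) {
            if (e.from() == node) {
                prefix.add(e.term());
                walk(edges, e.to(), end, prefix, out);
                prefix.remove(prefix.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed stream for "wi fi network" with a rule "wi fi" -> "wifi", keeping the originals.
        List<Token> stream = List.of(
                new Token("wi", 1, 1),
                new Token("wifi", 0, 2), // spans both original positions
                new Token("fi", 1, 1),
                new Token("network", 1, 1));
        System.out.println(paths(stream)); // [wi fi network, wifi network]
    }
}
```

If the same stream is flattened by giving `wifi` a position length of 1, the paths become "wi fi network" and "wifi fi network", which is the kind of mismatch the SynonymFilter javadoc describes for stacked tokens.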



-- 
Lance Norskog
goksron@gmail.com


