lucene-dev mailing list archives

From Jan Høydahl / Cominvent <jan....@cominvent.com>
Subject Re: [jira] Commented: (SOLR-1980) Implement boundary match support
Date Thu, 01 Jul 2010 09:44:30 GMT
I think the TokenFilter approach is the easiest. Another option would be to
go deeper and introduce it as a native query-language syntax in some way,
and add boundarymatch="true" as a parameter in the schema. Any opinions?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 1. juli 2010, at 05.38, Lance Norskog (JIRA) wrote:

> 
>    [ https://issues.apache.org/jira/browse/SOLR-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884147#action_12884147 ]
> 
> Lance Norskog commented on SOLR-1980:
> -------------------------------------
> 
> Another use case is with phrases, especially sloppy phrases.
> "^hello kitty" would find "hello kitty" at the beginning of the text.
> "^hello"~5 would find "hello" among the first 5 words, but the closer to the beginning,
> the better. This is especially interesting for consumer searches: people tend to type the
> first word of a movie title first.
> 
>> Implement boundary match support
>> --------------------------------
>> 
>>                Key: SOLR-1980
>>                URL: https://issues.apache.org/jira/browse/SOLR-1980
>>            Project: Solr
>>         Issue Type: New Feature
>>         Components: Schema and Analysis
>>           Reporter: Jan Høydahl
>> 
>> Sometimes you need to specify that a query should match only at the start or end
>> of a field, or be an exact match.
>> Example content:
>> 1) a quick fox is brown
>> 2) quick fox is brown
>> Example queries:
>> "^quick fox" -> should only match 2)
>> "brown$" -> should match 1) and 2)
>> "^quick fox is brown$" -> should only match 2)
>> The proposed implementation is a new BoundaryMatchTokenFilter, which behaves
>> like this:
>> On the index side, it inserts special unique tokens at the beginning and end of the field.
>> These could be some weird Unicode sequence.
>> On the query side, it looks for a first character matching "^" or a last character
>> matching "$" and replaces them with the special tokens.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
> 
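
The BoundaryMatchTokenFilter behaviour quoted above can be sketched roughly as follows. This is plain Python rather than Lucene code, and only a minimal illustration under some assumptions: the sentinel values START and END are hypothetical (the issue only asks for "some weird unicode sequence"), tokenization is a simple lowercase whitespace split, and phrase matching is a naive contiguous-sublist check standing in for a real phrase query. The names index_tokens, query_tokens, phrase_matches and search are made up for this sketch.

```python
# Hypothetical sentinel tokens: private-use code points assumed never to
# appear in real field content.
START = "\ue000"
END = "\ue001"

# Example content from the issue description.
DOCS = {
    1: "a quick fox is brown",
    2: "quick fox is brown",
}


def index_tokens(text):
    """Index side: surround the field's tokens with the boundary sentinels."""
    return [START] + text.lower().split() + [END]


def query_tokens(query):
    """Query side: turn a leading '^' and a trailing '$' into the sentinels."""
    anchored_start = query.startswith("^")
    anchored_end = query.endswith("$")
    if anchored_start:
        query = query[1:]
    if anchored_end:
        query = query[:-1]
    tokens = query.lower().split()
    if anchored_start:
        tokens.insert(0, START)
    if anchored_end:
        tokens.append(END)
    return tokens


def phrase_matches(field_tokens, phrase):
    """Naive exact-phrase match: phrase must occur as a contiguous run."""
    n = len(phrase)
    return any(
        field_tokens[i:i + n] == phrase
        for i in range(len(field_tokens) - n + 1)
    )


def search(query, docs=DOCS):
    """Return the sorted ids of documents whose field matches the query."""
    phrase = query_tokens(query)
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if phrase_matches(index_tokens(text), phrase)
    )


print(search("^quick fox"))            # anchored at the start of the field
print(search("brown$"))                # anchored at the end of the field
print(search("^quick fox is brown$"))  # exact match on the whole field
```

Because the anchors become ordinary tokens, the three example queries from the issue reduce to plain phrase matches. The sloppy-phrase case ("^hello"~5) is not covered here; it would additionally need position-aware slop between the start sentinel and the term.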

