lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: Solr - Match whole word only in text fields
Date Fri, 27 Dec 2013 17:12:14 GMT
Hi Haya,

Yes, you are correct: "myName=aaa bbb" will produce the index terms "myName", "aaa", "bbb"
(lowercased, since the field type includes LowerCaseFilterFactory). You can verify this on
the admin analysis page, which lets you test your analyzer by entering sample text in a
user interface.
Your query "myName aaa", sent with the quotes, will be a Phrase Query and will match with the
above settings.
Your query "myName bbb" won't match, because the two terms are not adjacent in the index.
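
As a rough illustration of the behaviour described above, the analysis chain and an exact
(slop 0) phrase match can be imitated in a few lines of Python. This is only a sketch and
not Solr code: MAPPINGS, analyze and phrase_matches are hypothetical names, the mapping
table assumes the two mappings.txt entries quoted further down in this thread, and the
lowercasing mirrors the LowerCaseFilterFactory in the field type.

```python
# Hypothetical stand-in for mappings.txt: each listed character becomes a space.
MAPPINGS = {":": " ", "=": " "}


def analyze(text):
    """Approximate MappingCharFilter -> WhitespaceTokenizer -> LowerCaseFilter."""
    for src, dst in MAPPINGS.items():
        text = text.replace(src, dst)
    # str.split() with no argument splits on runs of whitespace.
    return [token.lower() for token in text.split()]


def phrase_matches(doc_text, query_text):
    """True if the analyzed query terms occur consecutively in the analyzed document."""
    doc = analyze(doc_text)
    query = analyze(query_text)
    if not query:
        return False
    n = len(query)
    # Slide a window of the query length over the document terms.
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))
```

Under these assumptions, phrase_matches("myName=aaa bbb", "myName aaa") is true, while
phrase_matches("myName=aaa bbb", "myName bbb") is false because "myname" and "bbb" are
not adjacent. The admin analysis page remains the authoritative check.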

http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Proximity_Searches

It is better to give it a try. 

Ahmet


On Friday, December 27, 2013 6:18 AM, Kydryavtsev Andrey <werder06@yandex.ru> wrote:
Hi everybody!

Ahmet, do I understand correctly that if I use this text_char_norm field type, then for the
input "myName=aaa bbb" I'll index the terms "myName", "aaa", "bbb"? So I'll match a query like
"myName" or a query like "bbb", but not "myName aaa". I can use this type for the query value
too, so that "myName aaa" is split into ("myName" && "aaa"), and then it will work. But this
approach will give a false positive match for "myName bbb". What do you think, how can I
handle this? One approach is to use KeywordTokenizer+ShingleFilter in this field type instead
of WhitespaceTokenizerFactory, so that tokens like "myName", "myName aaa", "myName aaa bbb",
"aaa", "aaa bbb", "bbb" are indexed, but this significantly increases the index size for long
values.
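
To put a number on the index-size concern, here is a small Python sketch that enumerates
every contiguous word sequence of a token list, which is the token set listed above. It is
an illustration only and not Solr's ShingleFilter: the function name shingles and its
max_size parameter are hypothetical.

```python
def shingles(tokens, max_size=None):
    """Return every contiguous run of tokens, up to max_size words long."""
    limit = max_size or len(tokens)
    result = []
    for start in range(len(tokens)):
        for size in range(1, limit + 1):
            # Skip runs that would extend past the end of the token list.
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result
```

With no size limit, n words yield n * (n + 1) / 2 tokens: 6 for the three words above, but
5050 for a 100-word value, which is why long values inflate the index. Limiting the run
length keeps the growth roughly linear, at the cost of not matching longer phrases as
single tokens.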


26.12.2013, 03:20, "Ahmet Arslan" <iorixxx@yahoo.com>:
> Hi Haya,
>
> With MappingCharFilter you can have full control over the set of characters that you
> want to split on.
>
> In mappings.txt you will have:
>
> ":" => " "
> "=" => " "
>
> Use the following type and see if it suits your needs. Update mappings.txt according
> to your needs.
>
>     <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory" />
>       </analyzer>
>     </fieldType>
>
> On Sunday, December 22, 2013 9:19 PM, haya.axelrod <haya.axelrod@gmail.com> wrote:
> I have a text field that can contain very long values (like text files). I
> want to create a field type for it (text, not string), in order to have
> something like "Match whole word only" in Notepad++, but the delimiters
> should not be only whitespace. If I have:
>
> myName=aaa bbb
>
> I would like it to match the following search strings: "aaa", "bbb", "aaa
> bbb", "myName=aaa bbb", "myName", but not "aa" or "ame=a" or "a bb".
> Another example is:
>
> <myName>aaa bbb</myName>
> Can I do this somehow?
>
> What should be my field type definition?
>
> The text can contain any character. Before searching, I'm escaping the search
> string using
> http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html
>
> Thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-Match-whole-word-only-in-text-fields-tp4107795.html
> Sent from the Solr - User mailing list archive at Nabble.com.
