lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Regex Phrases
Date Wed, 22 Mar 2017 23:52:22 GMT
Susheel:

That'll work, but the options you've specified for
WordDelimiterFilterFactory pretty much make it so it's doing nothing.
I realize it's commented out...

That said, it's true that if you have a very specific pattern you want
to recognize, a regex can do the trick. WDFF is more generic, though, and
fits better when your requirements are less specific.

Best,
Erick
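
For readers following along, here is a minimal sketch of what the pattern in the quoted PatternReplaceFilterFactory config does to individual tokens. It uses Python's `re` module as a stand-in for the Java regex engine, so it only approximates the filter; the function name `strip_hyphens` and the sample tokens are assumptions made for illustration and do not come from the thread.

```python
import re

# Same pattern as in the quoted PatternReplaceFilterFactory config.
PATTERN = re.compile(r"(\d+)-(\d+)-?(\d+)$")


def strip_hyphens(token: str) -> str:
    """Rewrite each match as its three captured digit groups joined together."""
    # Java's "$1$2$3" replacement string is written "\1\2\3" in Python.
    return PATTERN.sub(r"\1\2\3", token)


if __name__ == "__main__":
    # Sample tokens are hypothetical; they assume the tokenizer emitted
    # each hyphenated string as a single token.
    for token in ["603-392-7017", "12-34", "B-12", "vitamin"]:
        print(f"{token!r} -> {strip_hyphens(token)!r}")
```

The pattern only fires when digits sit on both sides of a hyphen at the end of the token, so a token such as `B-12` passes through unchanged. That matches the point above that a regex suits one very specific shape of input.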

On Wed, Mar 22, 2017 at 12:56 PM, Susheel Kumar <susheel2777@gmail.com> wrote:
> I have used PatternReplaceFilterFactory in some of these situations, e.g.
> the config below:
>
> <tokenizer class="solr.ClassicTokenizerFactory"/>
> <!--
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>         generateNumberParts="0" catenateWords="0" catenateNumbers="1"
>         catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" />
> -->
> <filter class="solr.PatternReplaceFilterFactory" pattern="(\d+)-(\d+)-?(\d+)$"
>         replacement="$1$2$3"/>
>
> On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson <mjohnson@emersonecologics.com>
> wrote:
>
>> Awesome, thank you much!
>>
>> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerickson@gmail.com>
>> wrote:
>>
>> > Take a close look at WordDelimiterFilterFactory, it's designed to deal
>> > with things like part numbers, phone numbers and the like, and the
>> > example you gave is in the same class of problem I think. It'll take
>> > a bit to get your head around what it does, but it'll perform better
>> > than regexes, assuming you can get what you need out of it.
>> >
>> > And the admin/analysis page will help you _greatly_ in understanding
>> > what the effects of the various parameters are.
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson
>> > <mjohnson@emersonecologics.com> wrote:
>> > > Is it possible to configure Solr to treat text that matches a regex as a
>> > > phrase?
>> > >
>> > > I have a database full of products, and the Title and Description fields
>> > > are text_en, tokenized via the StandardTokenizerFactory. This works in most
>> > > cases, but a number of products have names like:
>> > >
>> > >  - Vitamin A
>> > >  - Vitamin-A
>> > >  - Vitamin B12
>> > >  - Vitamin B-12
>> > > ...and so on
>> > >
>> > > I have a regex that will match all of the permutations and would like to
>> > > configure the field type so that anything that matches the regex pattern is
>> > > treated as a single token, instead of being broken up by spaces, etc. Is
>> > > that possible?
>> > >
>> > > --
>> > > *This message is intended only for the use of the individual or entity to
>> > > which it is addressed and may contain information that is privileged,
>> > > confidential and exempt from disclosure under applicable law. If you have
>> > > received this message in error, you are hereby notified that any use,
>> > > dissemination, distribution or copying of this message is prohibited. If
>> > > you have received this communication in error, please notify the sender
>> > > immediately and destroy the transmitted information.*
>> >
>>
>>
>>
>> --
>>
>> Best Regards,
>>
>> *Mark Johnson* | .NET Software Engineer
>>
>> Office: 603-392-7017
>>
>> Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH |
>> 03101
>>
>>
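
The original question in this thread (treating variants such as "Vitamin A", "Vitamin-A", "Vitamin B12" and "Vitamin B-12" as one token) can be sketched outside Solr as a regex rewrite applied to the raw text before tokenizing. The thread never shows the poster's actual regex, so the pattern below is a hypothetical one. The helper names `collapse_vitamins` and `tokenize` are also assumptions, and the whitespace split only stands in for a real tokenizer. In Solr, the comparable hook would be a character filter such as PatternReplaceCharFilterFactory, but whether that fits this schema is untested here.

```python
import re
from typing import List

# Hypothetical pattern: the thread does not include the poster's real regex.
# It matches "vitamin", a space or hyphen, one letter, then optional digits
# that may be preceded by a hyphen (e.g. "B12" or "B-12").
VITAMIN = re.compile(r"\b(vitamin)[\s-]+([a-z])(?:-?(\d{1,2}))?\b", re.IGNORECASE)


def collapse_vitamins(text: str) -> str:
    """Rewrite each matched product name as one lowercase, unbroken string."""
    def join(match):
        name, letter, digits = match.groups()
        return (name + letter + (digits or "")).lower()
    return VITAMIN.sub(join, text)


def tokenize(text: str) -> List[str]:
    """Collapse vitamin names, then split on whitespace (a stand-in tokenizer)."""
    return collapse_vitamins(text).lower().split()


if __name__ == "__main__":
    for title in ["Vitamin A", "Vitamin-A", "Vitamin B12", "Vitamin B-12 and zinc"]:
        print(title, "->", tokenize(title))
```

Because the rewrite runs before the split, all four spellings from the question reduce to a single token per product, while a phrase like "Vitamin Blend" is left as two ordinary tokens.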
