lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: How to search for phrase "IAE_UPC_0001"
Date Thu, 31 Jul 2014 22:16:21 GMT
And I have a lot more explanation and examples for the word delimiter filter
in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-----Original Message----- 
From: Erick Erickson
Sent: Thursday, July 31, 2014 12:58 PM
To: solr-user@lucene.apache.org
Subject: Re: How to search for phrase "IAE_UPC_0001"

Take a look at WordDelimiterFilterFactory. It has a bunch of
options to allow this kind of thing to be indexed and searched.
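To illustrate the idea, here is a minimal Python sketch of what word-delimiter splitting does to the two spellings of the reference. It is a simplified model, not Lucene's implementation: it assumes the token is lowercased and split only on runs of non-alphanumeric characters (the real filter has further options, such as splitting on case changes and letter/digit transitions). The helper name `split_on_delimiters` is hypothetical.

```python
import re
from typing import List


def split_on_delimiters(token: str) -> List[str]:
    """Simplified word-delimiter model (hypothetical helper, not Lucene code).

    Lowercases the token and splits it on any run of characters that are
    not ASCII letters or digits, so '-' and '_' are treated identically.
    """
    parts = re.split(r"[^0-9A-Za-z]+", token.lower())
    # Drop the empty strings produced by leading/trailing delimiters.
    return [p for p in parts if p]


if __name__ == "__main__":
    for ref in ("IAE-UPC-0001", "IAE_UPC_0001"):
        print(ref, "->", split_on_delimiters(ref))
```

In this model both spellings reduce to the same three terms (`iae`, `upc`, `0001`), which is the property that lets a field analyzed this way match either form.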

Note that in the default schema, the index-time analyzer in the fieldType
definition uses slightly different WordDelimiterFilterFactory parameters
than the query-time analyzer; that's a good place to start.
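The following sketch models, under stated assumptions, why the index-time and query-time chains can use different parameters and still match. For illustration only, the index side is given a `catenate_all`-style option that also emits the joined form (e.g. `iaeupc0001`), while the query side emits only the parts. The parameter name is loosely modeled on a WDFF option, but the functions `analyze` and `terms_match` are hypothetical, and the check compares term sets only, ignoring the token positions that phrase queries also depend on.

```python
import re
from typing import List


def analyze(text: str, catenate_all: bool = False) -> List[str]:
    """Hypothetical, simplified word-delimiter chain (not Lucene code)."""
    # Lowercase, then split on runs of non-alphanumeric characters.
    parts = [p for p in re.split(r"[^0-9a-z]+", text.lower()) if p]
    tokens = list(parts)
    if catenate_all and len(parts) > 1:
        # Also emit the concatenation of all parts, as an index-time option might.
        tokens.append("".join(parts))
    return tokens


def terms_match(indexed: str, query: str) -> bool:
    """True if every query term also appears among the indexed terms."""
    index_terms = set(analyze(indexed, catenate_all=True))  # assumed index-time settings
    query_terms = set(analyze(query, catenate_all=False))   # assumed query-time settings
    return query_terms <= index_terms


if __name__ == "__main__":
    for doc in ("IAE-UPC-0001", "IAE_UPC_0001"):
        for q in ("IAE-UPC-0001", "IAE_UPC_0001", "IAEUPC0001"):
            print(f"doc={doc} query={q} match={terms_match(doc, q)}")
```

In this model the query side never needs the catenated token; the extra index-time token simply lets a query typed without delimiters (such as `IAEUPC0001`) match too. The Admin/Analysis page is the place to confirm what the real chains produce.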

WARNING: WDFF is a bit complex; you _really_ would be well
served by spending some time with the Admin/Analysis page to
understand the effects of these parameters...

Best,
Erick




On Thu, Jul 31, 2014 at 9:31 AM, Paul Rogers <paul.rogers6@gmail.com> wrote:

> Hi Guys
>
> I have a Solr application searching on data uploaded by Nutch. The search
> I wish to carry out is for a particular document reference contained
> within the "url" field, e.g. IAE-UPC-0001.
>
> The problem is that the file names that make up the URLs are not
> consistent, so a URL might contain the reference as IAE-UPC-0001 or
> IAE_UPC_0001 (i.e. using either the minus sign or the underscore as the
> delimiter) but not both.
>
> I have created the query (in the Solr admin interface):
>
> url:"IAE-UPC-0001"
>
> which works (returning the single expected document), as do:
>
> url:"IAE*UPC*0001"
> url:"IAE?UPC?0001"
>
> when the doc ref is in the format IAE-UPC-0001 (i.e. using the minus sign
> as a delimiter).
>
> However:
>
> url:"IAE_UPC_0001"
> url:"IAE*UPC*0001"
> url:"IAE?UPC?0001"
>
> do not work (returning zero documents) when the doc ref is in the format
> IAE_UPC_0001 (i.e. using the underscore character as the delimiter).
>
> I'm assuming the underscore is a special character, but I have looked at
> the Solr wiki and can't find anything that explains the problem. Also,
> the minus sign has a specific meaning, but that is nullified by adding
> the quotes.
>
> Can anyone suggest what I'm doing wrong?
>
> Many thanks
>
> Paul
> 

