lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Ranged Query
Date Wed, 03 Apr 2013 12:43:32 GMT
Assuming your custom filter emits exactly one token, making it
multiTermAware is fine. What that means is that when your query terms
contain wildcards, your filter is automatically included in the
analysis applied to those terms at query time, just as it is applied
at index time. This matters if you expect wildcard searches to match
the normalized terms, i.e. you want to search on something like
PQ239.H* and hit PQ 0239.000000 H0.630000 002008.

Hmmmm, that could be your problem here: DS763 won't match anything
before DT, since all your indexed DS entries start with "DS " (note
the space) and DS763 sorts after everything that starts with "DS ".
So I'm guessing that if you started the range with "DS 0763" you'd
get what you wanted? That'd be evidence that you do need
MultiTermAwareness. But attach &debug=query to see exactly what the
results of query parsing are, because I'm reaching a bit here and
making the assumption that the standard analysis chain gets called
in the range case.
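
To illustrate the ordering argument above, here is a minimal Python
sketch. It assumes the range bounds are compared against the indexed
terms as plain strings without passing through the normalizing filter
(that is the guess made above, not something confirmed), and that
Python's comparison of these ASCII strings orders them the same way the
index does. The DS, DT, FD and FE terms are hypothetical, shaped like
the normalized PQ examples in the quoted message.

```python
# Hypothetical normalized terms; none of these come from the real index.
indexed_terms = [
    "DS 0100.000000 A0.100000",
    "DS 0763.000000 B0.200000",
    "DS 0799.000000 C0.300000",
    "DT 0001.000000 D0.400000",
    "FD 0010.000000 E0.500000",
    "FE 0001.000000 F0.600000",
]


def range_hits(lower, upper):
    """Return the terms t with lower <= t <= upper, compared as plain strings."""
    return [t for t in indexed_terms if lower <= t <= upper]


# Raw bound: '7' sorts after ' ', so "DS763" is greater than every "DS ..." term.
raw_hits = range_hits("DS763", "FE")

# Bound written in the normalized shape: "DS 0763" is a prefix of the indexed form.
normalized_hits = range_hits("DS 0763", "FE")

print(raw_hits)
print(normalized_hits)
```

With the raw bound, every "DS " term falls below the lower bound, so
the hits start at DT. With "DS 0763" as the bound, the DS terms at or
after 0763 are included. In both cases "FE 0001..." sorts after the
bare upper bound "FE", which is why the results stop at FD.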

As an aside, if it's still early in your project's life-cycle,
consider changing the hyphens in your field names to underscores.
Hyphens will work, but it's _really_ easy to miss a space somewhere,
have a hyphen treated as the NOT operator, and then have to debug
the result.

Best
Erick

On Wed, Apr 3, 2013 at 8:08 AM, Osullivan L. <L.Osullivan@swansea.ac.uk> wrote:
> Greetings,
>
> I have a custom analyzer which converts Library of Congress Callnumbers into
> normalized strings:
>
>    <fieldType name="LCNormalized" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
>       <analyzer>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="org.vufind.solr.analysis.LCCNormalizeFilterFactory"/>
>       </analyzer>
>     </fieldType>
>    <field name="callnumber-normalized" type="LCNormalized" indexed="true"
> stored="true" />
>
> Thus, values like:
>
> PQ239.Z56
> PQ239.H63 2008
> PQ239.S62 1982
> PQ239.B68 1983
> PQ2390.S35 A5
> PQ2390.S35 B8 1898
> PQ2389 .R65 F3 1854 t.1
> PQ239.A7 1969
> PQ2.N6 1959
> PQ22.A4 D47 1949
> PQ238.L57 1985
>
> become:
>
> PQ 0239.000000 Z0.560000
> PQ 0239.000000 H0.630000 002008
> PQ 0239.000000 S0.620000 001982
> PQ 0239.000000 B0.680000 001983
> PQ 2390.000000 S0.350000 A0.500000
> PQ 2390.000000 S0.350000 B0.800000 001898
> PQ 2389.000000 R0.650000 F0.300000 001854 T.000001
> PQ 0002.000000 N0.600000 001959
> PQ 0022.000000 A0.400000 D0.470000 001949
> PQ 0238.000000 L0.570000 001985
>
> This allows items to be accurately sorted by callnumber.
>
> I would also like to perform ranged searches on the normalised callnumber
> but whereas callnumber-normalized=[DS+TO+FE] will correctly list items with
> callnumbers between DS and FE, starting with DT* and finishing with FD* ,
> callnumber-normalized=[DS763+TO+FE] incorrectly starts at DT* and finishes
> with FD*.
>
> Can anyone explain why this might be the case?
>
> Looking at
> http://wiki.apache.org/solr/MultitermQueryAnalysis#Current_components_that_implement_MultiTermAwareComponent,
> would I have to add one of the MultiTermAware Factories to make this work?
>
> Thanks,
>
> Luke
>
> --
> Luke O'Sullivan
> Systems Developer
> Web Team
> Swansea University, Singleton Park, Swansea SA2 8PP, UK
> l.osullivan@swansea.ac.uk
> 01792 602772
> @l_os_cymru


