lucene-solr-user mailing list archives

From Mirko <idonthaveenoughinformat...@googlemail.com>
Subject Parse eDisMax queries for keywords
Date Thu, 21 Nov 2013 10:30:59 GMT
Hi,
We would like to implement special handling for queries that contain
certain keywords. Our particular use case:

In the example query "Footitle season 1" we want to detect the keyword
"season", take the number that follows it, and boost (or filter for)
documents that have "1" in the field "season".
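To make the intended behavior concrete, here is a minimal Python sketch of the keyword handling described above, done on the client before the query is sent to Solr. This is only an illustration under that assumption; it is not what the schema below tries to do. `extract_season` and `solr_params` are hypothetical helpers, and the choice of `qf=title` (with the season handled separately) is an assumption. `bq` (eDisMax boost query) and `fq` (filter query) are standard Solr request parameters.

```python
import re

# Hypothetical helper: matches the keyword "season" followed by a number,
# ignoring case and leading zeros ("Season 01" -> "1").
SEASON_RE = re.compile(r"\bseason\s*0*([0-9]+)\b", re.IGNORECASE)


def extract_season(query):
    """Split a raw query into (remaining text, season number or None)."""
    match = SEASON_RE.search(query)
    if match is None:
        return query.strip(), None
    # Drop the "season N" phrase and normalize the leftover whitespace.
    rest = query[:match.start()] + " " + query[match.end():]
    return " ".join(rest.split()), match.group(1)


def solr_params(query):
    """Build eDisMax request parameters; boost on the season field if present."""
    rest, season = extract_season(query)
    params = {"defType": "edismax", "qf": "title", "q": rest or "*:*"}
    if season is not None:
        # Use "fq" instead of "bq" to filter rather than boost.
        params["bq"] = "season:" + season
    return params


print(solr_params("Footitle season 1"))
```

For "Footitle season 1" this sends q=Footitle and bq=season:1, so the title match and the season boost are handled by separate parameters.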

We have two fields in our schema:

<!-- "title" contains titles -->
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>

<fieldType name="text" class="solr.TextField" omitNorms="true">
    <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- ... -->
    </analyzer>
</fieldType>

<field name="season" type="season_number" indexed="true" stored="false" multiValued="false"/>

<!-- "season" contains season numbers -->
<fieldType name="season_number" class="solr.TextField" omitNorms="true">
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
    </analyzer>
</fieldType>


Our idea was to use a KeywordTokenizer and a regex (PatternReplaceFilter) on
the "season" field to extract the season number from the complete query.

However, we use the ExtendedDisMax (edismax) query parser in our search handler:

<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">title season</str>
    </lst>
</requestHandler>


The problem is that eDisMax splits the query into terms, so our "season"
field receives the separate tokens ["Footitle", "season", "1"], each on its
own and without their order, instead of the complete query.
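The effect can be reproduced outside Solr with a rough Python emulation of the "season_number" query analyzer (KeywordTokenizer, then LowerCaseFilter, then PatternReplaceFilter with the same regex). This is an approximation: Python's `re` stands in for `java.util.regex`, Solr's `$1` replacement is written `r"\1"`, and `season_analyzer` is a hypothetical name. It also assumes, as we understand it, that eDisMax splits the query on whitespace and analyzes each term against each `qf` field separately.

```python
import re

# The regex from the "season_number" fieldType. Python's re approximates
# java.util.regex here; Solr's replacement "$1" is written r"\1" in Python.
SEASON_PATTERN = re.compile(r".*(?:season) *0*([0-9]+).*")


def season_analyzer(text):
    """Rough emulation of the season_number query analyzer on one input:
    KeywordTokenizer (the whole input is one token), then LowerCaseFilter,
    then PatternReplaceFilter (replace all matches)."""
    return SEASON_PATTERN.sub(r"\1", text.lower())


query = "Footitle season 1"

# Intended: the complete query reaches the analyzer as a single token.
whole_query = season_analyzer(query)

# Actual: the query is split on whitespace first, so each term is analyzed
# alone and no single term contains both "season" and a number.
per_term = [season_analyzer(term) for term in query.split()]

print(whole_query)  # 1
print(per_term)     # ['footitle', 'season', '1']
```

Only the whole-query input lets the regex fire; per term, every token passes through unchanged.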

How can we pass the complete query (untokenized) to the season field? We
don't understand which tokenizer is used here and why our "season" field
received tokens instead of the complete query.

Or is there another approach to solve this use case with Solr?

Thanks,
Mirko
