lucene-solr-user mailing list archives

From Mirko <idonthaveenoughinformat...@googlemail.com>
Subject Re: Parse eDisMax queries for keywords
Date Mon, 25 Nov 2013 13:51:43 GMT
Hi Jack,
thanks for your reply. OK, in this case I agree that "enriching" the query
in the application layer is a good idea. We are still a bit puzzled about
what the enriched query should look like. I'll post here once we have found
a solution. If anybody has suggestions, I'd be happy to hear them.
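
One possible shape for the enriched query, as a minimal sketch rather than a tested solution: the application strips the "season <n>" phrase from the user input, sends the remainder as q, and adds the season number either as an eDisMax boost query (bq) or as a filter query (fq) on the "season" field. The function name enrich_query, the mode switch, and the boost value 5 are hypothetical. The "*:*" fallback for an empty remainder is also an assumption.

```python
import re

# Hypothetical pattern: the keyword "season" followed by a number,
# ignoring leading zeros ("season 01" -> "1").
SEASON_RE = re.compile(r"\bseason\s*0*(\d+)\b", re.IGNORECASE)


def enrich_query(user_query, mode="boost", boost=5):
    """Return request parameters for Solr, built from the raw user query.

    mode="boost"  -> add a bq clause that boosts matching seasons
    mode="filter" -> add an fq clause that keeps only matching seasons
    """
    match = SEASON_RE.search(user_query)
    if match is None:
        # No keyword found: pass the query through unchanged.
        return {"q": user_query.strip()}

    season = match.group(1)
    # Remove the "season <n>" phrase and normalise the whitespace.
    remainder = user_query[:match.start()] + " " + user_query[match.end():]
    remainder = " ".join(remainder.split())

    # Assumption: fall back to a match-all query if nothing else is left.
    params = {"q": remainder or "*:*"}
    if mode == "filter":
        params["fq"] = f"season:{season}"
    else:
        params["bq"] = f"season:{season}^{boost}"
    return params


print(enrich_query("Footitle season 1"))
print(enrich_query("Footitle season 01", mode="filter"))
```

With this approach the "season" field would no longer need to be listed in qf, since the application addresses it explicitly.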

Mirko


2013/11/21 Jack Krupansky <jack@basetechnology.com>

> The query parser does its own tokenization and parsing before your
> analyzer tokenizer and filters are called, ensuring that only one
> whitespace-delimited token is analyzed at a time.
>
> You're probably best off having an application layer preprocessor for the
> query that "enriches" the query in the manner that you're describing.
>
> Or, simply settle for a "heuristic" approach that may give you 70% of what
> you want using only existing Solr features on the server side.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Mirko
> Sent: Thursday, November 21, 2013 5:30 AM
> To: solr-user@lucene.apache.org
> Subject: Parse eDisMax queries for keywords
>
>
> Hi,
> We would like to implement special handling for queries that contain
> certain keywords. Our particular use case:
>
> In the example query "Footitle season 1" we want to detect the keyword
> "season", get the number that follows it, and boost (or filter for)
> documents that match "1" in the field "season".
>
> We have two fields in our schema:
>
> <!-- "title" contains titles -->
> <field name="title" type="text" indexed="true" stored="true"
> multiValued="false"/>
>
> <fieldType name="text" class="solr.TextField" omitNorms="true">
>            <analyzer >
>                <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping-ISOLatin1Accent.txt"/>
>                <tokenizer class="solr.StandardTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <!-- ... -->
>            </analyzer>
> </fieldType>
>
> <field name="season" type="season_number" indexed="true" stored="false"
> multiValued="false"/>
>
> <!-- "season" contains season numbers -->
> <fieldType name="season_number" class="solr.TextField" omitNorms="true">
>     <analyzer type="query">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.PatternReplaceFilterFactory"
>                 pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
>     </analyzer>
> </fieldType>
>
>
> Our idea was to use a Keyword tokenizer and a Regex on the "season" field
> to extract the season number from the complete query.
>
> However, we use an ExtendedDisMax query parser in our search handler:
>
> <requestHandler name="/select" class="solr.SearchHandler">
>        <lst name="defaults">
>            <str name="defType">edismax</str>
>            <str name="qf">
>            title season
>            </str>
>
>        </lst>
> </requestHandler>
>
>
> The problem is that eDisMax tokenizes the query, so our "season" field
> receives the tokens ["Footitle", "season", "1"] one at a time, in no
> particular order, instead of the complete query.
>
> How can we pass the complete (untokenized) query to the "season" field? We
> don't understand which tokenizer is used here and why our "season" field
> receives tokens instead of the complete query.
>
> Or is there another approach to solve this use case with Solr?
>
> Thanks,
> Mirko
>
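
The behaviour described in the quoted message can be approximated outside Solr. The sketch below is only an illustration in Python, not Solr's actual code path. It mimics the "season_number" query analyzer (keyword tokenizer, lowercasing, pattern replace) and applies it once to the whole query and once to each whitespace-delimited token, which is roughly what the field sees after the query parser has split the input. The helper names analyze_season and per_token are hypothetical.

```python
import re

# The pattern from the season_number field type in the schema above.
SEASON_PATTERN = re.compile(r".*(?:season) *0*([0-9]+).*")


def analyze_season(text):
    """Rough stand-in for KeywordTokenizer + LowerCaseFilter + PatternReplaceFilter."""
    lowered = text.lower()
    # Replace every match with capture group 1 ("$1" in the Solr config);
    # input that does not match the pattern is returned unchanged.
    return SEASON_PATTERN.sub(r"\1", lowered)


def per_token(query):
    """Analyze each whitespace-delimited token separately, as the parser hands them over."""
    return [analyze_season(token) for token in query.split()]


# Whole query in one piece: the pattern can see "season 1" and extract the number.
print(analyze_season("Footitle season 1"))

# One token at a time: no single token contains both "season" and the number,
# so the pattern never matches and every token passes through as-is.
print(per_token("Footitle season 1"))
```

No single token carries both the keyword and the number, so the regex on the field cannot extract the season once the parser has split the query.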
