lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Using dismax features in Lucene
Date Thu, 26 Jan 2012 10:52:02 GMT
On 10/01/2012 18:16, Chris Hostetter wrote:
> : The book said that dismax query was similar but different to
> :
> : DisjunctionMaxQuery
>
> the dismax *parser* in Solr is relatively simple, the majority of the
> code in it relates to parsing config options, reporting debugging, etc...
>
> if you wanted to do something similar in non-Solr java code my personal
> suggestion would be to just borrow the key points of the impl in your own code.
>
> : and additionally did Phrase Boosting which I didnt think DisjunctionMaxQuery
> : did.
>
> the crux of the issue is that the "dismax" parser is named after the fact
> that it heavily uses DisjunctionMaxQuery, constructing one for each
> "clause" of user input, but things like the phrase boosting and function
> boosting it supports are just other queries it takes and adds to the top
> level boolean query it builds.  You can find a writeup i did on the
> concept of the dismax parser at the link below...
>
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/DisMaxQParser.java?view=markup
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/util/SolrPluginUtils.java?view=markup
>
> https://wiki.apache.org/solr/DisMax
> http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
>
>
> -Hoss
>
Hi Chris

Thanks, I now have something working, but any comments on it would be more
than welcome.

Some background: this is what I used to do.

I took the query entered by the user, escaped any Lucene special characters,
then did a string replacement as follows, where {0} is the escaped
original query:

         "artist:\"{0}\"^1.6 " +
         "(+sortname:\"{0}\"^1.6 -artist:\"{0}\") " +
         "(+alias:\"{0}\" -artist:\"{0}\" -sortname:\"{0}\") " +
         "(+(artist:({0})^0.8) -artist:\"{0}\" -sortname:\"{0}\" -alias:\"{0}\") " +
         "(+(sortname:({0})^0.8) -artist:({0}) -sortname:\"{0}\" -alias:\"{0}\") " +
         "(+(alias:({0})^0.4) -artist:({0}) -sortname:({0}) -alias:\"{0}\")";

I then parsed this using the standard QueryParser. What I tried to do
was construct a query so that only one section of the six-part query
could match; in retrospect, I was
trying to replicate the way DisjunctionMaxQuery takes the maximum rather
than the sum of each score. I also preferred a complete phrase match in one
field to individual terms matching in different fields,
which I think is what the tie value in a disjunction is for.
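
For reference, the way DisjunctionMaxQuery combines the per-field scores can be sketched in plain Java, without Lucene on the classpath: the best sub-score, plus the tie value times the sum of the remaining sub-scores. The class name, method name and sample scores below are made up for illustration; real Lucene scores also depend on tf-idf, norms and query normalization, so treat this as a sketch of the combination step only.

```java
final class DisMaxScoreSketch {

    /**
     * Combine per-field scores as max + tie * (sum of the others).
     * Assumes all sub-scores are non-negative.
     */
    static double disMax(double tie, double... subScores) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : subScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical scores for one term matching in three fields.
        double artist = 2.0, sortname = 1.0, alias = 0.5;

        // tie = 0: only the best field counts (pure maximum).
        System.out.println(disMax(0.0, artist, sortname, alias));
        // tie = 1: behaves like a plain sum over the fields.
        System.out.println(disMax(1.0, artist, sortname, alias));
        // tie = 0.1: best field, plus a small bonus for the other matches.
        System.out.println(disMax(0.1, artist, sortname, alias));
    }
}
```

With a tie of 0 the other fields are ignored entirely, and with a tie of 1 the disjunction degenerates into a sum, so a small value such as 0.1 sits close to the "maximum only" end.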

Now, with my new parser, a user query of 'farming incident' produces

+((alias:farming^0.4 | sortname:farming^0.8 | artist:farming^1.6)~0.1 
(alias:incident^0.4 | sortname:incident^0.8 | artist:incident^1.6)~0.1) 
(alias:"farming incident"^0.4 | sortname:"farming incident"^0.8 | 
artist:"farming incident"^1.6)~0.1

and a search for "farming" produces

(alias:farming^0.4 | sortname:farming^0.8 | artist:farming^1.6)~0.1
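
The shape of these two queries can be reproduced with a small string-building sketch: one disjunction per term under a required clause, plus an optional disjunction for the whole phrase when there is more than one term. This is a toy that splits on whitespace and skips analysis entirely; the class and method names are hypothetical and it is not how the parser itself builds Query objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class DismaxExpansionSketch {

    /** Render one disjunction over all boosted fields, e.g. (a:x^0.4 | b:x^0.8)~0.1 */
    static String disjunction(String text, Map<String, Float> boosts, float tie) {
        StringJoiner fields = new StringJoiner(" | ", "(", ")~" + tie);
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            fields.add(e.getKey() + ":" + text + "^" + e.getValue());
        }
        return fields.toString();
    }

    /** Required per-term disjunctions, plus an optional phrase disjunction for multi-term input. */
    static String expand(String userQuery, Map<String, Float> boosts, float tie) {
        String[] terms = userQuery.trim().split("\\s+");
        StringJoiner perTerm = new StringJoiner(" ");
        for (String term : terms) {
            perTerm.add(disjunction(term, boosts, tie));
        }
        if (terms.length == 1) {
            return perTerm.toString(); // a single term has no separate phrase clause
        }
        String phrase = disjunction("\"" + String.join(" ", terms) + "\"", boosts, tie);
        return "+(" + perTerm + ") " + phrase;
    }

    public static void main(String[] args) {
        // LinkedHashMap so the fields print in the order they were added.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("alias", 0.4f);
        boosts.put("sortname", 0.8f);
        boosts.put("artist", 1.6f);

        System.out.println(expand("farming incident", boosts, 0.1f));
        System.out.println(expand("farming", boosts, 0.1f));
    }
}
```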


A couple of specific questions:

1. Do I need a different boost for the phrase parts compared to the
individual term queries so that a phrase match scores higher, or is that
taken care of?
2. Does the order of the fields in the resultant query make any
difference? (Whatever order I add them to the map, they are always output
as alias, sortname, artist.)
3. Is a tie of 0.1 a good value? In the example above I want a match on the
phrase "farming incident" in the artist field to score higher than any
other match. I would also want a match on the phrase "farming
incident" in the alias field to do better than a match of just farming in
artist and incident in sortname.
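
On the output order in question 2: a HashMap makes no promise about iteration order, so the order in which fields are added to it is not what comes back from keySet(). A minimal sketch, with a made-up class name and plain field names standing in for the ArtistIndexField constants, shows that LinkedHashMap keeps insertion order if that matters for readability. Since a maximum and a sum do not depend on the order of their inputs, the order of the disjuncts should not change the score in principle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldOrderSketch {

    /** Add the three fields in a fixed order: artist, sortname, alias. */
    static void fill(Map<String, Float> boosts) {
        boosts.put("artist", 1.6f);
        boosts.put("sortname", 0.8f);
        boosts.put("alias", 0.4f);
    }

    /** The order in which a loop over keySet() would visit the fields. */
    static List<String> fieldOrder(Map<String, Float> boosts) {
        return new ArrayList<>(boosts.keySet());
    }

    public static void main(String[] args) {
        Map<String, Float> hashed = new HashMap<>();
        Map<String, Float> linked = new LinkedHashMap<>();
        fill(hashed);
        fill(linked);

        // HashMap: order depends on hashing, not on insertion.
        System.out.println("HashMap:       " + fieldOrder(hashed));
        // LinkedHashMap: always artist, sortname, alias as inserted.
        System.out.println("LinkedHashMap: " + fieldOrder(linked));
    }
}
```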

Here is my DismaxQueryParser class

import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;
import org.musicbrainz.search.LuceneVersion;

import java.util.HashMap;
import java.util.Map;

public class DismaxQueryParser {

     // Placeholder default field; it is registered as an alias and never hits a real index field
     public static final String IMPOSSIBLE_FIELD_NAME = "\uFFFC\uFFFC\uFFFC";
     private DisjunctionQueryParser dqp;

     public DismaxQueryParser(org.apache.lucene.analysis.Analyzer analyzer) {
         dqp = new DisjunctionQueryParser(IMPOSSIBLE_FIELD_NAME, analyzer);
     }

     public Query parse(String query) throws org.apache.lucene.queryParser.ParseException {

         //one disjunction per term
         Query q0     = dqp.parse(DismaxQueryParser.IMPOSSIBLE_FIELD_NAME + ":(" + query + ")");
         //the same text as a single phrase
         Query phrase = dqp.parse(DismaxQueryParser.IMPOSSIBLE_FIELD_NAME + ":(\"" + query + "\")");
         if (phrase instanceof DisjunctionMaxQuery) {
             BooleanQuery bq = new BooleanQuery(true);
             bq.add(q0, BooleanClause.Occur.MUST);
             bq.add(phrase, BooleanClause.Occur.SHOULD);
             System.out.println(bq);
             return bq;
         }
         else {
             System.out.println(q0);
             return q0;
         }

     }

     public void addAlias(String field, Alias alias) {
         dqp.addAlias(field, alias);
     }

     static class DisjunctionQueryParser extends QueryParser {


         public DisjunctionQueryParser(String defaultField, org.apache.lucene.analysis.Analyzer analyzer) {
             super(LuceneVersion.LUCENE_VERSION, defaultField, analyzer);
         }


         protected Map<String, Alias> aliases = new HashMap<String, Alias>(3);

         //Field to Alias
         public void addAlias(String field, Alias alias) {
             aliases.put(field, alias);
         }

         @Override
         protected Query getFieldQuery(String field, String queryText, boolean quoted) {
             //If field is an alias
             if (aliases.containsKey(field)) {

                 Alias a = aliases.get(field);
                 DisjunctionMaxQuery q = new DisjunctionMaxQuery(a.getTie());
                 boolean ok = false;

                 for (String f : a.getFields().keySet()) {

                     //if query can be created for this field and text
                     Query sub = getFieldQuery(f, queryText, quoted);
                     if (sub != null) {

                         //if query was quoted but doesn't generate a phrase query we reject
                         if (!quoted || sub instanceof PhraseQuery)
                         {
                             //If Field has a boost
                             if (a.getFields().get(f) != null) {
                                 sub.setBoost(a.getFields().get(f));
                             }
                             q.add(sub);
                             ok = true;
                         }
                     }
                 }
                 //Something has been added to disjunction query
                 return ok ? q : null;

             } else {
                 //usual Field
                 try {
                     return super.getFieldQuery(field, queryText, quoted);
                 } catch (Exception e) {
                     return null;
                 }
             }
         }
     }

     static class Alias {
         public Alias()
         {

         }
         private float tie;
         //Field Boosts
         private Map<String, Float> fields;

         public float getTie() {
             return tie;
         }

         public void setTie(float tie) {
             this.tie = tie;
         }

         public Map<String, Float> getFields() {
             return fields;
         }

         public void setFields(Map<String, Float> fields) {
             this.fields = fields;
         }
     }
}


And this is how I call it:

         Map<String, Float> fieldBoosts = new HashMap<String, Float>(3);
         fieldBoosts.put(ArtistIndexField.ARTIST.getName(), 1.6f);
         fieldBoosts.put(ArtistIndexField.SORTNAME.getName(), 0.8f);
         fieldBoosts.put(ArtistIndexField.ALIAS.getName(), 0.4f);
         alias = new DismaxQueryParser.Alias();
         alias.setFields(fieldBoosts);
         alias.setTie(0.1f);
         query=QueryParser.escape(query);
         DismaxQueryParser queryParser = new DismaxQueryParser(analyzer);
         queryParser.addAlias(DismaxQueryParser.IMPOSSIBLE_FIELD_NAME, alias);
         Query q = queryParser.parse(query);
         return q;

