lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Spencer <>
Subject Re: a search like Google
Date Fri, 13 Feb 2004 01:30:32 GMT
I have code that does just this.
The calls to "DFields.*" should be replaced with the approp String e.g. 
"title", "url" etc.
A bit of boosting is done too under the heuristic that a title match is 
than a body match.

Only hassle is this is not integrated into the query parser but it's easy
enough to call this directly. Oh, the argument "srch" is the users search
query ("i love lucene" as you have below) and the Analyzer is the one
you used to index your docs.

 public static Query formGoogleQueryFast( String srch, Analyzer a)
        BooleanQuery bq = new BooleanQuery();
        TokenStream ts = a.tokenStream( "foo", new StringReader( srch));
        org.apache.lucene.analysis.Token toke;
            while ( (toke = != null)
                String word = toke.termText().toLowerCase();
                TermQuery title    = new TermQuery( new Term( 
DFields.TITLE,    word));
                TermQuery url      = new TermQuery( new Term( 
DFields.URL,      word));
                TermQuery contents = new TermQuery( new Term( 
DFields.CONTENTS, word));
                title.setBoost( TITLE_BOOST);
                url.setBoost(   URL_BOOST);

                BooleanQuery tmp = new BooleanQuery();
                tmp.add( title,    false, false);
                tmp.add( url,      false, false);
                tmp.add( contents, false, false);

                // bug: if a word like "by" is passed in, and it's on 
the stop list,
                // then it might expand to words (byzzzz) what are not 
in a doc that
                // seems to have "by" in it...
                bq.add( tmp,       true,  false); // must match one

        catch( IOException ioe)
            // can't happen as we're using a string reader
        return bq;

public static final float TITLE_BOOST = 5.0f;
    public static final float URL_BOOST = 2.0f;   

Nicolas Maisonneuve wrote:

>i have a index with the fields :
>i would make the same search type than Google  ( a form with a textfiel). When the user
search "i love lucene" (it's not a phrase query  but just the text in the textfield ), i would
like search  in all the index fields but with a specific weight boost for each field. In this
example title weight=2, author=1 content=1
>the results would be (i suppose  the default operator is "and") :  (title:i^2 author:i
content:i) +(title:love^2 author:love content:love) +(title:lucene^2 author:lucene content:lucene)
>but must i modify the QueryParser  or is there a different way for do this ?
>( because i modified the QueryParser and it's work but if there is a cleaner way to do
this , i take it ! )
>nicolas maisonneuve

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message