lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: Googlifying lucene querys
Date Mon, 25 Feb 2002 18:14:31 GMT
If you put the title in a separate field from the contents, and search both
fields, matches in the title will usually be stronger, without explicit
boosting.  This is because the scores are normalized by the length of the
field, and the title tends to be much shorter than the contents.  So even
without boosting, title matches usually come before contents matches.

Doug

> -----Original Message-----
> From: Spencer, Dave [mailto:dave@lumos.com]
> Sent: Monday, February 25, 2002 10:22 AM
> To: Lucene Users List
> Subject: RE: Googlifying lucene querys
> 
> 
> I'm pretty sure google gives priority to the words appearing in the
> title and URL.
> 
> I believe sect 4.2.5 says this here:
> http://citeseer.nj.nec.com/cache/papers/cs/13017/http:zSzzSzww
> w-db.stanf
> ord.eduzSzpubzSzpaperszSzgoogle.pdf/brin98anatomy.pdf
> from here: 
> http://citeseer.nj.nec.com/brin98anatomy.html
> 
> So you have to have Lucene store the title as a separate field.
> 
> This is then what you'd have if like me you boost (the caret 
> is "boost")
> the title by *5 and the URL by *2:
> 
> +(title:george^5.0 url:george^2.0 contents:george) +(title:bush^5.0
> url:bush^2.0 contents:bush) +(title:white^5.0 url:white^2.0
> contents:white) +(title:house^5.0 url:house^2.0 contents:house)
> 
> 
> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@blackwell.co.uk]
> Sent: Saturday, February 23, 2002 8:15 AM
> To: Lucene Users List
> Subject: Re: Googlifying lucene querys
> 
> 
> +george +bush +white +house
> 
> 
> --
> Ian.
> 
> Jari Aarniala wrote:
> > 
> > Hello,
> > 
> > Despite of the confusing subject ;) my question is simple. I'm just
> > trying out Lucene for the first time and would like to know how one
> > would go on implementing the search on the index with the same logic
> > that Google uses.
> >         For example, if the user input is "george bush white house",
> how
> > do I easily construct a query that searches ALL of the 
> words above? If
> I
> > have understood correctly, passing the search string above to the
> > queryParser creates a query that search for ANY of the words above.
> > 
> >         Thanks for any help,
> 
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> 
> 
> --
> To unsubscribe, e-mail:   
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message