lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Halácsy Péter <halacsy.pe...@axelero.com>
Subject RE: Googlifying lucene querys
Date Sat, 02 Mar 2002 23:05:46 GMT
I think this document is very interesting:
http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

peter

> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Monday, February 25, 2002 7:15 PM
> To: 'Lucene Users List'
> Subject: RE: Googlifying lucene querys
> 
> 
> If you put the title in a separate field from the contents, 
> and search both
> fields, matches in the title will usually be stronger, 
> without explicit
> boosting.  This is because the scores are normalized by the 
> length of the
> field, and the title tends to be much shorter than the 
> contents.  So even
> without boosting, title matches usually come before contents matches.
> 
> Doug
> 
> > -----Original Message-----
> > From: Spencer, Dave [mailto:dave@lumos.com]
> > Sent: Monday, February 25, 2002 10:22 AM
> > To: Lucene Users List
> > Subject: RE: Googlifying lucene querys
> > 
> > 
> > I'm pretty sure google gives priority to the words appearing in the
> > title and URL.
> > 
> > I believe sect 4.2.5 says this here:
> > http://citeseer.nj.nec.com/cache/papers/cs/13017/http:zSzzSzww
> > w-db.stanf
> > ord.eduzSzpubzSzpaperszSzgoogle.pdf/brin98anatomy.pdf
> > from here: 
> > http://citeseer.nj.nec.com/brin98anatomy.html
> > 
> > So you have to have Lucene store the title as a separate field.
> > 
> > This is then what you'd have if like me you boost (the caret 
> > is "boost")
> > the title by *5 and the URL by *2:
> > 
> > +(title:george^5.0 url:george^2.0 contents:george) +(title:bush^5.0
> > url:bush^2.0 contents:bush) +(title:white^5.0 url:white^2.0
> > contents:white) +(title:house^5.0 url:house^2.0 contents:house)
> > 
> > 
> > -----Original Message-----
> > From: Ian Lea [mailto:ian.lea@blackwell.co.uk]
> > Sent: Saturday, February 23, 2002 8:15 AM
> > To: Lucene Users List
> > Subject: Re: Googlifying lucene querys
> > 
> > 
> > +george +bush +white +house
> > 
> > 
> > --
> > Ian.
> > 
> > Jari Aarniala wrote:
> > > 
> > > Hello,
> > > 
> > > Despite of the confusing subject ;) my question is 
> simple. I'm just
> > > trying out Lucene for the first time and would like to 
> know how one
> > > would go on implementing the search on the index with the 
> same logic
> > > that Google uses.
> > >         For example, if the user input is "george bush 
> white house",
> > how
> > > do I easily construct a query that searches ALL of the 
> > words above? If
> > I
> > > have understood correctly, passing the search string above to the
> > > queryParser creates a query that search for ANY of the 
> words above.
> > > 
> > >         Thanks for any help,
> > 
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> > 
> > 
> > --
> > To unsubscribe, e-mail:   
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> 
> --
> To unsubscribe, e-mail:   
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: 
> <mailto:lucene-user-help@jakarta.apache.org>
> 
> 

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message