lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jake Mannix" <jake.man...@gmail.com>
Subject Re: factor in stopwords when searching
Date Sat, 22 Mar 2008 03:58:11 GMT
I think the way I've seen it done most often is to either index some
bi-grams which
contain stop words (so "the database" and "search the" are in the index as
individual
tokens), or else to index that piece of content twice - once with stop words
removed
(and stemming, if you use it), and then again in another field which leaves
stop words
in (and doesn't do the stemming).

  This can be simply done with a PerFieldAnalyzer(Wrapper?), and some
creative
query expansion: "searches the databases" =>
+(contents:"search database" orig_contents:"searches the databases")
This former query clause will always match if the latter does, so exact
matches will
get weighted higher than ones which only match the inexact ones.

  -jake

On Fri, Mar 21, 2008 at 7:31 PM, Grant Ingersoll <gsingers@apache.org>
wrote:

> Don't throw away the stopwords?  :-)  Lucene can't score something it
> doesn't know exists.  I suppose you could try to get fancy w/ payloads
> and add payloads if stopwords exist, but I am just thinking out loud
> there.
>
>
> On Mar 21, 2008, at 9:20 PM, Chris Lu wrote:
>
> > Let's say "the" is considered stopword. And for example two
> > documents are
> > document A, content: "... search the database..."
> > document B, content: "... search database..."
> >
> > So when the user's input is "search the database", searching with
> > query content:"search database"~1 can return both.
> > But is there any way to translate that into a query that can rank the
> > document A higher than document B?
> >
> > Thanks!
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> >
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> > DBSight customer, a shopping comparison site, (anonymous per request)
> > got 2.6 Million Euro funding!
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
> --------------------------
> Grant Ingersoll
> http://www.lucenebootcamp.com
> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message