lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Lamprecht <clampre...@gmail.com>
Subject Re: n-gram indexing
Date Tue, 19 Jul 2005 05:15:52 GMT
Can you run a phrase query with high slop factor?  like 

"united states of america"~999999

This will match documents with all the terms anywhere in the document.
 But it will also give higher scores when the terms are closer
together, using edit distance I believe.

-chris

On 7/18/05, Rajesh Munavalli <rajeshm@dessci.com> wrote:
> Intution behind adding n-grams is to boost naturally occurring larger
> phrases versus using phrase queries. For example, if I am searching for
> "united states of america", I want the search results to return the
> documents ordered as follows
> 
> Rank 1 - Documents containing all the words occurring together
> Rank 2 - Documents containing maximum number of words in the same
> sentence
> Rank 3 - Documents containing all the words but some might appear in the
> same sentence some may not
> Rank 4 - Documents containig atleast one or two words
> 
> If we have a n-gram index, most probably document talking about "united
> states" gets preference over document containing "united" and "states"
> seperately. If I am correct, this can be achieved without using phrase
> queries. I am not sure if there is a better way to achieve the same
> effect.
> 
> Thanks,
> 
> Rajesh
> 
> 
> -----Original Message-----
> From: Andy Roberts [mailto:mail@andy-roberts.net]
> Sent: Monday, July 18, 2005 5:56 PM
> To: java-user@lucene.apache.org
> Subject: Re: n-gram indexing
> 
> On Monday 18 Jul 2005 21:27, Rajesh Munavalli wrote:
> > At what point do I add n-grams? Does the order in which I add n-grams
> > affect exact phrase queries later? My questions are
> >
> > (1) Should I add all the 1-grams followed by 2-grams followed by
> > 3-grams..etc sentence by sentence OR
> >
> > (2) Add all the 1 grams of entire document first before starting
> > 2-grams for the entire document?
> >
> > What is the general accepted notion of adding n-grams of a document?
> >
> > thanks,
> >
> > Rajesh
> 
> I can't see any real advantage of storing n-grams explicitly. Just index
> the document and use phrase queries. Order is significant with phrase
> queries if I recall correctly, although you can use SpanNearQueries to
> look for unordered ngrams, although I don't know why you would want to!
> 
> Perhaps if you explain a little more about what you are trying to
> achieve more generally, we can confirm that you don't need to mess with
> explicit indexing of indexing.
> 
> Andy
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message