lucene-java-user mailing list archives

From Avi Rosenschein <arosensch...@gmail.com>
Subject Re: PhraseQuery with term positions
Date Tue, 19 Jan 2010 13:38:55 GMT
The index is pretty large (50 GB, divided into 8 shards). I'm afraid I would
start running into memory issues if I added the stop words (though it is
definitely something I would like to test at some point).

My question was more an attempt to understand whether this is known behavior
in Lucene, since I can't really think of a situation where it would be
desired (maybe if the user was knowingly searching for "a
[one-word-wildcard] b"; but a better way to do that would be with slop, not
with term positions). Wouldn't it be better for the ExactPhraseScorer to
disallow unmatched holes (i.e. terms in the document that are not matched
in the query)?
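
To make the difference concrete, here is a minimal plain-Java sketch of the two behaviors. It is only a model, not Lucene code and not how ExactPhraseScorer is actually implemented: the class name PhraseHoleSketch, the tiny stop list, and the use of null to mark a position left empty by a removed stop word are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Plain-Java model of position-based phrase matching (illustration only, not Lucene).
final class PhraseHoleSketch {

    // Hypothetical stop list, used only for this sketch.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "for", "of", "the"));

    // Simulates analysis that keeps position increments:
    // a removed stop word leaves null at its position.
    static String[] index(String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (STOP_WORDS.contains(tokens[i])) {
                tokens[i] = null;
            }
        }
        return tokens;
    }

    // Lenient: only the positions of the query terms are checked,
    // so a hole in the query accepts any indexed term.
    static boolean matchesLenient(String[] doc, String[] query) {
        return matches(doc, query, false);
    }

    // Strict: a hole in the query must also be empty in the document.
    static boolean matchesStrict(String[] doc, String[] query) {
        return matches(doc, query, true);
    }

    private static boolean matches(String[] doc, String[] query, boolean holesMustBeEmpty) {
        for (int start = 0; start + query.length <= doc.length; start++) {
            boolean ok = true;
            for (int i = 0; i < query.length && ok; i++) {
                String indexed = doc[start + i];
                if (query[i] != null) {
                    ok = query[i].equals(indexed);   // query term must sit at this position
                } else if (holesMustBeEmpty) {
                    ok = (indexed == null);          // hole must not hide an indexed term
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```

With the query built from "internet for research", the lenient matcher accepts both "internet for research" and "internet related research", which is the behavior I am seeing. The strict matcher accepts only the first, which is what I would like.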

-- Avi

On Tue, Jan 19, 2010 at 3:28 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> How big is your index? Because the simplest thing would be
> to just not remove stopwords at index or query time. Perhaps
> in a duplicate field depending upon your needs.
>
> Erick
>
> On Tue, Jan 19, 2010 at 6:50 AM, Avi Rosenschein <arosenschein@gmail.com> wrote:
>
> > Hi,
> >
> > I am using PhraseQuery with explicitly set term positions and slop=0, in
> > order to skip stop words. The field in my index is indexed with TermVector
> > positions.
> >
> > When I do a query with stop words skipped, for example "internet for
> > research" (translated into PhraseQuery: "internet ? research"), I am
> > getting results with non-stop words, as well as stop words, in the
> > position where the stop word should be (e.g. "internet related research").
> >
> > Is this expected behavior? If so, is there any way to do what I want, which
> > is for the query to match only results like "internet [stop-word]
> > research"?
> >
> > Thanks,
> > -- Avi
> >
>
