lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lucifer Hammer" <luce...@gmail.com>
Subject Re: Searching on plurals and phrases in a single field
Date Thu, 13 Dec 2007 03:06:34 GMT
Hi Erick,

Thanks for the great idea, it's exactly the kind of suggestion I was looking
for!

Lucifer

On Dec 12, 2007 2:34 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> I faced a very similar requirement and solved it by indexing multiple
> tokens at the same place. For instance, say you're indexing
> the word "foxes". Index something like fox$ and foxes at the same
> position (see SynonymAnalyzer in Lucene In Action for an example).
> You probably MUST index the multiple terms with an increment gap
> of 0 (more later).
>
> Example phrase "red foxes are plentiful"
>
> Now you have the capability of distinguishing between a stemmed
> and unstemmed version of the word and can search for exactly
> "red foxes". But if instead you want to search for the stemmed
> version, you can search for "red fox$". But "red fox" will NOT match.
>
> The reason you need to index these with an increment gap of 0 is so
> phrase queries work. If you let the gap increment for each token, and
> indexed a phrase like "red foxes are plentiful", then did a
> proximity search on "red plentiful"~2, it would fail because
> you'd have fox$ and foxes each taking up one position. But if
> fox$ and foxes both have the same position, it'll work.
>
> And it's all in the same index, one field, etc.
>
> Hope this helps
> Erick
>
> On Dec 12, 2007 1:25 PM, Lucifer Hammer <lucener@gmail.com> wrote:
>
> > Hi,
> >
> > We've got a requirement that we need to give our users  the ability to
> > search on exact phrases within a field, or, if they prefer, they can
> match
> > on plurals(either via stems, or another plural algorithm).  However, the
> > cases are mutually exclusive, for example given the following field in
> the
> > index:
> >
> > IndexField1: "The quick brown dog jumped over the lazy fox"
> >
> > If the user chooses an exact phrase search such as "lazy dog jumped",
> then
> > it'll match, however, if they also choose an exact phrase  and search
> for:
> > "lazy dogs" it shouldn't match.
> >
> > If the user chooses a plural search, then  both of the above searches
> > should
> > match.
> >
> > So... the question really is:  Can I do this all in one field, or will I
> > have to index the data twice, once in a field that has the exact text,
> and
> > in a second field in which I index the terms, and wordstack stems(or
> > plurals).
> >
> > If it's possible to do this in a single field, that would be much
> > preferred...
> >
> > Thanks for any help!
> > Lucifer
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message