lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Exact match query on a field in index which has been indexed using StandardAnalyzer
Date Wed, 14 May 2008 15:30:18 GMT
Keeping a duplicate field is certainly one way to go, and assuming that
it's just the title, the duplicate field probably won't increase your
index size much. I'd recommend just giving it a try; it probably costs
less resource-wise than you think.

You'll have to do something like this to get things to work. Another
possibility would be to introduce marker tokens in your field, index
something like "$ member of technical staff $" and then, when
querying for exact matches, *add* the $ tokens to the beginning
and end of the query.
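
To illustrate why the markers help, here is a minimal plain-Java sketch (no
Lucene calls). The class name, the `xxstartxx`/`xxendxx` markers, and the
crude tokenizer are all hypothetical stand-ins, not Lucene APIs. One
assumption worth checking against your analyzer: a bare `$` may be dropped
as punctuation during analysis, so the sketch uses alphanumeric markers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class MarkerTokens {
    // Hypothetical markers; alphanumeric so an analyzer is unlikely to strip them.
    static final String START = "xxstartxx";
    static final String END = "xxendxx";

    // Surround a field value or a query phrase with the marker tokens.
    static String wrap(String text) {
        return START + " " + text.trim() + " " + END;
    }

    // Crude stand-in for analysis: lowercase, split on non-alphanumerics.
    // This is NOT StandardAnalyzer; it only mimics its lowercasing.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // True if the phrase tokens occur contiguously, in order, in the field tokens.
    static boolean phraseMatches(List<String> field, List<String> phrase) {
        return Collections.indexOfSubList(field, phrase) >= 0;
    }
}
```

Without markers, the phrase "member of technical staff" is found inside
"Member of Technical Staff for Accounting". With markers on both the indexed
value and the query, the end marker has to follow "staff" directly, so the
longer title no longer matches.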

But I'd go with the duplicate UN_TOKENIZED field first.
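
A rough sketch of the normalization you might apply before putting the value
into the duplicate field, and again to the user's query string before
searching it. This is plain Java, not a Lucene API; the class name and the
stop word list are hypothetical, and the list is only an illustrative subset,
not the one StandardAnalyzer uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ExactMatchKey {
    // Illustrative stop words only; substitute the list your analyzer uses.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "for", "of", "the", "to"));

    // Lowercase, drop stop words and punctuation, join tokens with single spaces.
    static String normalize(String value) {
        StringBuilder key = new StringBuilder();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            if (key.length() > 0) {
                key.append(' ');
            }
            key.append(token);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // Both spellings collapse to the same key; the longer title does not.
        System.out.println(normalize("Member of Technical Staff"));
        System.out.println(normalize("member technical staff"));
        System.out.println(normalize("Member of Technical Staff for Accounting"));
    }
}
```

The idea is that the whole normalized string becomes a single untokenized
term, so only a query that normalizes to the identical string can hit it.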

Best
Erick

On Wed, May 14, 2008 at 1:14 AM, Gauri Shankar <gshankar.sahu@gmail.com>
wrote:

> Thanks Erick...
>
> My index size is ~2 GB. Is it a good idea to keep another duplicate field
> as UN_TOKENIZED and search using KeywordAnalyzer?
>
> A few points:
> 1. When I say exact match, I mean an exact match on the whole phrase only.
> That implies the query should not match a document with the field value
> "Member of Technical Staff for Accounting".
>
> 2. Stop words are fine. So "Member Technical Staff" would be equivalent to
> my query, and is acceptable.
>
> Thanks,
> Gauri
>
> On Tue, May 13, 2008 at 7:31 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > First, why do you OR together the different cases? Assuming
> > you're pushing your query through StandardAnalyzer, it'll lowercase
> > for you (just as it did during indexing).
> >
> > But to your question. Would you expect your query to match a document
> > with the field value "Member of Technical Staff for Accounting"? Because
> > your phrase would match this as well.
> >
> > Not to mention stop words. "Member Technical Staff" would be equivalent
> > to your query; is that acceptable?
> >
> > Probably the simplest way would be to index the field UN_TOKENIZED too
> > (perhaps normalizing it by, say, lower casing it and taking out stop
> > words) and search against the duplicate field with KeywordAnalyzer. Yes,
> > it'll cost you some space, but disk space is cheap <G>. How big is your
> > index anyway?
> >
> > I suppose you could also STORE the field and take any documents returned,
> > get the field and compare, but then stop words, casing, etc are tricky.
> >
> > What is the use case you're trying to support? There may be other ways
> > to accomplish your goal than messing around with exact matches.
> >
> > On Tue, May 13, 2008 at 9:43 AM, Gauri Shankar <gshankar.sahu@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have a field in my index which has been indexed using StandardAnalyzer
> > > and as TOKENIZED. Now I would like to write a query which returns the hit
> > > if there is an exact match on the field value.
> > >
> > > Say, if the field value is: Member of Technical Staff
> > > then "member of technical staff" OR "Member of Technical Staff" should
> > > return this document. Note that I am writing a phrase query to match
> > > the query string.
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Gauri Shankar
> > >
> >
>
>
>
> --
> Regards,
> Gauri Shankar
>
