lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: Not able to search on UN_TOKENIZED fields
Date Thu, 05 Apr 2007 22:13:54 GMT
See below

On 4/5/07, Ryan O'Hara <> wrote:
> Hey Erick,
> Thanks for the quick response.  I need a truly exact match.  What I
> ended up doing was using a TOKENIZED field, but altering the
> StandardAnalyzer's stop word list to include only the word/letter
> 'a'.  Below is my searching code:

You're heading for a world of hurt here. I flat guarantee that you'll
spend a LOT of time tweaking this to get it right, because you're
not doing what you said you wanted to do, that is match exactly.
You said it yourself, you're "removing stop words" which is NOT exact

You need to make a clear statement of what you're trying to do
before too much longer. An example or two would help.

String[] stopWords = {"a"};
>              StandardAnalyzer sa = new StandardAnalyzer(stopWords);
>              QueryParser qp = new QueryParser("symbol", sa);
>              Query queryPhrase = qp.parse(symbol.toUpperCase());
>              Hits hits =;
>              String hit;
>              if(hits.length() > 0){
>                  hit = hits.doc(0).get("count");
>                  count = Integer.parseInt(hit);
>              }

You're uppercasing your search string. Did you index things
uppercased? Your QueryParser is trying to work on a field called
"symbol". Is that what you intend? I'm a bit confused when you
use the same word for the field and its value.

Is the reason it wasn't working due to the fact that I'm passing in a
> StandardAnalyzer?

I have no idea why it isn't working. You haven't provided the
query.toString() output. There's no evidence you've used
Luke to look at what you've indexed to know what to expect.
Without those two things, or a self-contained example I can only

I thought that maybe the searching mechanisms
> would be able to use or not use an analyzer according to what the
> field.index value is.

No, you can't do this. You can make a PerFieldAnalyzerWrapper to
process different *fields* with different analyzers, but not different
field *values*. Step back and consider that the purpose of an
Analyzer is to break up the token stream. Attaching
meaning to those tokens isn't part of an Analyzer's job. You
could create your own analyzer (see Lucene In Action) if you
wanted to do special operations on the tokens, but I really don't
think you need to go there.

One other question that you may have an answer to:  I'm eventually
> going to need to alter the stop word list to include all default stop
> words, except those that match certain criteria.  Can this be done?

Sure, this can be done. Again you've already done it with your
StandardAnalyzer constructor. But you better have *indexed*
the data with the same stop words removed or you'll be disappointed.
Again, you're doing something that directly contradicts
what you stated you are trying to do. So can you come up with
a better problem statement?

I'd also recommend that you use one of the other analyzers for a while
until you get a better feel for what Lucene is doing. StopAnalyzer comes
to mind.

> Ryan
> On Apr 5, 2007, at 3:08 PM, Erick Erickson wrote:
> > Yes, you can search on UN_TOKENIZED fields, but they're exact,
> > really, really exact <G>.
> >
> > I'd recommend that you get a copy of Luke (google lucene luke) and
> > examine your index to see what you actually have in your index.
> >
> > Also, you haven't provided us a clue what the actual query is. I'd
> > use Query.toString().
> >
> > I suspect that the query is getting tokenized if you're using one
> > of the normal Analyzers...
> >
> > Erick
> >
> > On 4/5/07, Ryan O'Hara <> wrote:
> >>
> >> Hey,
> >>
> >> I was just wondering if you are supposed to be able to search on
> >> UN_TOKENIZED fields?  It seems like you can from the docs, but I have
> >> been unsuccessful.  I want to do exact string matching on a certain
> >> field without analyzer interference.
> >>
> >> Thanks,
> >> Ryan
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail:
> >> For additional commands, e-mail:
> >>
> >>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message