lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: question about
Date Mon, 03 Aug 2009 13:20:40 GMT
When you construct a Term manually, no analyzers are applied, it'sconstructed
with whatever you put in there, just as you specify it. So,
indeed, it "looks" like a KeywordAnalyzer is being used, but in reality
no analysis is being done.

So what's happening is that when you index with StandardAnalyzer,
your tokens are getting lower-cased and, I assume, you construct
your Terim with upper case, so it's not found.

You probably want to use KeywordAnalyzer for your document IDs
as it's intended to pass through the input without change. Then, as you
see, you'll be able to find things.

See PerFieldAnalyzerWrapper for the way to use different
analyzers on different fields, both an index time and
search time, it might help.

See Luke (google Lucene Luke) for a wonderful tool that
allows you to look at your index and see what's actually
stored as well as what the effect of different analyzers is.

Best
Erick



On Sun, Aug 2, 2009 at 10:44 PM, Leonard Gestrin <
Leonard.Gestrin@markettools.com> wrote:

> Hello,
> I have question about KEYWORD type and searching/updating.  I am getting
> strange behavior that I can't quite comprehend.
> My index is created using standard analyzer, which used for writing and
> searching. It has three fields
>
> userpin - alphanumeric field which is stored as TEXT
> documentkey  - alphanumeric field which is stored as TEXT
> contents - text of document which is stored as TEXT
>
> When I try to update document I am creating Term to find document by
> documentKey and I am using
>
>  org.apache.lucene.index.IndexWriter.updateDocument(term, pDocument);
>
> to do the update.  Lucene fails to find the document by the term and I am
> getting duplicate documents in the index.
> When I changed index to define documentKey as KEYWORD the updates started
> to work fine.
> However, search for documentKey using StandardAnalyzer stopped working.
>
> It appears that lucene is using keywordAnalyzer for searching for the term
> during update, even though the indexer is open with StandardAnalyzer.
>
> The sample values that are stored in documentKeys are: "L2222FAHBHMF",
> "L2222FAHBHAS".
> I noticed if documentKey is numeric value, both KeywordAnalyzer and
> StandardAnalyzer can find the documents by it without any problem thus
> reader can find and indexer can update without any problems. With
> alphanumeric I cant get both to work.
> Any help is appreciated.
> Thanks
> Leonard
>
>
>
>
>
>
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message