lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: question about indexing/searching using standardanalyzer for KEYWORD field that contains alphanumeric data
Date Mon, 03 Aug 2009 09:21:07 GMT
Hi


Storing documentkey as TEXT causes it to be passed through
StandardAnalyzer, which lowercases it, so the index holds
"l2222fahbhmf" rather than "L2222FAHBHMF".  The term you pass to
updateDocument(term, doc) is not analyzed, so it never matches the
lowercased value and you end up with duplicates.  When you changed the
field to KEYWORD it was stored as is, so the updateDocument(term, doc)
call worked, but searching failed because StandardAnalyzer lowercased
the query text.  Numeric keys work everywhere because lowercasing does
not change them.
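
To make the mismatch concrete, here is a minimal sketch in plain Java.
It is a toy model and does not use the Lucene API: the class and method
names are hypothetical, and the only assumptions are that
StandardAnalyzer turns the key into one lowercased token and that a
term lookup compares tokens exactly.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only, NOT the Lucene API. All names here are hypothetical.
class AnalyzerMismatchDemo {

    // Stand-in for StandardAnalyzer on a one-token key: lowercases it.
    static String analyzeLikeStandard(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Stand-in for a KEYWORD field: the whole value is one unchanged token.
    static String analyzeLikeKeyword(String value) {
        return value;
    }

    // Term lookups (as in updateDocument) compare exact tokens, no analysis.
    static boolean termMatches(Set<String> indexedTokens, String rawTerm) {
        return indexedTokens.contains(rawTerm);
    }

    public static void main(String[] args) {
        String key = "L2222FAHBHMF";

        // TEXT field: the index holds the lowercased token.
        Set<String> textIndex = new HashSet<>();
        textIndex.add(analyzeLikeStandard(key));
        // The raw term misses, so the update adds a duplicate instead.
        System.out.println(termMatches(textIndex, key)); // false

        // KEYWORD field: the index holds the key as is.
        Set<String> keywordIndex = new HashSet<>();
        keywordIndex.add(analyzeLikeKeyword(key));
        // The raw term now matches, so the update works.
        System.out.println(termMatches(keywordIndex, key)); // true
        // A query run through the lowercasing analyzer no longer matches.
        System.out.println(termMatches(keywordIndex, analyzeLikeStandard(key))); // false

        // Digits are unchanged by lowercasing, so numeric keys match either way.
        String numericKey = "12345";
        System.out.println(analyzeLikeStandard(numericKey).equals(numericKey)); // true
    }
}
```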

See "Why is it important to use the same analyzer type during indexing
and search?" in the FAQ.

The best solution is probably to store it as KEYWORD and use
PerFieldAnalyzerWrapper to specify KeywordAnalyzer for documentkey.
The javadocs have an example showing what you need.
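
As a rough illustration of the idea behind per-field analysis, here is
a sketch in plain Java.  It is not the real PerFieldAnalyzerWrapper:
the class below is hypothetical, and it models an analyzer as a
function from a value to a single token, which is a simplification.
The point is only that documentkey bypasses lowercasing while every
other field keeps the default behaviour, at both index and search time.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model of per-field analysis, NOT the Lucene class of similar purpose.
class PerFieldAnalysisSketch {
    private final UnaryOperator<String> defaultAnalyzer;
    private final Map<String, UnaryOperator<String>> perField = new HashMap<>();

    PerFieldAnalysisSketch(UnaryOperator<String> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a different analyzer for one named field.
    void addAnalyzer(String field, UnaryOperator<String> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the field's own analyzer if one is registered, else the default.
    String analyze(String field, String value) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(value);
    }

    // Lowercasing default (stand-in for StandardAnalyzer) plus an
    // identity analyzer (stand-in for KeywordAnalyzer) for documentkey.
    static PerFieldAnalysisSketch standardWithKeywordKey() {
        PerFieldAnalysisSketch wrapper =
                new PerFieldAnalysisSketch(v -> v.toLowerCase(Locale.ROOT));
        wrapper.addAnalyzer("documentkey", UnaryOperator.identity());
        return wrapper;
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper = standardWithKeywordKey();
        // The key keeps its case, so indexed token and raw term agree.
        System.out.println(wrapper.analyze("documentkey", "L2222FAHBHMF")); // L2222FAHBHMF
        // Other fields are still lowercased by the default analyzer.
        System.out.println(wrapper.analyze("contents", "Hello World")); // hello world
    }
}
```

Using the same wrapper for writing and for query parsing is what keeps
the two sides consistent.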


TEXT and KEYWORD haven't been around in Lucene for a while.  You might
like to consider upgrading.  It is good practice anyway to mention
which version you are using when asking questions.


--
Ian.


On Mon, Aug 3, 2009 at 3:49 AM, Leonard Gestrin
<Leonard.Gestrin@markettools.com> wrote:
>
> Hello,
> I have a question about the KEYWORD type and searching/updating.  I am
> getting strange behavior that I can't quite comprehend.
> My index is created using StandardAnalyzer, which is used for writing
> and searching.  It has three fields:
>
> userpin - alphanumeric field which is stored as TEXT
> documentkey  - alphanumeric field which is stored as TEXT
> contents - text of document which is stored as TEXT
>
> When I try to update a document, I create a Term to find the document
> by documentKey and I use
>
>  org.apache.lucene.index.IndexWriter.updateDocument(term, pDocument);
>
> to do the update.  Lucene fails to find the document by the term and I
> am getting duplicate documents in the index.
> When I changed the index to define documentKey as KEYWORD, the updates
> started to work fine.
> However, searching for documentKey using StandardAnalyzer stopped
> working.
>
> It appears that Lucene is using KeywordAnalyzer to search for the term
> during the update, even though the indexer is opened with
> StandardAnalyzer.
>
> The sample values that are stored in documentKeys are: "L2222FAHBHMF", "L2222FAHBHAS".
> I noticed that if documentKey is a numeric value, both KeywordAnalyzer
> and StandardAnalyzer can find the documents by it, so the reader can
> find and the indexer can update without any problems.  With
> alphanumeric values I can't get both to work.
> Any help is appreciated.
> Thanks
> Leonard
>
