lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: Alphanumeric model ids
Date Tue, 25 Apr 2006 23:02:32 GMT

I bet that if you look at the toString() of the query you get back from
your query parser, you'll see that the non-numeric part numbers have been
lowercased.

You took the right steps when you indexed the field as UN_TOKENIZED, but
at query time your query parser doesn't know about that -- take a look at
the PerFieldAnalyzerWrapper and the KeywordAnalyzer as a way to make sure
your query parser doesn't do any processing on the terms you search for in
non-tokenized fields.
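
To make the mismatch concrete, here is a minimal sketch in plain Java that
does not use Lucene at all. It assumes the only thing the query-time analyzer
does to a term is lowercase it (real analyzers also tokenize), and every name
in it (PerFieldAnalysisSketch, matches, searchWithDefaultOnly,
searchWithKeywordOnModel) is a hypothetical stand-in, not Lucene API. The
per-field map plays the role that PerFieldAnalyzerWrapper would play, and the
pass-through function plays the role of KeywordAnalyzer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

public class PerFieldAnalysisSketch {

    // Assumption: stands in for a lowercasing analyzer; tokenization is ignored.
    static final UnaryOperator<String> LOWERCASING = s -> s.toLowerCase(Locale.ROOT);

    // Stands in for a keyword-style analyzer: the term passes through unchanged.
    static final UnaryOperator<String> KEYWORD = s -> s;

    // Terms stored verbatim, the way an UN_TOKENIZED field would hold them.
    static final Set<String> MODEL_TERMS =
            new HashSet<>(Arrays.asList("XPHP", "40603-38"));

    // Use the analyzer registered for the field, falling back to the default,
    // then look the analyzed term up in the toy "index".
    public static boolean matches(String field, String queryText,
                                  UnaryOperator<String> defaultAnalyzer,
                                  Map<String, UnaryOperator<String>> perField) {
        UnaryOperator<String> analyzer = perField.getOrDefault(field, defaultAnalyzer);
        return MODEL_TERMS.contains(analyzer.apply(queryText));
    }

    // Query side lowercases every field, including "model".
    public static boolean searchWithDefaultOnly(String modelId) {
        return matches("model", modelId, LOWERCASING,
                new HashMap<String, UnaryOperator<String>>());
    }

    // Query side leaves "model" terms untouched; other fields keep the default.
    public static boolean searchWithKeywordOnModel(String modelId) {
        Map<String, UnaryOperator<String>> perField = new HashMap<>();
        perField.put("model", KEYWORD);
        return matches("model", modelId, LOWERCASING, perField);
    }

    public static void main(String[] args) {
        System.out.println(searchWithDefaultOnly("XPHP"));        // false: "xphp" is not indexed
        System.out.println(searchWithDefaultOnly("40603-38"));    // true: lowercasing changes nothing
        System.out.println(searchWithKeywordOnModel("XPHP"));     // true: term reaches the index as-is
    }
}
```

The purely numeric id matches either way because lowercasing leaves it
unchanged, which would explain why only the alphabetic ids go missing.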

Or ... prepare for shameless plug ... check out the Solr project.  Solr
adds a very flexible schema layer on top of Lucene that lets you specify
field types, and map fields (explicitly named or dynamically created based
on field name patterns) to field types -- every field type can have a
different analyzer, two actually: one used when indexing and one used
when querying...

: Date: Tue, 25 Apr 2006 16:53:24 -0600
: From: Jeremy Hanna <>
: Subject: Alphanumeric model ids
: I am trying to search by a number of fields including an alphanumeric
: model id.
: This is just the model id that comes from manufacturers.  I've tried
: to use a StandardAnalyzer and a SnowballAnalyzer to index the data.
: Then I search with the associated analyzer using a
: MultiFieldQueryParser.  Going through the debug into the attached
: Lucene source, I see that all a MultiFieldQueryParser does is make a
: bunch of queries and link them together with a Boolean query with
: SHOULD values.  I see that it is getting the right field, "model",
: and has the right query in there, e.g. "XPHP", but it returns no
: results.
: When I index it, I do the following:
: modelField = new Field("model", (product.getModelNumber() == null) ?
: "" : product.getModelNumber(), Field.Store.NO,
: Field.Index.UN_TOKENIZED);
: ...
: document.add(modelField);
: ...
: indexWriter.addDocument(document);
: So it shouldn't be messing with the model id retrieved from the
: database when it puts it in the index (UN_TOKENIZED).
: The weird thing is that it finds those model ids that are only
: numeric (including punctuation, e.g. "40603-38").  But it cannot find
: the "XPHP" model id.  On the command line SQL interface, I can do a
: select * from product where model = 'XPHP'; and it comes back with
: the single result.
: Anyone have any idea as to why the numeric ones would come up and the
: alphanumeric ones would not find the right values in the index?
: Thanks much,
: Jeremy
: ---------------------------------------------------------------------
: To unsubscribe, e-mail:
: For additional commands, e-mail:

