lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Problem with a "." for searching Lucene 2.4.0
Date Sun, 29 Nov 2009 18:34:31 GMT
See below

On Sat, Nov 28, 2009 at 4:39 PM, Karl Heinz Marbaise <khmarbaise@gmx.de> wrote:

> Hi Ian,
>
> many thanks for the hints. Based on your and Erick's hints I have taken a
> deeper look into that, and the StandardAnalyzer which I'm using
> removes characters like "." and "-" from my queries
> (+filename:testEXCEL-formats.xls) ...
>
>
Here's the first issue. I wouldn't use StandardAnalyzer here at all. You're
taking an analyzer that's not intended to handle file names (actually, it's
intended to try to preserve email addresses, etc.) and then having to
compensate for its actions in your query parser.
PerFieldAnalyzerWrapper can be used both at index and query time to parse
different fields with different analyzers.

Rather, I'd create my own analyzer from the tokenizers and token filters
Lucene provides that do what I want. Say a WhitespaceTokenizer followed by a
LowerCaseFilter, or something like that. Use that same analyzer for both
indexing and querying...
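
To make the effect concrete, here is a rough plain-Java sketch of what
splitting on whitespace and then lowercasing does to a file name. It does not
use Lucene at all; the class name FilenameTokens and the method tokenize are
made up for this illustration, and a real analyzer would be built from
Lucene's own tokenizer and filter classes instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; hypothetical helper, not Lucene's API. */
public class FilenameTokens {

    /** Splits on whitespace and lowercases each token; '.' and '-' are left alone. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {               // "".split(...) yields one empty string
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The file name survives as a single lowercased token.
        System.out.println(tokenize("testEXCEL-formats.xls"));
    }
}
```

The point is that the file name stays a single token, punctuation included, so
the same treatment at index time and query time should produce terms that
match.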


>
>  In addition to Erick's advice, since you are storing filename without
>> analysis you could use a TermQuery to find it.
>>
> Does this mean I don't need to index the filename?
>
>
Indexing and storing are orthogonal. That is, if you want to search
on something, you MUST index it. Storing it is simply putting an
un-analyzed copy in your Document so you can easily display
the original data.


>
>> You can use BooleanQuery to combine that with other queries, including
>> those generated by QueryParser.
>>
> Based on that advice I have made an implementation which modifies my
> CustomerQueryParser:
>
>
Rather than do this, I'd re-use a custom analyzer (see above, and assuming
that you can't use one of the standard analyzers) and just escape the
relevant characters before feeding them to the query parser. I'm pretty sure
the Lucene Wiki has a list of the characters that need escaping. But see
QueryParser.escape....
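
As a rough idea of what escaping amounts to, here is a hand-rolled stand-in in
plain Java. QueryEscaper and its escape method are hypothetical names for this
sketch, and the character list is my reading of the query parser syntax
documentation, so treat it as an assumption. In real code I'd call
QueryParser.escape rather than maintain a list like this.

```java
/** Hypothetical stand-in for illustration; prefer QueryParser.escape in real code. */
public class QueryEscaper {

    // Assumed set of query-syntax special characters (includes '&' and '|'
    // so that "&&" and "||" are covered).
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    /** Prefixes every special character with a backslash. */
    public static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '-' is escaped; '.' is not a special character and passes through.
        System.out.println(escape("testEXCEL-formats.xls"));
    }
}
```

Note that escaping only keeps the query parser from treating those characters
as operators. Whatever analyzer the parser was given still runs on the term
afterwards, which is why the choice of analyzer matters first.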


> protected Query getFieldQuery(String field, String term) throws
> ParseException {
>  LOGGER.debug("getFieldQuery(): field:" + field + " Term: " + term);
>  if (FieldNames.REVISION.getValue().equals(field)) {
>        int revision = Integer.parseInt(term);
>        term = NumberUtils.pad(revision);
>  }
>
>  if (FieldNames.FILENAME.getValue().equals(field)) {
>    Term t = new Term(FieldNames.FILENAME.getValue(), term.toLowerCase());
>    TermQuery tq = new TermQuery (t);
>    BooleanQuery bq = new BooleanQuery ();
>    bq.add(tq, Occur.MUST);
>    return bq;
>  }
> return super.getFieldQuery(field, term);
> }
>
> Based on my Unit Tests it works as expected...
>
> But I'm not sure I understand correctly how things like
> "queryparts -filename:*.xls" are handled.
>
>
If you can use analyzers as above, you'll save yourself a lot of work by
letting Lucene do the heavy lifting <G>...

Best
Erick


> Doesn't that mean that my implementation will change the behaviour into the
> following:
>
> "queryparts +filename:*.xls" or did I misunderstand things here?
>
>
> Thanks for your help...
>
>
> Kind regards
> Karl Heinz Marbaise
> --
> SoftwareEntwicklung Beratung Schulung    Tel.: +49 (0) 2405 / 415 893
> Dipl.Ing.(FH) Karl Heinz Marbaise        ICQ#: 135949029
> Hauptstrasse 177                         USt.IdNr: DE191347579
> 52146 Würselen                           http://www.soebes.de
>
