From: Erick Erickson
To: java-user@lucene.apache.org, info@soebes.de
Date: Sun, 29 Nov 2009 13:34:31 -0500
Subject: Re: Problem with a "." for searching Lucene 2.4.0
Message-ID: <359a92830911291034v30055082mca31bbbba0a97a6a@mail.gmail.com>
In-Reply-To: <4B1198A3.2030106@gmx.de>

See below

On Sat, Nov 28, 2009 at 4:39 PM, Karl Heinz Marbaise wrote:

> Hi Ian,
>
> Many thanks for the hints. Based on your and Erick's hints I have taken a
> deeper look into that, and the StandardAnalyzer which I'm using removes
> characters like "." and "-" from my queries
> (+filename:testEXCEL-formats.xls) ...
>

Here's the first issue. I wouldn't use StandardAnalyzer here at all. You're taking an analyzer that isn't intended to handle file names (it's actually designed to try to preserve things like email addresses) and then having to compensate for its actions in your query parser.

Instead, I'd build my own analyzer from the tokenizers and token filters that Lucene provides, so that it does what I want: say, a WhitespaceTokenizer followed by a LowerCaseFilter, or something similar. Use that analyzer for both indexing and querying. PerFieldAnalyzerWrapper can be used at both index and query time to analyze different fields with different analyzers.

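To make that concrete, here is a minimal plain-Java sketch of what whitespace tokenization followed by lowercasing does to a file name. It deliberately does not use Lucene's classes: FilenameAnalysisSketch and whitespaceLowercase are hypothetical names made up for illustration, and in real code you'd assemble the equivalent from Lucene's own tokenizer and filter classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in for "whitespace tokenizer + lowercase filter".
// This is NOT Lucene code; the class and method names are hypothetical.
final class FilenameAnalysisSketch {

    // Split on whitespace only, then lowercase each token, so characters
    // such as '.' and '-' stay inside the token.
    static List<String> whitespaceLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The same function is applied to the indexed value and to the query term.
        List<String> indexed = whitespaceLowercase("testEXCEL-formats.xls");
        List<String> queried = whitespaceLowercase("TestExcel-Formats.XLS");
        System.out.println(indexed);                 // the single token kept whole
        System.out.println(indexed.equals(queried)); // whether both sides agree
    }
}
```

Because only whitespace splits tokens, the "." and "-" survive, and the indexed value and the query term come out identical as long as the same analysis runs on both sides.
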
>> In addition to Erick's advice, since you are storing filename without
>> analysis you could use a TermQuery to find it.
>>
> Does this mean I don't need to index the filename?
>

Indexing and storing are orthogonal. That is, if you want to search on something, you MUST index it. Storing it simply puts an un-analyzed copy in your Document so you can easily display the original data.

>> You can use
>> BooleanQuery to combine that with other queries, including those
>> generated by QueryParser.
>>
> Based on that advice I have made an implementation which modifies my
> CustomerQueryParser:
>

Rather than do this, I'd reuse a custom analyzer (see above, and assuming that you can't use one of the standard analyzers) and just escape the relevant characters before feeding them to the query parser. I'm pretty sure the Lucene Wiki has a list of the characters that need escaping, but see QueryParser.escape.

> protected Query getFieldQuery(String field, String term) throws ParseException {
>   LOGGER.debug("getFieldQuery(): field:" + field + " Term: " + term);
>   if (FieldNames.REVISION.getValue().equals(field)) {
>     int revision = Integer.parseInt(term);
>     term = NumberUtils.pad(revision);
>   }
>
>   if (FieldNames.FILENAME.getValue().equals(field)) {
>     Term t = new Term(FieldNames.FILENAME.getValue(), term.toLowerCase());
>     TermQuery tq = new TermQuery(t);
>     BooleanQuery bq = new BooleanQuery();
>     bq.add(tq, Occur.MUST);
>     return bq;
>   }
>   return super.getFieldQuery(field, term);
> }
>
> Based on my unit tests it works as expected...
>
> But I'm not sure I understand things like "queryparts -filename:*.xls"
> correctly.
>

If you can use analyzers as above, you'll save yourself a lot of work by letting Lucene do the heavy lifting ...

Best
Erick

> Doesn't that mean that my implementation will change the behaviour into the
> following:
>
> "queryparts +filename:*.xls" or did I misunderstand things here?
>
> Thanks for your help...
>
> Kind regards
> Karl Heinz Marbaise
> --
> SoftwareEntwicklung Beratung Schulung    Tel.: +49 (0) 2405 / 415 893
> Dipl.Ing.(FH) Karl Heinz Marbaise        ICQ#: 135949029
> Hauptstrasse 177                         USt.IdNr: DE191347579
> 52146 Würselen                           http://www.soebes.de
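
As a rough illustration of the escaping suggestion earlier in this reply, here is a plain-Java sketch of backslash-escaping query syntax characters before a term is handed to the query parser. It is not Lucene's implementation: QueryEscapeSketch is a hypothetical name and the SPECIAL character list is an assumption for illustration, so in real code prefer QueryParser.escape.

```java
// Plain-Java sketch of backslash-escaping; NOT Lucene's implementation.
// The class name and the character list below are illustrative assumptions.
final class QueryEscapeSketch {

    // Assumed set of characters that have a special meaning in query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash; copy the rest as-is.
    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The '-' is escaped; the '.' is not in the assumed list and is left alone.
        System.out.println(escape("testEXCEL-formats.xls"));
    }
}
```

Keep in mind that escaping "*" or "?" turns them into literal characters, so only escape input that is not meant to contain wildcards.
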