From: "Uwe Schindler"
Reply-To: java-user@lucene.apache.org
Subject: RE: Indexing product keys with and without spaces in them
Date: Tue, 3 Jan 2012 15:06:54 +0100
Message-ID: <006c01ccca20$f3298310$d97c8930$@thetaphi.de>
In-Reply-To: <4F03011A.6040302@iconparc.de>
References: <4F02BFEB.8090105@iconparc.de> <4F03011A.6040302@iconparc.de>

Hi,

In Solr there is WordDelimiterFilter (it is also available in Lucene trunk/4.0, in the analyzers-common module), which can handle these product keys (split them up, keep them together, merge them). You can extract the source code in 3.x and use it as your own TokenFilter! But if the product keys are not in separate fields, you cannot handle them if they contain spaces.

If you have one field called "productID" or whatever and you know that the user wants to search for a product key, you can index/search with a specific analyzer that uses WordDelimiterFilter, LowerCaseFilter and KeywordTokenizer only for this field.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Christoph Kaser [mailto:lucene_list@iconparc.de]
> Sent: Tuesday, January 03, 2012 2:23 PM
> To: java-user@lucene.apache.org
> Subject: Re: Indexing product keys with and without spaces in them
>
> Hi Aditya,
>
> Thank you for your suggestion!
> Unfortunately, this is not possible, as there is no common format for all product
> keys. The products are not ours, nor are they all from the same manufacturer,
> so we don't have any influence on what the product keys look like.
>
> Regards,
> Christoph
>
> On 03.01.2012 14:10, findbestopensource wrote:
> > Hi Christoph
> >
> > My opinion is that you should not normalize or otherwise modify the
> > product keys. They should be unique and should be used as they are. Instead
> > of spaces, only "-" should have been used, but since the products are already
> > out on the market, that cannot be helped.
> >
> > In your UI, you could provide multiple text boxes where the user fills in
> > the respective characters. You could add a space or "-" before passing the key to
> > Lucene.
> >
> > Regards
> > Aditya
> > www.findbestopensource.com - Finds best open source across all platforms.
> >
> > On Tue, Jan 3, 2012 at 2:14 PM, Christoph Kaser wrote:
> >
> >> Hello,
> >>
> >> we use Lucene as the search engine in an online shop. The products in
> >> this shop often contain product keys like CRXUSB2.0-16GB.
> >> We would like our customers to be able to find products by entering
> >> their key. The problem is that product keys sometimes contain spaces
> >> or dashes, and customers sometimes don't enter these separators
> >> correctly. On the other hand, some customers enter spaces where
> >> there are none. Is there an analyzer or some other method that allows
> >> us to find the product if the user enters things like:
> >> - "CRX USB2.0 16GB"
> >> - "CRXUSB2.016GB"
> >> - "CRX USB-2.0 16GB"
> >> ...
> >>
> >> The problem is that the product keys don't all have a common format
> >> and are contained in the normal text, so we don't have an easy way to
> >> treat them differently from the rest of the text.
> >>
> >> Any help would be great!
> >>
> >> Best regards,
> >> Christoph
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --
> Dipl.-Inf. Christoph Kaser
>
> IconParc GmbH
> Sophienstrasse 1
> 80333 München
>
> www.iconparc.de
>
> Tel +49 -89- 15 90 06 - 21
> Fax +49 -89- 15 90 06 - 49
>
> Management: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
> 121830, Amtsgericht München
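
As a rough illustration of the "merge them" option mentioned in the reply above, here is a minimal sketch in plain Java with no Lucene dependency. The class `ProductKeyNormalizer` and its method `catenate` are hypothetical names introduced only for this example. The code merely approximates what a KeywordTokenizer + WordDelimiterFilter + LowerCaseFilter chain, configured to join all parts of a key, might produce for a dedicated product key field; it is not Lucene's implementation. It also assumes that dropping every non-alphanumeric character is acceptable for the keys in question.

```java
// Hypothetical stand-in for a "catenate all parts + lowercase" analysis chain.
// Not Lucene code: it only shows why the query variants can match the indexed key.
final class ProductKeyNormalizer {

    // Keep letters and digits only, lowercased, so spaces, dots and dashes
    // no longer influence matching.
    static String catenate(String key) {
        StringBuilder sb = new StringBuilder(key.length());
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The key as it would be stored in the dedicated field.
        String indexed = catenate("CRXUSB2.0-16GB");

        // The variants from the original question.
        String[] queries = {"CRX USB2.0 16GB", "CRXUSB2.016GB", "CRX USB-2.0 16GB"};
        for (String query : queries) {
            String normalized = catenate(query);
            System.out.println(query + " -> " + normalized
                    + " (matches indexed key: " + normalized.equals(indexed) + ")");
        }
    }
}
```

With this normalization, the three query variants from the original question collapse to the same token as the indexed key CRXUSB2.0-16GB. The trade-off is that keys differing only in punctuation or spacing become indistinguishable, which is why it only makes sense on a field that holds nothing but the product key.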