lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Indexing product keys with and without spaces in them
Date Tue, 03 Jan 2012 14:06:54 GMT
Hi,

In Solr there is WordDelimiterFilter (it's also available in Lucene
trunk/4.0, in the analyzers-common module), which can handle these product
keys (split them up, keep them together, merge them). In 3.x you can extract
its source code and use it as your own TokenFilter! But if the product keys
are not in separate fields, you cannot handle them when they contain spaces,
because a normal tokenizer splits the text at the spaces before the filter
ever sees the whole key. If you have one field called "productID" or similar
and you know that the user wants to search for a product key, you can index
and search that field with a dedicated analyzer that uses KeywordTokenizer,
WordDelimiterFilter and LowerCaseFilter, only for this field.
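A rough way to see why such a field-specific chain helps is to mimic its effect in plain Java. This is a minimal sketch, not Lucene code: the class `ProductKeyNormalizer` and its methods are hypothetical, and they only approximate WordDelimiterFilter configured to generate word and number parts, split on letter/digit transitions and catenate everything (Lucene's flag names for this include GENERATE_WORD_PARTS, GENERATE_NUMBER_PARTS, SPLIT_ON_NUMERICS and CATENATE_ALL). Because the whole field value stays one token, as with KeywordTokenizer, spaces, dots and dashes act as delimiters inside the token rather than as token boundaries, so every query variant from the original question catenates to the same string as the indexed key. Unlike WordDelimiterFilter with SPLIT_ON_CASE_CHANGE, the sketch does not split on case changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper approximating, in plain Java, what a
 * KeywordTokenizer + WordDelimiterFilter + LowerCaseFilter chain could
 * produce for a dedicated product-key field. NOT Lucene's implementation;
 * real filter output depends on the flags you configure.
 */
public class ProductKeyNormalizer {

    /** Sub-words: split on non-alphanumeric chars and on letter/digit transitions, lowercased. */
    static List<String> parts(String key) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int prevType = 0; // 0 = delimiter/none, 1 = letter, 2 = digit
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            int type = Character.isLetter(c) ? 1 : Character.isDigit(c) ? 2 : 0;
            // A delimiter, or a switch between letters and digits, ends the current part.
            if (type == 0 || (prevType != 0 && type != prevType)) {
                if (cur.length() > 0) {
                    out.add(cur.toString().toLowerCase(Locale.ROOT));
                    cur.setLength(0);
                }
            }
            if (type != 0) {
                cur.append(c);
            }
            prevType = type;
        }
        if (cur.length() > 0) {
            out.add(cur.toString().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Catenated form: all parts glued together, so spacing and punctuation no longer matter. */
    static String catenated(String key) {
        return String.join("", parts(key));
    }

    public static void main(String[] args) {
        String indexed = "CRXUSB2.0-16GB";
        System.out.println(parts(indexed));     // [crxusb, 2, 0, 16, gb]
        System.out.println(catenated(indexed)); // crxusb2016gb
        // Each line below ends with match=true.
        for (String q : new String[] {"CRX USB2.0 16GB", "CRXUSB2.016GB", "CRX USB-2.0 16GB"}) {
            System.out.println(q + " -> " + catenated(q)
                    + " match=" + catenated(q).equals(catenated(indexed)));
        }
    }
}
```

Note that a query such as "CRX USB2.0 16GB" only matches through the catenated form: the indexed key has no boundary between "CRX" and "USB", so its parts contain "crxusb" rather than "crx" and "usb".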

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Christoph Kaser [mailto:lucene_list@iconparc.de]
> Sent: Tuesday, January 03, 2012 2:23 PM
> To: java-user@lucene.apache.org
> Subject: Re: Indexing product keys with and without spaces in them
> 
> Hi Aditya,
> 
> Thank you for your suggestion!
> Unfortunately, this is not possible, as there is no common format for all product
> keys. The products are not ours, nor are they all from the same manufacturer,
> so we have no influence on what the product keys look like.
> 
> Regards,
> Christoph
> 
> On 03.01.2012 14:10, findbestopensource wrote:
> > Hi Christoph
> >
> > In my opinion, you should not normalize or otherwise modify the
> > product keys. They should be unique and used as they are. You should
> > have used only "-" instead of spaces, but since the products are
> > already out in the market, that cannot be helped.
> >
> > In your UI, you could provide multiple text boxes in which the user
> > fills in the respective characters. You could then add a space or "-"
> > between them before passing the key to Lucene.
> >
> > Regards
> > Aditya
> > www.findbestopensource.com - Finds best open source across all platforms.
> >
> >
> > On Tue, Jan 3, 2012 at 2:14 PM, Christoph Kaser <lucene_list@iconparc.de> wrote:
> >
> >> Hello,
> >>
> >> we use Lucene as the search engine in an online shop. The products in
> >> this shop often contain product keys like CRXUSB2.0-16GB.
> >> We would like our customers to be able to find products by entering
> >> their key. The problem is that product keys sometimes contain spaces
> >> or dashes, and customers sometimes don't enter these separators
> >> correctly. On the other hand, some customers enter spaces where there
> >> are none. Is there an analyzer or some other method that would let us
> >> find the product if the user enters things like:
> >> - "CRX USB2.0 16GB"
> >> - "CRXUSB2.016GB"
> >> - "CRX USB-2.0 16GB"
> >> ...
> >>
> >> The problem is that the product keys don't all share a common format
> >> and are contained in the normal text, so we have no easy way to treat
> >> them differently from the rest of the text.
> >>
> >> Any help would be great!
> >>
> >> Best regards,
> >> Christoph
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> 
> 
> --
> Dipl.-Inf. Christoph Kaser
> 
> IconParc GmbH
> Sophienstrasse 1
> 80333 München
> 
> www.iconparc.de
> 
> Tel +49 -89- 15 90 06 - 21
> Fax +49 -89- 15 90 06 - 49
> 
> Managing directors: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer.
> HRB 121830, Amtsgericht München
> 
> 
> 
> 
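Aditya's suggestion of separate input boxes joined with a fixed separator before the key is passed to Lucene could look roughly like the sketch below. `SegmentedKeyInput` and `join` are hypothetical names, and the approach assumes each product has a known segment layout, which Christoph's reply says is not the case for this shop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds one product key from separate UI input boxes. */
public class SegmentedKeyInput {

    /** Trims each box, skips empty ones, and joins the rest with the given separator. */
    static String join(List<String> boxes, String separator) {
        List<String> parts = new ArrayList<>();
        for (String box : boxes) {
            String t = (box == null) ? "" : box.trim();
            if (!t.isEmpty()) {
                parts.add(t);
            }
        }
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("CRXUSB2.0", " 16GB "), "-")); // CRXUSB2.0-16GB
        System.out.println(join(List.of("CRX", "", "USB"), "-"));      // CRX-USB
    }
}
```

The joined string is then searched as-is, so it only finds the product if the indexed key uses the same separator in the same places.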



