lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Indexing product keys with and without spaces in them
Date Tue, 03 Jan 2012 14:09:41 GMT
Hi,

> Has somebody ever tried something like this? Is there a way to do this without
> increasing the index to about 15 times (1+2+3+4+5) its original size?

The index will not grow to 15 times its size: it is an inverted index and
stores each unique token only once. In most cases it will be roughly double
the size. Just try it out; it depends on your data!
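
As a rough illustration of why the growth stays well below 15x, here is a
minimal sketch in plain Java (not Lucene API; the class and method names are
made up for this example). It assumes the "1+2+3+4+5" scheme means indexing
every run of up to five adjacent words with the spaces removed, and it compares
the number of emitted tokens with the number of unique terms. Postings still
grow with every occurrence, so this only hints at the effect; measure on real
data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Lucene.
final class ShingleSizeSketch {

    /** Emits every run of 1..maxWords adjacent words, joined without a separator. */
    static List<String> concatenatedShingles(String text, int maxWords) {
        List<String> tokens = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return tokens;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder joined = new StringBuilder();
            for (int len = 1; len <= maxWords && start + len <= words.length; len++) {
                joined.append(words[start + len - 1]);
                tokens.add(joined.toString());
            }
        }
        return tokens;
    }

    /** Total number of tokens emitted for all documents (occurrences). */
    static int totalTokens(List<String> docs, int maxWords) {
        int total = 0;
        for (String doc : docs) {
            total += concatenatedShingles(doc, maxWords).size();
        }
        return total;
    }

    /** Distinct terms across all documents; each would be stored once as a term. */
    static Set<String> uniqueTerms(List<String> docs, int maxWords) {
        Set<String> terms = new HashSet<>();
        for (String doc : docs) {
            terms.addAll(concatenatedShingles(doc, maxWords));
        }
        return terms;
    }
}
```

For two documents that both contain "CRX USB2.0 16GB", totalTokens reports 12
emitted tokens while uniqueTerms holds only 6 distinct terms.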

Uwe

> Christoph
> 
> 
> Am 03.01.2012 11:06, schrieb Ian Lea:
> > When indexing you could normalise them down to a standard format
> > without spaces or hyphens, but searching is much harder if you really
> > can't identify possible product ids within user queries.  Make
> > triplets without spaces or hyphens?  "CRX USB-2.0 16GB" ==>
> > CRXUSB2.016GB but also "some random words" ==>  somerandomwords.  The
> > latter wouldn't match, the former would if it was a valid id.
> >
> > Some form of synonym analysis/injection at indexing would be better if
> > you could do that: CRXUSB2.016GB ==>  "CRX USB2.0 16GB", to be indexed
> > as well as the base value.
> >
> > If you can't either have a dedicated product id search field or
> > standardise the product ids, this is going to be hard.
> >
> >
> > --
> > Ian,
> >
> >
> > On Tue, Jan 3, 2012 at 8:44 AM, Christoph Kaser <lucene_list@iconparc.de> wrote:
> >> Hello,
> >>
> >> We use Lucene as the search engine in an online shop. The products in
> >> this shop often contain product keys like CRXUSB2.0-16GB.
> >> We would like our customers to be able to find products by entering
> >> their key. The problem is that product keys sometimes contain spaces
> >> or dashes, and customers sometimes don't enter these separators
> >> correctly. On the other hand, some customers enter spaces where
> >> there are none. Is there an analyzer or some other method that allows
> >> us to find the product if the user enters things like:
> >> - "CRX USB2.0 16GB"
> >> - "CRXUSB2.016GB"
> >> - "CRX USB-2.0 16GB"
> >> ...
> >>
> >> The problem is that the product keys don't all have a common format
> >> and are embedded in the normal text, so we don't have an easy way to
> >> treat them differently from the rest of the text.
> >>
> >> Any help would be great!
> >>
> >> Best regards,
> >> Christoph
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >
> >
> 
> 
> --
> Dipl.-Inf. Christoph Kaser
> 
> IconParc GmbH
> Sophienstrasse 1
> 80333 München
> 
> www.iconparc.de
> 
> Tel +49 -89- 15 90 06 - 21
> Fax +49 -89- 15 90 06 - 49
> 
> Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
> 121830, Amtsgericht München
> 
> 
> 
> 
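
Ian's suggestion quoted above (normalise ids by dropping spaces and hyphens,
then build triplets from the query words) could be sketched roughly as follows.
This is plain Java rather than a Lucene analyzer, and the class and method
names are hypothetical. Falling back to a shorter window when the query has
fewer than three words is an assumption added here so that a single-word query
such as "CRXUSB2.016GB" still yields a candidate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Lucene.
final class ProductKeySketch {

    /** Removes whitespace and hyphens; dots are kept, so "2.0" survives. */
    static String normalize(String text) {
        return text.replaceAll("[\\s-]+", "");
    }

    /**
     * Joins each run of three adjacent query words and normalises it.
     * Assumption: queries with fewer than three words use all their words.
     */
    static List<String> queryCandidates(String query) {
        List<String> candidates = new ArrayList<>();
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return candidates;
        }
        String[] words = trimmed.split("\\s+");
        int window = Math.min(3, words.length);
        for (int start = 0; start + window <= words.length; start++) {
            StringBuilder joined = new StringBuilder();
            for (int i = start; i < start + window; i++) {
                joined.append(words[i]);
            }
            candidates.add(normalize(joined.toString()));
        }
        return candidates;
    }

    /** Returns the candidates that are known ids (ids must already be normalised). */
    static List<String> matchingIds(String query, Set<String> normalizedIds) {
        List<String> hits = new ArrayList<>();
        for (String candidate : queryCandidates(query)) {
            if (normalizedIds.contains(candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

With the id set containing normalize("CRXUSB2.0-16GB"), the queries
"CRX USB2.0 16GB", "CRX USB-2.0 16GB" and "CRXUSB2.016GB" all produce the
candidate CRXUSB2.016GB, while "some random words" produces somerandomwords
and matches nothing. A fixed window of three misses keys that a customer
splits into two or four words inside a longer query, so the window sizes
would need tuning against real queries.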

