lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: How to make a case insensitive search using a FuzzyQuery?
Date Fri, 06 Jul 2007 16:03:19 GMT
I flat guarantee that if you try to search on fields that are indexed
mixed case, you'll have no end of grief <G>. Everything from
mis-typed search requests to the same word being cased
differently in different parts of the source to ......

Your idea to index it twice is actually a solution that is often used.
Perhaps it'll relieve your nose to consider your storage and index
options <G>...

For example, index the field as Field.Store.NO, Field.Index.TOKENIZED
and search on that one.

For display, index as Field.Store.YES, Field.Index.NO

Even though you're breaking these into two fields, I don't think
your index size changes much. I know that was the hurdle I
had to get past to get comfortable with this....

BTW, how big do you expect your index to get anyway? It''s
one thing to be concerned about size if your index is many
gigabytes, but it's a needless worry if your index is under, say,
a gigabyte or so.

Best
Erick

On 7/6/07, Eloi Rocha Neto <eloi.rocha@gmail.com> wrote:
>
> Hi Daniel,
>
>    I dont lowercase the field at index time, because I have to show the
> results in the same way as it was found.
>
>    For instance:
>
>      Some fields indexed:
>
>       PP-Trip SubAlcance Seq Negativa
>       PP-Trip SubAlcance Seq Positiva
>       PS-Trip SubAlcance Seq Negativa
>       PS-Trip SubAlcance Seq Positiva
>
>     If I search for "PP-TRIP SUBALCANCE SEQ NEG", I want that the result
> showed are:
>        PP-Trip SubAlcance Seq Negativa
>        PS-Trip SubAlcance Seq Negativa
>
>     Not:
>        pp-trip subalcance seq negativa
>        ps-trip subalcance seq negativa
>
>   A possible solution is store in a document object two fields: the
> original
> and the lowercased. I use the last one to make the query, and the other
> one
> to show the results. It works, but it doesnt smell good!
>
>   Thanks for your help!
>
> Eloi
>
>
> On 7/6/07, Daniel Noll <daniel@nuix.com> wrote:
> >
> > On Friday 06 July 2007 11:39:00 Eloi Rocha Neto wrote:
> > > Hi,
> > >
> > >    Anyone knows how to make a case insensitive search using a
> > FuzzyQuery?
> > >
> > >    I want that the results coming from "PP-Trip SubAlcance Seq
> > Negativa",
> > > "pp-trip subAlcance seq negativa" and "PP-TRIP SUBALCANCE SEQ
> NEGATIVA"
> > be
> > > the same. The field must be indexed by "PP-Trip SubAlcance Seq
> > Negativa".
> > >
> > >    My code:
> > >       Query query = new FuzzyQuery( new Term( field, input ) ,
> > similarity
> > > ); Hits hits = indexSearcher.search(query);
> > >
> > >   I really appreciate any help!
> >
> > Why don't you just have your analyser lowercase the field at indexing
> > time?  I
> > don't see why you would use a FuzzyQuery for something where a normal
> > PhraseQuery should suffice.
> >
> > Daniel
> >
> >
> > --
> > Daniel Noll
> > Nuix Pty Ltd
> > Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
> > Web: http://nuix.com/                               Fax: +61 2 9212 6902
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message