lucene-java-user mailing list archives

From "Eloi Rocha Neto" <eloi.ro...@gmail.com>
Subject Re: How to make a case insensitive search using a FuzzyQuery?
Date Fri, 06 Jul 2007 16:57:35 GMT
Hi Erick, Jiye,

  Thanks for your help!

  My index is very small (less than 2MB), so I am not worried about its size. I will
index it twice!
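
  A minimal sketch of the "index it twice" idea, in plain Java rather than the
Lucene API, so none of the names here (TwoFieldSketch, Doc, searchKey, display,
maxEdits) come from Lucene or from this thread. Each entry keeps a lowercased
copy that is only used for matching and the original string that is only used
for display. The input is lowercased the same way before matching. The fuzzy
match is a plain Levenshtein edit-distance threshold, which is an assumption
standing in for FuzzyQuery's similarity scoring and not a reimplementation of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TwoFieldSketch {
    /** One entry: a lowercased copy to match against, the original to display. */
    static final class Doc {
        final String searchKey; // stands in for the searchable, non-stored field
        final String display;   // stands in for the stored, non-indexed field

        Doc(String original) {
            this.searchKey = original.toLowerCase(Locale.ROOT);
            this.display = original;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(String original) {
        docs.add(new Doc(original));
    }

    /** Returns original-cased values whose lowercased copy is within maxEdits of the input. */
    List<String> search(String input, int maxEdits) {
        String needle = input.toLowerCase(Locale.ROOT); // normalise the query like the index
        List<String> hits = new ArrayList<>();
        for (Doc doc : docs) {
            if (editDistance(needle, doc.searchKey) <= maxEdits) {
                hits.add(doc.display);
            }
        }
        return hits;
    }

    /** Classic two-row Levenshtein distance (insert, delete, substitute each cost 1). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        TwoFieldSketch index = new TwoFieldSketch();
        index.add("PP-Trip SubAlcance Seq Negativa");
        index.add("PP-Trip SubAlcance Seq Positiva");
        index.add("PS-Trip SubAlcance Seq Negativa");
        index.add("PS-Trip SubAlcance Seq Positiva");
        // maxEdits = 6 is an illustrative threshold, not a value from the thread.
        for (String hit : index.search("PP-TRIP SUBALCANCE SEQ NEG", 6)) {
            System.out.println(hit);
        }
    }
}
```

  With the four sample values from the quoted message and a threshold of 6
edits, this prints the two "Negativa" entries in their original casing. In
Lucene terms, a Term handed directly to FuzzyQuery is not passed through an
analyzer, so the input would presumably need the same lowercasing before the
query is built.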

  Thanks again!

[]s

Eloi

On 7/6/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> I flat guarantee that if you try to search on fields that are indexed
> mixed case, you'll have no end of grief <G>. Everything from
> mis-typed search requests to the same word being cased
> differently in different parts of the source to ......
>
> Your idea to index it twice is actually a solution that is often used.
> Perhaps it'll relieve your nose to consider your storage and index
> options <G>...
>
> For example, index the field as Field.Store.NO, Field.Index.TOKENIZED
> and search on that one.
>
> For display, index as Field.Store.YES, Field.Index.NO
>
> Even though you're breaking these into two fields, I don't think
> your index size changes much. I know that was the hurdle I
> had to get past to get comfortable with this....
>
> BTW, how big do you expect your index to get anyway? It's
> one thing to be concerned about size if your index is many
> gigabytes, but it's a needless worry if your index is under, say,
> a gigabyte or so.
>
> Best
> Erick
>
> On 7/6/07, Eloi Rocha Neto <eloi.rocha@gmail.com> wrote:
> >
> > Hi Daniel,
> >
> >    I don't lowercase the field at index time, because I have to show the
> > results exactly as they were indexed.
> >
> >    For instance:
> >
> >      Some fields indexed:
> >
> >       PP-Trip SubAlcance Seq Negativa
> >       PP-Trip SubAlcance Seq Positiva
> >       PS-Trip SubAlcance Seq Negativa
> >       PS-Trip SubAlcance Seq Positiva
> >
> >     If I search for "PP-TRIP SUBALCANCE SEQ NEG", I want the results
> > shown to be:
> >        PP-Trip SubAlcance Seq Negativa
> >        PS-Trip SubAlcance Seq Negativa
> >
> >     Not:
> >        pp-trip subalcance seq negativa
> >        ps-trip subalcance seq negativa
> >
> >   A possible solution is to store two fields in the document object: the
> > original and the lowercased one. I use the latter to run the query and the
> > former to show the results. It works, but it doesn't smell good!
> >
> >   Thanks for your help!
> >
> > Eloi
> >
> >
> > On 7/6/07, Daniel Noll <daniel@nuix.com> wrote:
> > >
> > > On Friday 06 July 2007 11:39:00 Eloi Rocha Neto wrote:
> > > > Hi,
> > > >
> > > >    Does anyone know how to make a case-insensitive search using a
> > > > FuzzyQuery?
> > > >
> > > >    I want the results for "PP-Trip SubAlcance Seq Negativa",
> > > > "pp-trip subAlcance seq negativa" and "PP-TRIP SUBALCANCE SEQ NEGATIVA"
> > > > to be the same. The field must be indexed as "PP-Trip SubAlcance Seq
> > > > Negativa".
> > > >
> > > >    My code:
> > > >       Query query = new FuzzyQuery( new Term( field, input ), similarity );
> > > >       Hits hits = indexSearcher.search(query);
> > > >
> > > >   I really appreciate any help!
> > >
> > > Why don't you just have your analyser lowercase the field at indexing
> > > time?  I don't see why you would use a FuzzyQuery for something where a
> > > normal PhraseQuery should suffice.
> > >
> > > Daniel
> > >
> > >
> > > --
> > > Daniel Noll
> > > Nuix Pty Ltd
> > > Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
> > > Web: http://nuix.com/                               Fax: +61 2 9212 6902
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
