lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: i18n numbers
Date Fri, 27 Mar 2009 14:15:41 GMT
This is really no problem at all... use ICU's RuleBasedBreakIterator (RBBI)
to identify runs of numbers in your query string, and then replace them with
the normalized version. You will need the ICU jar (icu4j) for this.

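import java.util.Locale;
import com.ibm.icu.text.BreakIterator;
import com.ibm.icu.text.NumberFormat;
import com.ibm.icu.text.RuleBasedBreakIterator;

        // note: nf.parse() below throws java.text.ParseException
        String userQuery = "Potter 19,99";
        Locale locale = new Locale("nl");
        RuleBasedBreakIterator bi = (RuleBasedBreakIterator)
                RuleBasedBreakIterator.getWordInstance(locale);
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        bi.setText(userQuery);
        int start = bi.first();
        int end = bi.next();
        StringBuilder normalizedQuery = new StringBuilder();
        while (end != BreakIterator.DONE) {
            String token = userQuery.substring(start, end);
            int status = bi.getRuleStatus();
            // if it's a number, parse it with the user's locale and append
            // the parsed value (number tags span WORD_NUMBER..WORD_NUMBER_LIMIT)
            if (status >= RuleBasedBreakIterator.WORD_NUMBER
                    && status < RuleBasedBreakIterator.WORD_NUMBER_LIMIT) {
                normalizedQuery.append(nf.parse(token));
            } else {
                normalizedQuery.append(token);
            }
            start = end;
            end = bi.next();
        }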

after this code:

System.out.println(userQuery);
Potter 19,99
System.out.println(normalizedQuery);
Potter 19.99
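
If pulling in the ICU jar is not an option, the same idea can be roughly approximated with the JDK alone. The sketch below swaps the break iterator for a plain regex plus java.text.NumberFormat, so it is not the RBBI approach above and its notion of "a number" is much cruder. The class name QueryNumberNormalizer, the method name normalize and the regex are made up for illustration.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, JDK only: a regex stands in for ICU's word-break rules.
final class QueryNumberNormalizer {
    // Assumption: a "number" is a run of digits, optionally with '.' or ','
    // between digit groups. ICU's rules are more thorough than this.
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:[.,]\\d+)*");

    static String normalize(String userQuery, Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        if (nf instanceof DecimalFormat) {
            // Parse to BigDecimal so a value like 19,90 keeps its trailing zero.
            ((DecimalFormat) nf).setParseBigDecimal(true);
        }
        Matcher m = NUMBER.matcher(userQuery);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the non-numeric text before this match unchanged.
            out.append(userQuery, last, m.start());
            String token = m.group();
            ParsePosition pos = new ParsePosition(0);
            Number n = nf.parse(token, pos);
            if (n != null && pos.getIndex() == token.length()) {
                // The whole token parsed in the user's locale: emit plain form.
                out.append(n instanceof BigDecimal
                        ? ((BigDecimal) n).toPlainString()
                        : n.toString());
            } else {
                // Not parseable as a whole: leave the token as typed.
                out.append(token);
            }
            last = m.end();
        }
        out.append(userQuery, last, userQuery.length());
        return out.toString();
    }
}
```

With this, QueryNumberNormalizer.normalize("Potter 19,99", Locale.forLanguageTag("nl")) gives "Potter 19.99". One caveat: '.' is the grouping separator in a Dutch locale, so a Dutch user typing 19.99 would most likely be read as 1999 by a lenient parse, and you may want to handle that case separately.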

On Fri, Mar 27, 2009 at 2:54 AM, Marcel Overdijk
<marceloverdijk@gmail.com> wrote:

>
> That would make sense yes.
>
> But the problem is that I have a general query field. I don't know whether
> the user entered a String or a number, or what he meant... Is 2008 a year
> (number) or part of an address String, e.g. keeping the address?
> Or maybe he's combining stuff like "Potter 19,99".
>
>
>
>
> Robert Muir wrote:
> >
> > marcel,
> >
> > I'd suggest parsing/displaying numbers in a locale-sensitive way with
> > NumberFormat (be sure to supply the correct locale)... and keeping them in
> > the index one consistent way (i.e. 19.99)
> >
> >
> >
> > On Thu, Mar 26, 2009 at 6:03 PM, Marcel Overdijk
> > <marceloverdijk@gmail.com> wrote:
> >
> >>
> >> Thanks for your reply.
> >>
> >> It's indeed a webapp with a html front-end.
> >> I agree letting the end-user enter a Lucene query might not be what you want.
> >>
> >> Probably I will be using an "all" index which indexes all fields of my
> >> entity. So in the book example including book title, isbn, price,
> >> author.firstname, author.lastname.
> >>
> >> The end-user will have a Quick Search option in which he/she can enter a
> >> query string.
> >> E.g. "Potter" when searching for Harry Potter books or "19,99" / "19.99"
> >> for books with a price of 19.99.
> >> So I actually don't know for what field the user is searching.
> >>
> >> This is also my use case to introduce Lucene/Hibernate Search.
> >> I don't want multiple like's in a SQL query.
> >>
> >>
> >> Cheers,
> >> Marcel
> >>
> >>
> >> Erick Erickson wrote:
> >> >
> >> > What does the front end look like? Is it a web page or a custom app?
> >> > And do you expect your users to actually enter the field name? I'd be
> >> > reluctant to allow any but the geekiest of users to enter the Lucene
> >> > syntax (i.e. the field names). Users shouldn't know anything about the
> >> > underlying structure. Not to mention the headaches if you ever want to
> >> > change it.
> >> >
> >> > So, let's assume an HTML page. *You* know what the underlying field
> >> > is no matter what the label on the entry field, so you should be able
> >> > to construct the query with the proper field names.
> >> >
> >> > Or I don't understand your problem at all, which is not unusual <G>..
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Thu, Mar 26, 2009 at 5:32 PM, Marcel Overdijk
> >> > <marceloverdijk@gmail.com> wrote:
> >> >
> >> >>
> >> >> First of all I'm new to Lucene. I'm experimenting right now with it in
> >> >> combination with Hibernate Search.
> >> >>
> >> >> What I'm wondering is if I can index numbers related to i18n.
> >> >>
> >> >> E.g. I have a Book entity with a price attribute.
> >> >> A book with a price of 19.99 can be found while searching for
> >> >> price:19.99.
> >> >>
> >> >> The thing is Dutch users will search for 19,99 (different decimal
> >> >> symbol). How can this be handled?
> >> >>
> >> >> Furthermore, Dutch users will search for something like prijs:19,99.
> >> >> Can this be done with aliases or something? The problem is maybe one
> >> >> day I want to support the German language as well.
> >> >> The front-end app can be translated by simply adding i18n resource
> >> >> bundles.
> >> >> Is something like this also possible for searching within Lucene?
> >> >>
> >> >>
> >> >> Cheers,
> >> >> Marcel
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/i18n-numbers-tp22731528p22731528.html
> >> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/i18n-numbers-tp22731528p22732038.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/i18n-numbers-tp22731528p22736807.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>


-- 
Robert Muir
rcmuir@gmail.com
