lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Using Lucene for user query parsing
Date Mon, 09 Mar 2009 12:43:20 GMT
Sure, Lucene is suited. If....

The central problem here isn't the search engine, IMO, it's
figuring out what bits of the query are relevant to what
parts of the data. That is, in some random string, what is
the street, business name, address, etc.

Lucene has nothing built in that I know of that'll help with
that part. Once you *have* figured out what parts of the
query relate to what fields in your index, the rest is easy.
But you'll have to do the figuring out yourself.
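
As a rough illustration of that "figuring out" step, here is a minimal sketch in plain Java (no Lucene calls) that flags the obvious marker words in a query. The class name FieldHintSketch, the MARKERS table and the field names are hypothetical, and "precinct" as an area marker is an assumption; the real list would have to come from your own data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class FieldHintSketch {
    // Hypothetical marker words mapped to the field they hint at.
    static final Map<String, String> MARKERS = Map.of(
            "road", "street",
            "street", "street",
            "lane", "street",
            "layout", "area",
            "precinct", "area");

    /** Returns one "position:field" hint for each token that is a marker word. */
    static List<String> hints(String query) {
        List<String> out = new ArrayList<>();
        String[] tokens = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String field = MARKERS.get(tokens[i]);
            if (field != null) {
                out.add(i + ":" + field);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(hints("mc donalds woodward street kingston precinct"));
    }
}
```

Note what this does not tell you: how many of the words to the left of a marker belong to the name. That is the hard part, and nothing in the query itself settles it.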

But you might try the bagowords I suggested before as
a shortcut and see what kind of results you get. Sometimes
simplistic solutions are "good enough", but that's always
up to you to decide once you start seeing results.
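
To make the bagowords idea concrete, here is a small sketch in plain Java. It is not Lucene code and does not use Lucene's scoring; it only shows the shape of the approach, where each record carries a combined field and the query terms are matched against it regardless of order. The class BagOfWordsSketch, its methods and the sample records are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class BagOfWordsSketch {
    /** One record: the three structured fields plus the combined bag of words. */
    static final class Entry {
        final String business;
        final String street;
        final String area;
        final Set<String> bag;

        Entry(String business, String street, String area) {
            this.business = business;
            this.street = street;
            this.area = area;
            // The bag is simply a copy of the other three fields, tokenized.
            this.bag = tokens(business + " " + street + " " + area);
        }
    }

    /** Lowercases and splits on whitespace; a stand-in for a real analyzer. */
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(
                text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    /** Counts the distinct query terms that occur in the entry's bag. */
    static int score(Entry entry, String query) {
        int hits = 0;
        for (String term : tokens(query)) {
            if (entry.bag.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns the highest-scoring entry, or null if no term matches anything. */
    static Entry best(List<Entry> entries, String query) {
        Entry bestEntry = null;
        int bestScore = 0;
        for (Entry entry : entries) {
            int s = score(entry, query);
            if (s > bestScore) {
                bestScore = s;
                bestEntry = entry;
            }
        }
        return bestEntry;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("mc donalds", "woodward street", "kingston precinct"),
                new Entry("marks and spencers", "high street", "kingston precinct"));
        Entry hit = best(entries, "kingston precinct mc donalds woodward street");
        System.out.println(hit.business + " / " + hit.street + " / " + hit.area);
    }
}
```

Because the terms are compared as a set, word order in the query does not matter. Spelling mistakes and synonyms such as road for street are not handled here and would need their own treatment.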

Best
Erick

On Mon, Mar 9, 2009 at 4:31 AM, Srinivas Bharghav
<srini.bharghav@gmail.com> wrote:

> Thanks for all the inputs guys.
>
> As Erick said let me elaborate the problem a bit.
>
> We are trying to develop a local search application. The user will be able
> to locate businesses, localities and roads. We have data for all 3. We do
> not want to provide separate boxes for each kind of input, i.e. there will
> be one common entry box (a la google :)) where the user enters an address
> (or road name or area name), or all 3. From the user query we have to find
> the best possible match in our data. The data has lots of numbers as well
> as names with initials and the like. The user may enter the names with a
> space between the initials, or they might run the initials together. From
> the user query we have no way to figure out what is what, apart from the
> obvious cases: if something ends with "road" then it is a road name, or if
> there is a "layout" in the query then it is an area. Right now we have our
> own custom framework. I am trying to figure out whether Lucene is suited
> for this kind of application.
>
> Once again thanks for all the inputs.
>
> On Fri, Mar 6, 2009 at 7:15 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > Whatever you do will be wrong <G>. What you're saying is
> > that you have structured data that the user wants to search
> > in an unstructured way, and you want to try to create a
> > system that intuits what the user meant. Good luck <G>.
> >
> > Can you back up a bit and talk about the problem you're
> > trying to solve? If, for instance, you're trying to find the
> > best match for a particular business, one approach would
> > be to create one index where each business had
> >
> > street
> > business
> > area
> > bagowords
> >
> > where the field bagowords contained a copy of the data
> > from the other three fields, then search bagowords
> > for your query terms. It sounds simplistic, but it might be
> > surprisingly good.
> >
> > And if this is out in left field, a higher level statement
> > of the problem would help get better answers.
> >
> > Best
> > Erick
> >
> > On Fri, Mar 6, 2009 at 1:25 AM, Srinivas Bharghav
> > <srini.bharghav@gmail.com> wrote:
> >
> > > I am trying to evaluate whether Lucene is the right candidate for the
> > > problem at hand.
> > >
> > > Say I have 3 indexes:
> > >
> > > Index 1 has street names.
> > > Index 2 has business names.
> > > Index 3 has area names.
> > >
> > > All these names can be single words or a combination of words, like
> > > woodward street or marks and spencers street.
> > >
> > > Now the user enters a query saying "mc donalds woodward street kingston
> > > precinct".
> > >
> > > I have to parse this query and come up with the best match possible. The
> > > problem is that, in the query, I do not know which part is the business
> > > name, area name or street name. Also, the user may give the query in any
> > > order; for example, he may give it as "kingston precinct mc donalds
> > > woodward street". There might be spelling mistakes in the query entered
> > > by the user. Also, he might use road for street or lane for street and
> > > such things. I know that Lucene is the right candidate for the synonym
> > > and spelling mistakes part, but I am a bit hazy regarding the user query
> > > parsing part, i.e. which index to search for what. Any help is greatly
> > > appreciated.
> > >
> > > Thanks,
> > > Srini.
> > >
> >
>
