lucene-java-user mailing list archives

From Vasudevan Comandur <vcoman...@gmail.com>
Subject Re: Using Lucene for user query parsing
Date Fri, 06 Mar 2009 09:29:50 GMT
You could have a single index with all the names tagged at indexing time.
For the query parsing, you could keep a lookup of the common words that end
business names (like Corp, Inc, LLC, Ltd, etc.) and of common address words
(road, avenue, street, lane, etc.), and split the query terms at the
appropriate places.
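The lookup idea above can be sketched in plain Java. This is a minimal illustration, not Lucene code: the class name `IndicatorSplitter` and both word lists are hypothetical. It simply cuts the query after any word that usually ends a business name or a street name, so it cannot separate two adjacent names when neither one ends in such a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class IndicatorSplitter {
    // Hypothetical word lists; a real system would load these from data.
    static final Set<String> BUSINESS_SUFFIXES = Set.of("corp", "inc", "llc", "ltd");
    static final Set<String> ADDRESS_WORDS = Set.of("road", "avenue", "street", "lane");

    /** Splits the query after every word that typically ends a business or street name. */
    static List<String> split(String query) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : query.toLowerCase().trim().split("\\s+")) {
            if (current.length() > 0) current.append(' ');
            current.append(token);
            // Ignore trailing punctuation such as "Inc." when checking the lists.
            String bare = token.replaceAll("[.,]", "");
            if (BUSINESS_SUFFIXES.contains(bare) || ADDRESS_WORDS.contains(bare)) {
                segments.add(current.toString());
                current.setLength(0);
            }
        }
        // Whatever is left over (e.g. an area name) becomes the last segment.
        if (current.length() > 0) segments.add(current.toString());
        return segments;
    }

    public static void main(String[] args) {
        // prints [woodward street, acme corp, kingston precinct]
        System.out.println(split("woodward street acme corp kingston precinct"));
    }
}
```

Each resulting segment could then be searched against the index (or tagged field) its indicator word suggests.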

Another suggestion is to use OpenNLP components such as the POS tagger and
the NP chunker, which should give better results during query parsing.

Regards
 Vasu

On Fri, Mar 6, 2009 at 11:55 AM, Srinivas Bharghav <srini.bharghav@gmail.com> wrote:

> I am trying to evaluate whether Lucene is the right candidate for the
> problem at hand.
>
> Say I have 3 indexes:
>
> Index 1 has street names.
> Index 2 has business names.
> Index 3 has area names.
>
> All these names can be single words or combinations of words, like "woodward
> street" or "marks and spencers street".
>
> Now the user enters a query such as "mc donalds woodward street kingston
> precinct".
>
> I have to parse this query and come up with the best possible match. The
> problem is that I do not know which part of the query is the business name,
> the area name or the street name. The user may also give the query in any
> order, for example "kingston precinct mc donalds woodward street".
> There might be spelling mistakes in the query entered by the user, and he
> might use "road" or "lane" instead of "street", and so on. I know that
> Lucene is the right candidate for the synonym and spelling-mistake part, but
> I am a bit hazy about the query parsing part: how to decide which part of
> the query to search in which index. Any help is greatly appreciated.
>
> Thanks,
> Srini.
>
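The question above (unknown segment boundaries, any word order, typos and street synonyms) can be framed as a segmentation problem: find the lowest-cost way to cut the query into pieces that each match a known name. The plain-Java sketch below is only an illustration under stated assumptions: in-memory name lists stand in for the three indexes, and the class `QuerySegmenter`, the `SYNONYMS` table, `MAX_EDITS` and `SKIP_COST` are all hypothetical. In a Lucene setup each candidate segment would instead be sent as a fuzzy query (Lucene's `FuzzyQuery` matches by edit distance) to each index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class QuerySegmenter {
    // Hypothetical synonym table: variants a user might type for "street".
    static final Map<String, String> SYNONYMS = Map.of("road", "street", "lane", "street");
    static final int MAX_EDITS = 2;   // typos tolerated per segment (assumption)
    static final int SKIP_COST = 10;  // cost of leaving one token unmatched (assumption)

    record Match(String category, String name, String text) {}

    /** Lower-cases, collapses whitespace and maps synonyms to a canonical word. */
    static String normalize(String s) {
        StringBuilder out = new StringBuilder();
        for (String t : s.toLowerCase().trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(SYNONYMS.getOrDefault(t, t));
        }
        return out.toString();
    }

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1], cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** Cuts the query into the lowest-cost sequence of (fuzzy) dictionary matches. */
    static List<Match> segment(String query, Map<String, List<String>> dictionaries) {
        String[] tokens = normalize(query).split(" ");
        int n = tokens.length;
        int[] best = new int[n + 1];   // best[i] = lowest cost for tokens[0..i)
        int[] from = new int[n + 1];   // start of the last piece in that solution
        Match[] matchAt = new Match[n + 1];
        for (int end = 1; end <= n; end++) {
            // Option 1: leave tokens[end-1] unmatched.
            best[end] = best[end - 1] + SKIP_COST;
            from[end] = end - 1;
            // Option 2: tokens[start..end) form one known name, within MAX_EDITS typos.
            for (int start = 0; start < end; start++) {
                String text = String.join(" ", Arrays.copyOfRange(tokens, start, end));
                for (var entry : dictionaries.entrySet()) {
                    for (String name : entry.getValue()) {
                        int d = levenshtein(text, normalize(name));
                        if (d <= MAX_EDITS && best[start] + d < best[end]) {
                            best[end] = best[start] + d;
                            from[end] = start;
                            matchAt[end] = new Match(entry.getKey(), name, text);
                        }
                    }
                }
            }
        }
        // Walk back through the chosen cuts, keeping only matched pieces.
        List<Match> result = new ArrayList<>();
        for (int end = n; end > 0; end = from[end]) {
            if (matchAt[end] != null) result.add(0, matchAt[end]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dicts = Map.of(
                "business", List.of("mc donalds", "marks and spencers"),
                "street", List.of("woodward street"),
                "area", List.of("kingston precinct"));
        for (Match m : segment("kingston precinct mc donlads woodward road", dicts)) {
            System.out.println(m.category() + ": " + m.name());
        }
        // prints:
        // area: kingston precinct
        // business: mc donalds
        // street: woodward street
    }
}
```

Scanning every name for every candidate segment is quadratic in the query length times the dictionary size, which is fine for short queries but is exactly the work an index lookup would replace. Requires Java 16+ for the record.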
