lucene-solr-user mailing list archives

From Michael Griffiths <>
Subject RE: Require some advice
Date Thu, 12 Aug 2010 19:27:49 GMT
Solr is a search engine, not an entity extraction tool. 

While there are some decent open source entity extraction tools, they are focused on processing
sentences and paragraphs. The structural differences in text messages mean you'd need to
do a fair amount of work to get decent entity extraction.

That said, you may want to look into simple word/phrase matching if your domain is sufficiently
small. Use RegEx to extract ZIP, use dictionaries to extract city/area, skills, and names.
Much simpler and cheaper. 
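
As a rough illustration of the regex-plus-dictionary approach, here is a minimal Python sketch run
against the sample message from the original post. The dictionaries, the six-digit ZIP pattern, and
the names first_match and extract_entities are all assumptions made up for this example; real
dictionaries would have to come from your own domain data.

```python
import re

# Hypothetical lookup tables for illustration; real ones would come from domain data.
FIRST_NAMES = {"john", "pavan"}
LAST_NAMES = {"mayer", "gupta"}
CITIES = {"mumbai", "pune", "delhi"}
SKILLS = {"car driver", "body guard", "cook"}

# Assumes six-digit postal codes, as in the sample message.
ZIP_RE = re.compile(r"\b\d{6}\b")


def first_match(tokens, dictionary):
    """Return the first token found in the dictionary, title-cased, or None."""
    for token in tokens:
        if token in dictionary:
            return token.title()
    return None


def extract_entities(message):
    """Extract name, city, ZIP, and skills using a regex and dictionary lookups."""
    text = message.lower()
    tokens = re.findall(r"[a-z]+", text)
    zip_match = ZIP_RE.search(text)
    # Plain substring matching for multi-word skills, kept in order of appearance.
    # This is naive: a short entry could also match inside a longer word.
    skills = sorted((s for s in SKILLS if s in text), key=text.index)
    return {
        "first_name": first_match(tokens, FIRST_NAMES),
        "last_name": first_match(tokens, LAST_NAMES),
        "city": first_match(tokens, CITIES),
        "zip": zip_match.group(0) if zip_match else None,
        "skills": skills,
    }


if __name__ == "__main__":
    sms = "John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard"
    print(extract_entities(sms))
```

Anything not covered by a dictionary (here the area "Juhu") is simply ignored, so coverage depends
entirely on how complete the dictionaries are.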

-----Original Message-----
From: Pavan Gupta [] 
Sent: Thursday, August 12, 2010 2:58 PM
Subject: Require some advice

I am new to text search and mining and have been researching the different available products.
My application requires reading an (unstructured) SMS message and finding entities such as
person name, area, zip, city and skills associated with the person. The SMS would be in the form
of free text. The parsed data would be stored in a database and used by Solr to display results.
An SMS message could be in the following form:
"John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard"
We need to interpret in the following manner:
first name -> John
last name -> Mayer
city-> Mumbai
zip -> 411004
skills -> car driver, body guard

1. Is Solr capable enough to handle this application considering that SMS message would be
2. How does Solr/Lucene compare to other tools such as UIMA, GATE, CER (Stanford University),
3. Is Solr only for text search, or can it be used for information extraction?
4. Is it recommended to use Solr with other products such as UIMA and GATE?

There are companies that specialize in extracting meaning from unstructured SMS messages.
Do we have something similar in the open source world? Can we extend Solr for the same purpose?

Your reply would be appreciated.
Thanking you.
