lucene-java-user mailing list archives

From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Extracting contact data
Date Wed, 13 Jan 2010 17:04:03 GMT
Lucene will probably only be helpful if you know what you are looking  
for, e.g. that you search for a given person, a given street and given  
time intervals.

Is this what you want to do?
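
To make that known-item case concrete, here is a minimal sketch of the idea behind it. It is a toy inverted index in plain Python, not Lucene's API; the names `build_index` and `search_all` and the sample documents are made up for illustration. It only shows that you get back the documents containing the terms you already know to ask for.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            # Strip surrounding punctuation so "Street," matches "street".
            index[token.strip('.,:;()"')].add(doc_id)
    return index

def search_all(index, terms):
    """Return ids of documents that contain every query term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

# Hypothetical sample data.
docs = {
    "a.txt": "Contacts: John Smith, 12 Main Street, open 8am - 12pm.",
    "b.txt": "Meeting notes. Nothing about Main Street here, ask Jane.",
}
index = build_index(docs)
hits = search_all(index, ["smith", "main", "street"])
```

Note that the query has to name the person and the street up front; the index tells you where they occur, not which persons or streets exist.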

If you are instead looking for a way to extract every person,
street and time interval that a document is associated with, you
probably want to look for a natural language processing project that
can do something like semantic tagging (for example, named-entity
recognition) for you.
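
For the simplest cases you might get some way without NLP. The sketch below is only an illustration under stated assumptions: the regex `TIME_INTERVAL` covers just intervals written like "8am - 12pm", the function `rank_by_proximity` is a made-up name, and "closeness" is measured as plain character distance to the word "Contacts", which is an arbitrary choice. Anything a regex cannot describe (most street addresses, person names) would still need a real tagger.

```python
import re

# Assumption: matches only simple intervals such as "8am - 12pm" or "9:30am-5pm".
TIME_INTERVAL = re.compile(
    r"\b\d{1,2}(?::\d{2})?\s*[ap]m\s*-\s*\d{1,2}(?::\d{2})?\s*[ap]m\b",
    re.IGNORECASE,
)

def rank_by_proximity(text, pattern, anchor="contacts"):
    """Find pattern matches and sort them by distance to the nearest anchor word."""
    anchors = [m.start() for m in re.finditer(re.escape(anchor), text, re.IGNORECASE)]
    results = []
    for m in pattern.finditer(text):
        # If the anchor word never occurs, every match gets the same large distance.
        distance = min((abs(m.start() - a) for a in anchors), default=len(text))
        results.append((distance, m.group(0)))
    return [value for _, value in sorted(results)]

# Hypothetical sample text.
text = (
    "Opening hours last year were 9am - 5pm. Lots of other text follows here. "
    "Contacts: front desk 8am - 12pm, 12 Main Street."
)
ranked = rank_by_proximity(text, TIME_INTERVAL)
```

Here the interval next to "Contacts" is ranked first and the one from the unrelated sentence second.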


       karl

On 13 Jan 2010, at 17:39, Ortelli, Gian Luca wrote:

> Hi community,
>
> I have a general understanding of Lucene concepts, and I'm wondering if
> it's the right tool for my job:
>
> - I need to extract data such as time intervals ("8am - 12pm") and street
> addresses from a set of files. The common issue with these data units is
> that they contain spaces and are not always definable through regexes.
>
> - The extraction must take "proximity" into consideration: for
> example, a mail address which is close to the word "Contacts" will
> receive a higher rank, since I'm looking for contact data.
>
> Do you think I can get any advantage from building a solution on Lucene?
>
>  Gianluca
>

