lucene-general mailing list archives

From "James Ryley" <ja...@ryley.com>
Subject RE: advice on integrating NLP engine during indexing
Date Thu, 20 Dec 2007 16:08:52 GMT
Hi,

I can't answer your question -- sorry!  But, I was curious about the NLP you
describe.  Are there algorithms available for determining negation
automatically, and are they accurate?

Sincerely,
James

> -----Original Message-----
> From: 1world1love [mailto:jd_cowan@yahoo.com]
> Sent: Thursday, December 20, 2007 9:48 AM
> To: general@lucene.apache.org
> Subject: advice on integrating NLP engine during indexing
> 
> 
> Greetings all. I am new to Lucene and am looking for a little
> advice/direction/feedback on what I am trying to do. I want to index and
> query millions of documents that are unstructured and resemble
> crime/police/psychiatric reports; no problem, Lucene is perfect for this.
> 
> The trick is that I need to exclude certain terms from the index such as
> those terms that are negated or information that could potentially identify
> people. I have a collection of natural language processing tools that are
> able to tag or remove/replace such terms.
> 
> I need to design the indexing such that I can feed each document through
> these tools and then incorporate the results into the indexing strategy.
> 
> As an example, if I have a report that has the phrase: "Mr. Smith has no
> history of violence against women prior to this event"
> 
> The NLP engine would recognize the name Smith and the negation of the term
> "violence" and would tag them as such. I would then like to exclude those
> terms from the indexing as seems prudent.
> 
> Another strategy I would like to look at is to include the tags in the index
> to incorporate them into the search engine. That is to say, whether a subject
> "likely" has a history of violence, "may" have a history of violence, or
> "does not" have a history of violence.
> 
> I assume that I will need to design a custom analyzer to do this, but I was
> hoping to solicit any comments, advice, or general suggestions before I get
> started.
> 
> Thanks in advance,
> 
> j
> 
> 
> --
> View this message in context:
> http://www.nabble.com/advice-on-integrating-NLP-engine-during-indexing-tp14437913p14437913.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
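
The two strategies in the quoted message (dropping tagged terms, or keeping the tag as part of the indexed term) can be sketched roughly as follows. This is only an illustration in plain Python, not Lucene code: the `TaggedToken` type, the tag names "NAME" and "NEGATED", and both function names are assumptions made up for the example, and the sample tokens are tagged by hand to stand in for whatever the NLP engine actually emits. In Lucene itself this step would presumably sit in a custom token filter inside the analyzer.

```python
from typing import List, NamedTuple, Optional, Set


class TaggedToken(NamedTuple):
    """One token plus the tag the NLP engine assigned (hypothetical format)."""
    text: str
    tag: Optional[str] = None  # assumed tag names: "NAME", "NEGATED"


def tokens_to_index(tokens: List[TaggedToken], excluded_tags: Set[str]) -> List[str]:
    """Strategy 1: drop every token whose tag is in excluded_tags."""
    return [t.text.lower() for t in tokens if t.tag not in excluded_tags]


def tokens_with_tags(tokens: List[TaggedToken], dropped_tags: Set[str]) -> List[str]:
    """Strategy 2: drop some tags outright, fold the others into the term."""
    terms = []
    for t in tokens:
        if t.tag in dropped_tags:
            continue  # e.g. identifying names never reach the index
        if t.tag is None:
            terms.append(t.text.lower())
        else:
            # "violence" tagged NEGATED becomes the term "violence_negated"
            terms.append(f"{t.text.lower()}_{t.tag.lower()}")
    return terms


# Hand-tagged stand-in for the example sentence in the message above.
SAMPLE = [
    TaggedToken("Mr."), TaggedToken("Smith", "NAME"), TaggedToken("has"),
    TaggedToken("no"), TaggedToken("history"), TaggedToken("of"),
    TaggedToken("violence", "NEGATED"), TaggedToken("against"),
    TaggedToken("women"), TaggedToken("prior"), TaggedToken("to"),
    TaggedToken("this"), TaggedToken("event"),
]

if __name__ == "__main__":
    print(tokens_to_index(SAMPLE, {"NAME", "NEGATED"}))
    print(tokens_with_tags(SAMPLE, {"NAME"}))
```

With the second approach a query could, under these assumptions, search for the plain term and the tagged term separately, so "has a history of violence" and "has no history of violence" would not match the same documents.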

