lucene-java-user mailing list archives

From "Chong, Herb" <HCho...@bloomberg.com>
Subject RE: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Mon, 17 Nov 2003 14:00:39 GMT
Respecting sentence boundaries and using them to affect a document's score in the ranking algorithm
requires linguistic knowledge, not NLP knowledge. Think about it.

Herb....
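
To make the sentence-boundary point concrete, here is a minimal sketch, not Lucene code. It counts how many sentences contain all of the query terms, which is one possible signal a ranking function could use. The function names, the regex-based sentence splitter, and the sample text are all assumptions made for illustration; a real splitter would need linguistic handling of abbreviations and similar cases.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def same_sentence_hits(text, query_terms):
    """Count the sentences that contain every query term."""
    wanted = {t.lower() for t in query_terms}
    hits = 0
    for sentence in split_sentences(text):
        words = set(re.findall(r"\w+", sentence.lower()))
        if wanted <= words:  # all query terms occur in this sentence
            hits += 1
    return hits

doc = "Lucene ranks documents. Ranking uses term weights. Lucene uses term vectors."
hits = same_sentence_hits(doc, ["lucene", "term"])
```

A count like `hits` could then be folded into a document's score as a boost, so that terms co-occurring inside one sentence count for more than terms that only share a document.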

-----Original Message-----
From: Stefan Groschupf [mailto:sg@media-style.com]
Sent: Friday, November 14, 2003 9:13 PM
To: Lucene Users List
Subject: Re: inter-term correlation [was Re: Vector Space Model in Lucene?]


What you can do is use a POS tagger (e.g. a maximum-entropy-based tagger, 
or a Brill tagger if you only have English) and use a data mining 
algorithm to weight your terms.
Maybe you can use a hidden Markov model for that.

You can build this on top of lucene, shouldn't be that difficult.
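
A rough sketch of the term-weighting idea, independent of Lucene: it assumes the tokens have already been tagged by some external POS tagger and simply sums a per-tag weight for each term. The tag names, the weight values, and the function name are hypothetical and untuned; in practice the weights would come from whatever data mining step is used.

```python
from collections import defaultdict

# Hypothetical per-tag weights (assumption): illustrative values, not tuned.
POS_WEIGHTS = {"NOUN": 2.0, "VERB": 1.5, "ADJ": 1.0}
DEFAULT_WEIGHT = 0.5  # used for any tag not listed above

def weight_terms(tagged_tokens, pos_weights=POS_WEIGHTS, default=DEFAULT_WEIGHT):
    """Sum a POS-dependent weight over every occurrence of each term."""
    weights = defaultdict(float)
    for term, tag in tagged_tokens:
        weights[term.lower()] += pos_weights.get(tag, default)
    return dict(weights)

# Example input in the (term, tag) form a POS tagger might produce.
tagged = [("Lucene", "NOUN"), ("ranks", "VERB"), ("the", "DET"),
          ("documents", "NOUN"), ("the", "DET")]
term_weights = weight_terms(tagged)
```

The resulting weights could be applied on the Lucene side as per-term boosts, but how that is wired up depends on the Lucene version and is left out here.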

But maybe I misunderstood you.


