lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Quaroni <dquar...@OPENRATINGS.com>
Subject RE: Contributing to Lucene (was RE: inter-term correlation [was R e: Vector Space Model in Lucene?])
Date Mon, 17 Nov 2003 15:27:06 GMT
I'm not sure I can share a sample, but the specific situation I'm thinking
of is when you have data that doesn't exist within a sentence, for example
the name, address, etc of a company.  Some foreign companies have funky
punctuation within their names and addresses.  

I'd have to see the results to know if the NLP would mess anything up.  If
all it did was weight the results, then perhaps it wouldn't, but it's also
possible that it would.  Basically my concern is that it would mess up the
use of lucene for non-sentence based applications that might contain
punctuation.

On the whole I think adding the NLP to lucene is a good idea because the
vast majority of the applications of lucene would benefit from it.  Making
it optional could be a good way to maintain the current power of lucene and
perhaps also retain the speed depending on the performance of the NLP
functionality and the needs of the user.


-----Original Message-----
From: Chong, Herb [mailto:HChong3@bloomberg.com]
Sent: Monday, November 17, 2003 10:01 AM
To: Lucene Users List
Subject: RE: Contributing to Lucene (was RE: inter-term correlation [was
R e: Vector Space Model in Lucene?])


show an example document.

Herb....

-----Original Message-----
From: Dan Quaroni [mailto:dquaroni@OPENRATINGS.com]
Sent: Monday, November 17, 2003 9:48 AM
To: 'Lucene Users List'
Subject: RE: Contributing to Lucene (was RE: inter-term correlation [was
R e: Vector Space Model in Lucene?])


My only concern with this being integrated into lucene is that it be done in
a way that doesn't make its use mandatory.  Lucene is powerful enough that
it can be used for a lot of cases where NLP doesn't make any sense.  For
example, I think that sentence boundaries would severely screw up the
project I recently did using lucene because there are no sentences, but
there is punctuation.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message