lucene-java-user mailing list archives

From "Chong, Herb" <HCho...@bloomberg.com>
Subject RE: New Lucene-powered Website
Date Mon, 01 Dec 2003 13:42:32 GMT
Can you share a description of the heuristics you used to clean up the text? I am facing the
same problem right now with email. I'm less interested in the rules themselves than in
the tools you use to implement them.

Herb....

-----Original Message-----
From: Ulrich Mayring [mailto:ulim@denic.de]
Sent: Friday, November 28, 2003 4:21 AM
To: lucene-user@jakarta.apache.org
Subject: Re: New Lucene-powered Website

This "clean-up work" is actually trickier than the summarising itself, 
and it is usually very domain-specific. That's why I haven't offered 
to contribute the summariser to Lucene: the clean-up code is not 
generic. The summariser itself is just one 300-line class, but without 
prior clean-up its summaries are not good enough.
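Ulrich doesn't describe his own tooling, so the following is only a minimal sketch of one common way to organise such clean-up rules, not his code. It assumes plain Java regular expressions and models each rule as a `UnaryOperator<String>` applied in order, so domain-specific rules can be added, removed or reordered independently. The class `EmailCleaner` and its method names are hypothetical; the example patterns target artefacts visible in this very thread (the Outlook-style `-----Original Message-----` separator, `>` quoting, the dashed mailing-list footer) plus the conventional `-- ` signature delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical rule-based clean-up pass for e-mail bodies, run before summarising.
 * Each rule is a plain String -> String function applied in insertion order.
 */
class EmailCleaner {
    private final List<UnaryOperator<String>> rules = new ArrayList<>();

    /** Adds an arbitrary transformation rule. */
    EmailCleaner addRule(UnaryOperator<String> rule) {
        rules.add(rule);
        return this;
    }

    /** Adds a rule that deletes every match of the given regex. */
    EmailCleaner addDeleteRule(String regex) {
        Pattern p = Pattern.compile(regex);
        return addRule(text -> p.matcher(text).replaceAll(""));
    }

    /** Adds a rule that truncates the text at the first match of the given regex. */
    EmailCleaner addCutRule(String regex) {
        Pattern p = Pattern.compile(regex);
        return addRule(text -> {
            Matcher m = p.matcher(text);
            return m.find() ? text.substring(0, m.start()) : text;
        });
    }

    /** Runs all rules in order and trims surrounding whitespace. */
    String clean(String text) {
        String result = text;
        for (UnaryOperator<String> rule : rules) {
            result = rule.apply(result);
        }
        return result.trim();
    }

    /** Example rule set; the patterns are illustrative assumptions, not a standard. */
    static EmailCleaner defaultForMailingLists() {
        return new EmailCleaner()
            // Outlook-style quoted reply: drop the separator and everything after it.
            .addCutRule("(?m)^-{3,}\\s*Original Message\\s*-{3,}\\s*$")
            // Conventional "-- " signature delimiter: drop the signature.
            .addCutRule("(?m)^-- $")
            // Mailing-list footer introduced by a long dashed rule.
            .addCutRule("(?m)^-{20,}\\s*$")
            // Lines quoted with ">" from earlier messages, including their line break.
            .addDeleteRule("(?m)^>.*(\\r?\\n|$)")
            // Collapse runs of blank lines left behind by the deletions.
            .addRule(t -> t.replaceAll("(\\r?\\n){3,}", "\n\n"));
    }
}
```

Keeping every heuristic as a separate, ordered rule reflects the point above that clean-up is domain-specific: a news-article or web-page corpus would swap in its own rule set while the summariser stays unchanged.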
