lucene-java-user mailing list archives

From "Chong, Herb" <>
Subject RE: New Lucene-powered Website
Date Mon, 01 Dec 2003 13:42:32 GMT
Can you share a description of the heuristics you used to clean up the text? I am facing the
same problem right now with email. I'm less interested in the rules themselves than in
the tools you use to implement them.
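As one illustration of a lightweight tooling approach, e-mail clean-up rules can be expressed as plain `java.util.regex` patterns applied line by line before the text reaches a summariser or indexer. This is a hypothetical sketch: the class name `EmailCleaner` and every rule in it (dropping `>`-quoted lines, cutting at a `-- ` signature delimiter or an `-----Original Message-----` separator, collapsing whitespace) are assumptions chosen for illustration, not the heuristics used by the summariser discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical rule-based clean-up pass for e-mail bodies.
 * The rules are illustrative assumptions, not the ones used in the thread.
 */
public final class EmailCleaner {

    // A quoted reply line, e.g. "> some earlier text".
    private static final Pattern QUOTED = Pattern.compile("^\\s*>.*");
    // Outlook-style separator that introduces the replied-to message.
    private static final Pattern ORIGINAL_MESSAGE =
            Pattern.compile("^-+\\s*Original Message\\s*-+\\s*$");
    // Conventional signature delimiter: "-- " (or "--") on a line of its own.
    private static final Pattern SIGNATURE = Pattern.compile("^-- ?$");
    // Runs of spaces or tabs to collapse into a single space.
    private static final Pattern SPACES = Pattern.compile("[ \\t]+");

    private EmailCleaner() {}

    /** Returns the cleaned body as a single space-separated string. */
    public static String clean(String body) {
        List<String> kept = new ArrayList<>();
        for (String line : body.split("\\r?\\n", -1)) {
            // Everything after a signature or original-message separator is dropped.
            if (SIGNATURE.matcher(line).matches()
                    || ORIGINAL_MESSAGE.matcher(line.trim()).matches()) {
                break;
            }
            // Quoted lines repeat earlier text, so skip them.
            if (QUOTED.matcher(line).matches()) {
                continue;
            }
            String normalized = SPACES.matcher(line).replaceAll(" ").trim();
            if (!normalized.isEmpty()) {
                kept.add(normalized);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String mail = "Can you share   the heuristics?\n"
                + "> quoted earlier text\n"
                + "\n"
                + "Thanks,\n"
                + "-- \n"
                + "Herb";
        System.out.println(clean(mail));
    }
}
```

Keeping each rule as a separate named pattern makes it easy to add, drop, or reorder domain-specific rules without touching the loop, which matters if, as suggested in this thread, the rules are highly domain-specific.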


-----Original Message-----
From: Ulrich Mayring []
Sent: Friday, November 28, 2003 4:21 AM
Subject: Re: New Lucene-powered Website

This "clean-up work" is actually trickier than the summarising itself,
and it is usually very domain-specific. That's why I haven't offered
to contribute the summariser to Lucene: the clean-up code is not
generic. The summariser itself is just one class of 300 lines, but
without prior clean-up the quality of its summaries is
