Can you share a description of the heuristics you used to clean up the text? I am facing the same problem right now handling email. I'm less interested in the rules you use than in the tools you use to implement them.

Herb....

-----Original Message-----
From: Ulrich Mayring [mailto:ulim@denic.de]
Sent: Friday, November 28, 2003 4:21 AM
To: lucene-user@jakarta.apache.org
Subject: Re: New Lucene-powered Website

This "clean-up work" is actually trickier than the summarising itself, and it is usually very domain-specific. That's the reason why I haven't proposed contributing the summariser to Lucene: the clean-up code is not generic. The summariser itself is just one class with 300 lines, but without prior clean-up the quality of its summaries is insufficient.