lucene-dev mailing list archives

From Otis Gospodnetic
Subject Sentence detection/extraction as Tokenizer?
Date Fri, 27 Nov 2009 18:07:36 GMT

The contrib/wordnet package contains an AnalyzerUtil class with a method that extracts sentences
from text/String.  It is super-simplistic, so probably not very accurate, but I am wondering
if *conceptually* it would make sense to have a Tokenizer that extracts sentences?  I suppose
that means each Token would be a complete sentence.

Would you say it makes sense to implement sentence detection/extraction as a Tokenizer?
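
To make the idea concrete, here is a minimal sketch of sentence-per-token extraction. It is an illustration under stated assumptions, not the AnalyzerUtil method and not an actual Lucene Tokenizer: it uses the JDK's java.text.BreakIterator for boundary detection, and the SentenceSplitter class, its Span type, and the split method are hypothetical names. Each Span carries the sentence text plus start/end character offsets, which is roughly what a sentence-level Token would need to expose.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Lucene): splits text into sentences and
// records offsets, mimicking what a sentence-level Tokenizer might emit.
class SentenceSplitter {

    /** One sentence plus its character offsets in the original text. */
    static final class Span {
        final String text;
        final int start; // inclusive offset into the input
        final int end;   // exclusive offset into the input

        Span(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one Span per sentence, with surrounding whitespace trimmed. */
    static List<Span> split(String input) {
        List<Span> spans = new ArrayList<>();
        // Assumption: English-style sentence rules; the locale is a choice
        // made for this sketch only.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(input);

        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            int s = start;
            int e = end;
            // Trim whitespace so the offsets cover only the sentence itself.
            while (s < e && Character.isWhitespace(input.charAt(s))) {
                s++;
            }
            while (e > s && Character.isWhitespace(input.charAt(e - 1))) {
                e--;
            }
            if (s < e) {
                spans.add(new Span(input.substring(s, e), s, e));
            }
        }
        return spans;
    }
}
```

Calling SentenceSplitter.split(text) returns the sentences in document order. Wrapping this in a real Tokenizer would mean emitting one token per Span and setting the term and offset attributes from its fields; that wiring is left out here because it depends on the Lucene version in use.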

