mahout-user mailing list archives

From Ken Krugler
Subject Re:
Date Fri, 13 Nov 2009 18:36:54 GMT
Hi all,

Another issue came up, about cleaning the text.

One interested user suggested using nCleaner as a way of tossing
boilerplate text that skews text frequency data.

Any thoughts on this?


-- Ken

On Nov 3, 2009, at 5:43am, Grant Ingersoll wrote:

> Might be of interest to all you Mahouts out there...
> Would be cool to get this converted over to our vector format so  
> that we can cluster, etc.

Ken Krugler
+1 530-210-6378
e l a s t i c   w e b   m i n i n g
