mahout-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: http://bixolabs.com/datasets/public-terabyte-dataset-project/
Date Tue, 03 Nov 2009 15:49:22 GMT

On Nov 3, 2009, at 5:43am, Grant Ingersoll wrote:

> Might be of interest to all you Mahouts out there...  http://bixolabs.com/datasets/public-terabyte-dataset-project/
>
> Would be cool to get this converted over to our vector format so  
> that we can cluster, etc.


How much additional space would the vectors require in an optimally
compressed format, say as a percentage of the raw text size?

I'm asking because I have some flexibility in the processing and  
associated metadata I can store as part of the dataset.

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




