mahout-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: anybody want to set a record with Mahout?
Date Thu, 25 Feb 2010 21:03:11 GMT

On Feb 25, 2010, at 3:41 PM, Jake Mannix wrote:

> On Thu, Feb 25, 2010 at 12:38 PM, Robin Anil <robin.anil@gmail.com> wrote:
> 
>> What's the largest dataset available? BixoLabs? Wikipedia (5 million
>> articles)...
>> I don't know of anything public that is that big.
>> 
> 
> 5 million articles, if you take all the 1-, 2-, 3-, 4-, and 5-gram data out
> of it, you could easily hit more than 4B individual matrix entries.

Is it actually meaningful to do this (combine the various n-gram sizes) as an
experiment, other than for sheer size?