mahout-dev mailing list archives

From Grant Ingersoll
Subject Machine Resources [was Re: Confluence Wiki]
Date Sun, 27 Jan 2008 18:39:12 GMT
On Jan 25, 2008, at 11:43 PM, Mason Tang wrote:
> Also, is there any chance we'll be able to get a small (and I mean  
> small) cluster to run some tests on?  Local Hadoop testing only gets  
> you so far...

Yeah, this type of thing is perennially a problem.  I think we will  
have to beg/borrow/steal (just kidding on the steal).  I think the key  
will be to get local stuff running and then start looking around for  
resources.  Amazon EC2 is an obvious place, but short of someone  
donating time on it, I am not sure how we would come about it.

I don't know enough about Apache's infrastructure to know whether  
there is enough to cobble together.  Committers can get access to  
Lucene's zone (a virtual server).  Nutch presumably faces the same
problem.  Hadoop, luckily, is fairly well
supported by Yahoo! and other companies with machine access.  My hope  
is that if we can show some promise with code that runs well on a single
machine or a small cluster, maybe we can garner some interest from bigger
supporters.  And, of course, most machines are multi-core these days  
and Hadoop can leverage that, as I understand it.

Perhaps, if we can organize it and make sure it is secure, we can try  
to figure out a way for the various people here to pull together our  

Just thinking out loud...

