hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "BristolHadoopWorkshop" by SteveLoughran
Date Thu, 20 Aug 2009 13:04:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:

The comment on the change is:
added discussion

  The key point here being that yes, something done as a chain of MR jobs on a Hadoop cluster
may seem an inefficient approach, but if there is no other way to store that much data, or
run through it, then graph people will be happy.
+ = Yahoo! MS search deal =
+ This was a discussion topic run by Julio.
+ * 400 Y! staff are moving to MS. How many are search specialists, versus Hadoop hackers?

+ * Y! is driving large-scale tests; Facebook is #2.
+ * Y! are making Hadoop the core of the company; it is their LOB of datacentre. 
+ What are the risks of the merger, and the warning signs of trouble?
+  # silence: Y! developers do their own fork, and it goes closed source. We have seen this happen
in other OSS projects (Axis), where a single company suddenly disappears. There is no defence
against this other than making sure development knowledge is widespread. The JIRA-based discussion/documentation
is good here, as it preserves all knowledge and keeps decisions in the open.
+  # staff departure. Key staff in the Hadoop team could leave, which would set things back.
Moving to MS could be bad, but moving to Google would set back development the most.
+  # slower development/rate of feature addition
+  # reduced release rate. This can compensate for reduced testing resources.
+  # reduced rate of bug fixes. We can assume that Y!'s own problems will be addressed, then
everything else is other people's problems.
+  # Less testing, reduced quality
+ Apparently [http://community.cloudera.com] tracks the number of messages/JIRA issues, from which activity can be inferred,
such as [http://community.cloudera.com/reports/47/contributors/ contributors] and [http://community.cloudera.com/reports/47/issues/ popular issues].
+ At the same time, there are opportunities for people outside Yahoo!
+  * more agile deployments
+  * more open to contributions from other people, universities etc.
+ Of course, this could impact release schedule/quality; needs to be managed well.
+ Clearly for Cloudera, this gives them a greater opportunity to position themselves as "the
owners of Hadoop", especially if they get more of the core Hadoop people on board. However,
Apache do try to add their own management layer to stop handing off full  
+ What are the increased responsibilities for everyone else involved with Hadoop?
+  * Everyone has to test on larger clusters. EC2 may get tested, but it's not enough as it
is virtual, and only represents one single site/network config.
+  * Everyone should pull down and play with the pre-releases, on test clusters. Check that the
FS upgrades work, etc.
