hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "PoweredBy" by KevinWeil
Date Thu, 18 Mar 2010 09:30:55 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "PoweredBy" page has been changed by KevinWeil.
The comment on this change is: Added Twitter to the Hadoop powered by page.


    * We use Hadoop in our data mining and user modeling, multimedia, and internet research.
    * 6 node cluster with 96 total cores, 8GB RAM and 2 TB storage per machine.
+  * [[http://www.twitter.com|Twitter]]
+   * We use Hadoop to store and process tweets, log files, and many other types of data generated across Twitter. All data is stored as compressed LZO files.
+   * We use both Scala and Java to access Hadoop's MapReduce APIs.
+   * We use Pig heavily for both scheduled and ad-hoc jobs, due to its ability to accomplish a lot with few statements.
+   * We employ committers on Pig, Avro, Hive, and Cassandra, and contribute much of our internal Hadoop work to open source (see [[http://github.com/kevinweil/hadoop-lzo|hadoop-lzo]]).
+   * For more on our use of Hadoop, see the following presentations: [[http://www.slideshare.net/kevinweil/hadoop-pig-and-twitter-nosql-east-2009|Hadoop and Pig at Twitter]] and [[http://www.slideshare.net/kevinweil/protocol-buffers-and-hadoop-at-twitter|Protocol Buffers and Hadoop at Twitter]]
   * [[http://ir.dcs.gla.ac.uk/terrier/|University of Glasgow - Terrier Team]]
    * 30-node cluster (Xeon Quad Core 2.4GHz, 4GB RAM, 1TB/node storage).
    We use Hadoop to facilitate information retrieval research & experimentation, particularly for TREC, using the Terrier IR platform. The open source release of [[http://ir.dcs.gla.ac.uk/terrier/|Terrier]] includes large-scale distributed indexing using Hadoop MapReduce.
