hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "PoweredBy" by ArtoBendiken
Date Sat, 08 May 2010 21:30:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "PoweredBy" page has been changed by ArtoBendiken.
The comment on this change is: Added a section on Datagraph's use of Hadoop.


   * [[http://www.weblab.infosci.cornell.edu/|Cornell University Web Lab]]
    * Generating web graphs on 100 nodes (dual 2.4GHz Xeon Processor, 2 GB RAM, 72GB Hard
+  * [[http://datagraph.org/|Datagraph]]
+   * We use Hadoop for batch-processing large [[http://www.w3.org/RDF/|RDF]] datasets, in particular for indexing RDF data.
+   * We also use Hadoop for executing long-running offline [[http://en.wikipedia.org/wiki/SPARQL|SPARQL]] queries for clients.
+   * We use Amazon S3 and Cassandra to store input RDF datasets and output files.
+   * We've developed [[http://rdfgrid.rubyforge.org/|RDFgrid]], a Ruby framework for map/reduce-based processing of RDF data.
+   * We primarily use Ruby, [[http://rdf.rubyforge.org/|RDF.rb]] and RDFgrid to process RDF data with Hadoop Streaming.
+   * We primarily run Hadoop jobs on Amazon Elastic MapReduce, with cluster sizes of 1 to 20 nodes depending on the size of the dataset (hundreds of millions to billions of RDF statements).
   * [[http://www.deepdyve.com|Deepdyve]]
    * Elastic cluster with 5-80 nodes
