From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "PoweredBy" by SomeOtherAccount
Date Tue, 26 Oct 2010 15:56:50 GMT
The "PoweredBy" page has been changed by SomeOtherAccount.


- Applications and organizations using Hadoop include (alphabetically):
+ This page documents an alphabetical list of institutions that are using Hadoop for educational
or production uses.  Companies that offer services on or based around Hadoop are listed in
@@ -38, +38 @@

    * A 15-node cluster dedicated to processing various sorts of business data dumped out of databases
and joining them together. These data will then be fed into iSearch, our vertical search engine.
    * Each node has 8 cores, 16G RAM and 1.4T storage.
-  * [[http://aws.amazon.com/|Amazon Web Services]]
-   * We provide [[http://aws.amazon.com/elasticmapreduce|Amazon Elastic MapReduce]]. It's
a web service that provides a hosted Hadoop framework running on the web-scale infrastructure
of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).
-   * Our customers can instantly provision as much or as little capacity as they like to
perform data-intensive tasks for applications such as web indexing, data mining, log file
analysis, machine learning, financial analysis, scientific simulation, and bioinformatics
   * [[http://aol.com/|AOL]]
    * We use Hadoop for a variety of things, ranging from ETL-style processing and statistics
generation to running advanced algorithms for behavioral analysis and targeting.
@@ -71, +67 @@

   * [[http://www.benipaltechnologies.com|Benipal Technologies]] - Outsourcing, Consulting,
    * 35 Node Cluster (Core2Quad Q9400 Processor, 4-8 GB RAM, 500 GB HDD)
    * Largest Data Node with Xeon E5420*2 Processors, 64GB RAM, 3.5 TB HDD
    * Total Cluster capacity of around 20 TB on a gigabit network with failover and redundancy
    * Hadoop is used for internal data crunching, application development, testing and getting
around I/O limitations
@@ -79, +74 @@

   * [[http://bixolabs.com/|Bixo Labs]] - Elastic web mining
    * The Bixolabs elastic web mining platform uses Hadoop + Cascading to quickly build scalable
web mining applications.
    * We're doing a 200M page/5TB crawl as part of the [[http://bixolabs.com/datasets/public-terabyte-dataset-project/|public
terabyte dataset project]].
    * This runs as a 20 machine [[http://aws.amazon.com/elasticmapreduce/|Elastic MapReduce]]
   * [[http://www.brainpad.co.jp|BrainPad]] - Data mining and analysis
@@ -87, +81 @@

    * And use analyzing.
  = C =
-  * [[http://www.cascading.org/|Cascading]] - Cascading is a feature rich API for defining
and executing complex and fault tolerant data processing workflows on a Hadoop cluster.
-  * [[http://www.cloudera.com|Cloudera, Inc]] - Cloudera provides commercial support and
professional training for Hadoop.
-   * We provide [[http://www.cloudera.com/hadoop|Cloudera's Distribution for Hadoop]]. Stable
packages for Redhat and Ubuntu (rpms / debs), EC2 Images and web based configuration.
-   * Check out our [[http://www.cloudera.com/blog|Hadoop and Big Data Blog]]
-   * Get [[http://oreilly.com/catalog/9780596521998/index.html|"Hadoop: The Definitive Guide"]]
(Tom White/O'Reilly)
   * [[http://www.contextweb.com/|Contextweb]] - Ad Exchange
    * We use Hadoop to store ad serving logs and use it as a source for ad optimizations,
analytics, reporting and machine learning.
@@ -115, +102 @@

    * We've developed [[http://rdfgrid.rubyforge.org/|RDFgrid]], a Ruby framework for map/reduce-based
processing of RDF data.
    * We primarily use Ruby, [[http://rdf.rubyforge.org/|RDF.rb]] and RDFgrid to process RDF
data with Hadoop Streaming.
    * We primarily run Hadoop jobs on Amazon Elastic MapReduce, with cluster sizes of 1 to
20 nodes depending on the size of the dataset (hundreds of millions to billions of RDF statements).
-  * [[http://www.datameer.com|Datameer]]
-   * Datameer Analytics Solution (DAS) is the first Hadoop-based solution for big data analytics
that includes data source integration, storage, an analytics engine and visualization.
-   * DAS Log File Aggregator is a plug-in to DAS that makes it easy to import large numbers
of log files stored on disparate servers.
   * [[http://www.deepdyve.com|Deepdyve]]
    * Elastic cluster with 5-80 nodes
@@ -234, +217 @@

   * [[http://www.ibm.com|IBM]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22613.wss|Blue Cloud Computing Clusters]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22414.wss|University Initiative to
Address Internet-Scale Computing Challenges]]
   * [[http://www.iccs.informatics.ed.ac.uk/|ICCS]]
@@ -268, +250 @@

    * Using Hadoop MapReduce to analyse billions of lines of GPS data to create TrafficSpeeds,
our accurate traffic speed forecast product.
  = K =
-  * [[http://www.karmasphere.com/|Karmasphere]]
-   * Distributes [[http://www.hadoopstudio.org/|Karmasphere Studio for Hadoop]], which allows
cross-version development and management of Hadoop jobs in a familiar integrated development
   * [[http://katta.wiki.sourceforge.net/|Katta]] - Katta serves large Lucene indexes in a
grid environment.
    * Uses Hadoop FileSystem, RPC and IO
@@ -342, +321 @@

    * 18 node cluster (Quad-Core AMD Opteron 2347, 1TB/node storage)
    * Powers data for search and aggregation
-  * [[http://lucene.apache.org/mahout|Mahout]]
-   . Another Apache project using Hadoop to build scalable machine learning algorithms like
canopy clustering, k-means and many more to come (naive bayes classifiers, others)
   * [[http://metrixcloud.com/|MetrixCloud]] - provides commercial support, installation,
and hosting of Hadoop Clusters. [[http://metrixcloud.com/contact.php|Contact Us.]]
  = N =
@@ -368, +344 @@

    * We rely on Apache Pig for reporting and analytics, Cascading for machine learning, and
a proprietary JavaScript API for ad-hoc queries
    * We use commodity hardware, with 8 cores and 16 GB of RAM per machine
-  * [[http://lucene.apache.org/nutch|Nutch]] - flexible web search engine software
  = O =
  = P =
   * [[http://parc.com|PARC]] - Used Hadoop to analyze Wikipedia conflicts [[http://asc.parc.googlepages.com/2007-10-28-VAST2007-RevertGraph-Wiki.pdf|paper]].
-  * [[http://pentaho.com|Pentaho]] – Open Source Business Intelligence
-   * Pentaho provides the only complete, end-to-end open source BI alternative to proprietary
offerings like Oracle, SAP and IBM
-   * We provide an easy-to-use, graphical ETL tool that is integrated with Hadoop for managing
data and coordinating Hadoop related tasks in the broader context of your ETL and Business
Intelligence workflow
-   * We also provide Reporting and Analysis capabilities against big data in Hadoop
-   * Learn more at [[http://www.pentaho.com/hadoop/|http://www.pentaho.com/hadoop]]
   * [[http://pharm2phork.org|Pharm2Phork Project]] - Agricultural Traceability
    * Using Hadoop on EC2 to process observation messages generated by RFID/Barcode readers
as items move through the supply chain.

