hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "PoweredBy" by BradfordStephens
Date Wed, 18 Mar 2009 00:39:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by BradfordStephens:
http://wiki.apache.org/hadoop/PoweredBy

The comment on the change is:
Added information about Visible Technologies and Hadoop

------------------------------------------------------------------------------
    * A 15-node cluster dedicated to processing all sorts of business data dumped out of databases
and joining them together. These data will then be fed into iSearch, our vertical search engine.
    * Each node has 8 cores, 16G RAM and 1.4T storage.
  
-  * [http://aol.com/ AOL] 
+  * [http://aol.com/ AOL]
    * We use Hadoop for a variety of things, ranging from ETL-style processing and statistics
generation to running advanced algorithms for behavioral analysis and targeting.
    * Our cluster size is 50 machines, Intel Xeon, dual processors, dual core, each with 16GB
RAM and an 800GB hard disk, giving us a total of 37TB HDFS capacity.
  
@@ -44, +44 @@

    * We're writing [http://oreilly.com/catalog/9780596521998/index.html "Hadoop: The Definitive
Guide"] (Tom White/O'Reilly)
  
  
-  * [http://www.contextweb.com/ Contextweb] - ADSDAQ Ad Exchange
+  * [http://www.contextweb.com/ Contextweb] - ADSDAQ Ad Exchange
-   * We use Hadoop to store ad serving logs and use them as a source for ad optimization/analytics/reporting/machine
learning.
+   * We use Hadoop to store ad serving logs and use them as a source for ad optimization/analytics/reporting/machine
learning.
    * Currently we have a 23 machine cluster with 184 cores and about 35TB raw storage.  Each
(commodity) node has 8 cores, 8GB RAM and 1.7 TB of storage.
  
   * [http://www.weblab.infosci.cornell.edu/ Cornell University Web Lab]
@@ -65, +65 @@

    * Image-content-based advertising and auto-tagging for social media.
    * Image-based video copyright protection.
  
-  * [http://www.facebook.com/ Facebook] 
+  * [http://www.facebook.com/ Facebook]
-   * We use Hadoop to store copies of internal log and dimension data sources and use it
as a source for reporting/analytics and machine learning. 
+   * We use Hadoop to store copies of internal log and dimension data sources and use it
as a source for reporting/analytics and machine learning.
    * Currently have a 600 machine cluster with 4800 cores and about 2 PB raw storage.  Each
(commodity) node has 8 cores and 4 TB of storage.
    * We are heavy users of both streaming and the Java APIs. We have built a higher-level
data warehousing framework using these features called Hive (see [http://hadoop.apache.org/hive/]).
 We have also developed a FUSE implementation over HDFS.
  
@@ -76, +76 @@

    * Used for log analysis, data mining and machine learning
  
   * [http://www.hadoop.co.kr/ Hadoop Korean User Group], a Korean Local Community Team Page.
-   * 50-node cluster in the Korea university network environment.
+   * 50-node cluster in the Korea university network environment.
     * Pentium 4 PC, HDFS 4TB Storage
    * Used for development projects
     * Retrieving and Analyzing Biomedical Knowledge
@@ -103, +103 @@

    Hadoop is also beginning to be used in our teaching and general research
    activities on natural language processing and machine learning.
  
-  * [http://search.iiit.ac.in/ IIIT, Hyderabad] 
+  * [http://search.iiit.ac.in/ IIIT, Hyderabad]
    * We use Hadoop for Information Retrieval and Extraction research projects. We are also working
on map-reduce scheduling research for multi-job environments.
    * Our cluster sizes vary from 10 to 30 nodes, depending on the jobs. Heterogeneous nodes
with most being Quad 6600s, 4GB RAM and 1TB disk per node. Also some nodes with dual-core
and single-core configurations.
  
   * [http://www.imageshack.us/ ImageShack]
    * From [http://www.techcrunch.com/2008/05/20/update-imageshack-ceo-hints-at-his-grander-ambitions/
TechCrunch]:
-     Rather than put ads in or around the images it hosts, Levin is working on harnessing
all the data his 
+     Rather than put ads in or around the images it hosts, Levin is working on harnessing
all the data his
-     service generates about content consumption (perhaps to better target advertising on
ImageShack or to 
+     service generates about content consumption (perhaps to better target advertising on
ImageShack or to
      syndicate that targeting data to ad networks). Like Google and Yahoo, he is deploying
the open-source
      Hadoop software to create a massive distributed supercomputer, but he is using it to
analyze all the
      data he is collecting.
@@ -125, +125 @@

    * Session analysis and report generation
  
   * [http://katta.wiki.sourceforge.net/ Katta] - Katta serves large Lucene indexes in a grid
environment.
-    * Uses Hadoop FileSystem, RPC and IO
+    * Uses Hadoop FileSystem, RPC and IO
  
   * [http://www.koubei.com/ Koubei.com] Large local community and local search in China.
     Using Hadoop to process Apache logs, analyzing users' actions and click flow, and the links
clicked from any specified page in the site, and more. Using Hadoop to process all the price data
that users input, with map/reduce (a sketch of this kind of log-processing job follows below).
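
The following is a minimal sketch of the kind of map/reduce job over Apache access logs that the
Koubei.com entry describes, written against the Hadoop Java MapReduce API. The class names, the
log-field handling, and the input/output paths are assumptions made for illustration, not Koubei's
actual code.

{{{
// Hypothetical sketch: counting page hits from Apache "common log format"
// access logs with the Hadoop Java MapReduce API.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageHitCount {

  // Maps one log line to (requested path, 1).
  public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text page = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // Common log format: host ident user [date] "METHOD /path HTTP/1.x" status bytes
      String[] parts = line.toString().split("\"");
      if (parts.length < 2) {
        return;                        // skip malformed lines
      }
      String[] request = parts[1].split(" ");
      if (request.length < 2) {
        return;
      }
      page.set(request[1]);            // the requested path
      context.write(page, ONE);
    }
  }

  // Sums the hit counts for each page.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text page, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      int total = 0;
      for (IntWritable c : counts) {
        total += c.get();
      }
      context.write(page, new IntWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "page hit count");   // Job.getInstance(conf, ...) on newer Hadoop
    job.setJarByClass(PageHitCount.class);
    job.setMapperClass(LogMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /logs/access/2009-03-17
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /reports/page-hits
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}

Packaged into a jar, a job like this would be launched with something like
{{{hadoop jar pagehits.jar PageHitCount /logs/access /reports/page-hits}}} (jar name and paths
hypothetical).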
  
   * [http://krugle.com/ Krugle]
-   * Source code search engine uses Hadoop and Nutch. 
+   * Source code search engine uses Hadoop and Nutch.
  
   * [http://www.last.fm Last.fm]
    * 50 nodes (dual Xeon LV 2GHz, 4GB RAM, 1TB/node storage and dual Xeon L5320 1.86GHz,
8GB RAM, 3TB/node storage).
@@ -155, +155 @@

     * Another Bigtable cloning project using Hadoop to store large structured data sets.
     * 200 nodes (each node has 2 dual-core CPUs, 2TB storage, 4GB RAM)
  
-  * [http://www.netseer.com NetSeer] - 
+  * [http://www.netseer.com NetSeer] -
    * Up to 1000 instances on [http://www.amazon.com/b/ref=sc_fe_l_2/002-1156069-5604805?ie=UTF8&node=201590011&no=3435361&me=A36L942TSJ2AJA
Amazon EC2]
    * Data storage in [http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2/002-1156069-5604805?ie=UTF8&node=16427261&no=3435361&me=A36L942TSJ2AJA
Amazon S3]
    * 50 node cluster in Coloc
@@ -163, +163 @@

  
   * [http://nytimes.com The New York Times]
    * [http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
Large scale image conversions]
-   * Used EC2 to run Hadoop on a large virtual cluster
+   * Used EC2 to run Hadoop on a large virtual cluster
  
   * [http://www.ning.com Ning]
    * We use Hadoop to store and process our log files
@@ -224, +224 @@

     We are one of six universities participating in IBM/Google's academic
     cloud computing initiative.  Ongoing research and teaching efforts
     include projects in machine translation, language modeling,
-    bioinformatics, email analysis, and image processing. 
+    bioinformatics, email analysis, and image processing.
  
   * [http://t2.unl.edu University of Nebraska Lincoln, Research Computing Facility]
     We currently run one medium-sized Hadoop cluster (200TB) to store and serve up physics
data
@@ -233, +233 @@

     several of our students are involved in research projects on Hadoop.
  
   * [http://www.veoh.com Veoh]
-   * We use a small Hadoop cluster to reduce usage data for internal metrics, for search
indexing and for recommendation data. 
+   * We use a small Hadoop cluster to reduce usage data for internal metrics, for search
indexing and for recommendation data.
  
   * [http://www.visiblemeasures.com Visible Measures Corporation] uses Hadoop as a component
in our Scalable Data Pipeline, which ultimately powers !VisibleSuite and other products. 
We use Hadoop to aggregate, store, and analyze data related to in-stream viewing behavior
of Internet video audiences.   Our current grid contains more than 128 CPU cores and in excess
of 100 terabytes of storage, and we plan to grow that substantially during 2008.
+ 
+  * [http://www.visibletechnologies.com Visible Technologies] Hadoop is quickly becoming
the core of our business. We use it to extract Business Intelligence out of Consumer Generated
Media.
+   * Running on over 150 servers through 2009.
+   * We use Nutch to crawl and index HTML pages, Lucene and HBase to store documents, Solr
to search, ZooKeeper to manage search shards, and possibly Mahout for semantic Machine Learning.
+   * Many BI-related tasks run on Hadoop to extract meaningful data (topics, authors, keywords,
link graphs, etc.).
+ 
  
   * [http://www.vksolutions.com/ VK Solutions]
    * We use a small Hadoop cluster in the scope of our general research activities at [http://www.vklabs.com
VK Labs] to get faster data access from web applications.
-   * We also use Hadoop for filtering and indexing listings, log processing and analysis, and
for recommendation data.
+   * We also use Hadoop for filtering and indexing listings, log processing and analysis, and
for recommendation data.
  
   * [http://www.worldlingo.com/ WorldLingo]
    * Hardware: 44 servers (each server has: 2 dual core CPUs, 2TB storage, 8GB RAM)
    * Each server runs Xen with one Hadoop/HBase instance and another instance with web or
application servers, giving us 88 usable virtual machines.
    * We run two separate Hadoop/HBase clusters with 22 nodes each.
    * Hadoop is primarily used to run HBase and Map/Reduce jobs scanning over the HBase tables
to perform specific tasks (a sketch of such a scan job follows below).
-   * HBase is used as a scalable and fast storage back end for millions of documents. 
+   * HBase is used as a scalable and fast storage back end for millions of documents.
    * Currently we store 12 million documents with a target of 450 million in the near future.
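
The following is a minimal sketch of the kind of Map/Reduce scan over an HBase table mentioned
in the WorldLingo entry above, assuming the org.apache.hadoop.hbase.mapreduce API from HBase 0.20
or later. The table, column family and qualifier names are invented for illustration and are not
WorldLingo's actual schema.

{{{
// Hypothetical sketch: a Map/Reduce job scanning an HBase table to count
// stored documents per language. The table ("documents"), column family
// ("meta") and qualifier ("lang") are invented for illustration.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class DocsPerLanguage {

  // Emits (language, 1) for every row the scan hands to the mapper.
  public static class ScanMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      byte[] lang = row.getValue(Bytes.toBytes("meta"), Bytes.toBytes("lang"));
      if (lang != null) {
        context.write(new Text(Bytes.toString(lang)), ONE);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "docs per language");   // Job.getInstance(conf, ...) on newer Hadoop
    job.setJarByClass(DocsPerLanguage.class);

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("lang"));
    scan.setCaching(500);        // fewer round trips to the region servers
    scan.setCacheBlocks(false);  // don't churn the block cache with a full scan

    // Wires the scan, the mapper and its output types into the job;
    // the input format creates one map task per table region.
    TableMapReduceUtil.initTableMapperJob(
        "documents", scan, ScanMapper.class, Text.class, IntWritable.class, job);

    // Stock summing reducer adds up the per-language counts.
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setNumReduceTasks(1);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));  // e.g. /reports/docs-per-language
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}

A job like this scans in parallel, one map task per region, and writes a small per-language tally
that downstream reporting can pick up.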
  
   * [http://www.yahoo.com/ Yahoo!]
