hadoop-common-commits mailing list archives

From: Apache Wiki <wikidi...@apache.org>
Subject: [Hadoop Wiki] Update of "PoweredBy" by YannickMorel
Date: Mon, 22 Nov 2010 16:29:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "PoweredBy" page has been changed by YannickMorel.
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=235&rev2=236

--------------------------------------------------

- This page documents an alphabetical list of institutions that are using Hadoop for educational
or production uses.  Companies that offer services on or based around Hadoop are listed in
[[Distributions and Commercial Support|Distributions and Commercial Support]] .
+ This page documents an alphabetical list of institutions that are using Hadoop for educational or production uses. Companies that offer services on or based around Hadoop are listed in [[Distributions and Commercial Support]].
  
  <<TableOfContents(3)>>
  
  = A =
- 
   * [[http://a9.com/|A9.com]] - Amazon
    * We build Amazon's product search indices using the streaming API and pre-existing C++,
Perl, and Python tools.
    * We process millions of sessions daily for analytics, using both the Java and streaming
APIs.
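    * As a flavor of that streaming usage, here is a minimal, hypothetical mapper/reducer pair in Python (a sketch, not A9's actual tools): Streaming feeds input lines on stdin, expects tab-separated key/value pairs on stdout, and sorts mapper output by key between the two phases.
   {{{
#!/usr/bin/env python
# mapper.py -- hypothetical Streaming mapper: reads raw text lines on
# stdin and emits one tab-separated "term TAB 1" pair per token.
import sys

for line in sys.stdin:
    for term in line.strip().split():
        print("%s\t1" % term.lower())
}}}
   {{{
#!/usr/bin/env python
# reducer.py -- sums the counts per term; identical keys arrive adjacent
# because Hadoop sorts mapper output by key before the reduce phase.
import sys

current, total = None, 0
for line in sys.stdin:
    term, count = line.rstrip("\n").split("\t", 1)
    if term != current:
        if current is not None:
            print("%s\t%d" % (current, total))
        current, total = term, 0
    total += int(count)
if current is not None:
    print("%s\t%d" % (current, total))
}}}
    * Such a job is launched with the streaming jar, e.g. `hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar -input in -output out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (input and output paths here are placeholders).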
@@ -38, +37 @@

   * A 15-node cluster dedicated to processing all sorts of business data dumped out of databases and joining them together. These data are then fed into iSearch, our vertical search engine.
   * Each node has 8 cores, 16GB RAM and 1.4TB storage.
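   * That join step is the classic reduce-side join pattern. A toy streaming sketch, with invented table names ("orders" joined to "items" on a shared first column), not Alibaba's actual pipeline:
   {{{
#!/usr/bin/env python
# join_mapper.py -- tags each record with its source table so records
# sharing a key meet in one reduce group (table names are invented).
import os
import sys

# Hadoop Streaming exports job settings as environment variables;
# map_input_file holds the path of the input split being read.
SOURCE = "orders" if "orders" in os.environ.get("map_input_file", "") else "items"

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    print("%s\t%s\t%s" % (fields[0], SOURCE, "\t".join(fields[1:])))
}}}
   {{{
#!/usr/bin/env python
# join_reducer.py -- emits the cross product of the two tagged record
# streams for every shared key (a classic reduce-side join).
import itertools
import sys

def rows(stream):
    for line in stream:
        key, source, rest = line.rstrip("\n").split("\t", 2)
        yield key, source, rest

for key, group in itertools.groupby(rows(sys.stdin), key=lambda r: r[0]):
    sides = {"orders": [], "items": []}
    for _, source, rest in group:
        sides[source].append(rest)
    for left in sides["orders"]:
        for right in sides["items"]:
            print("%s\t%s\t%s" % (key, left, right))
}}}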
  
- 
   * [[http://aol.com/|AOL]]
-   * We use hadoop for variety of things ranging from ETL style processing and statistics
generation to running advanced algorithms for doing behavioral analysis and targeting. 
+   * We use Hadoop for a variety of things, ranging from ETL-style processing and statistics generation to running advanced algorithms for behavioral analysis and targeting.
   * The cluster that we use mainly for behavioral analysis and targeting has 150 machines (Intel Xeon, dual processor, dual core), each with 16GB RAM and an 800GB hard disk.
  
   * [[http://atbrox.com/|Atbrox]]
@@ -48, +46 @@

    * Cluster: we primarily use Amazon's Elastic Mapreduce
  
  = B =
- 
   * [[http://www.babacar.org/|BabaCar]]
   * 4-node cluster (32 cores, 1TB).
    * We use Hadoop for searching and analysis of millions of rental bookings.
@@ -81, +78 @@

   * And we use it for analysis.
  
  = C =
- 
   * [[http://www.contextweb.com/|Contextweb]] - Ad Exchange
    * We use Hadoop to store ad serving logs and use it as a source for ad optimizations,
analytics, reporting and machine learning.
    * Currently we have a 50 machine cluster with 400 cores and about 140TB raw storage. Each
(commodity) node has 8 cores and 16GB of RAM.
@@ -98, +94 @@

    * [[http://www.springerlink.com/content/np5u8k1x9l6u755g|HDFS as a VM repository for virtual
clusters]]
  
  = D =
- 
   * [[http://datagraph.org/|Datagraph]]
    * We use Hadoop for batch-processing large [[http://www.w3.org/RDF/|RDF]] datasets, in
particular for indexing RDF data.
    * We also use Hadoop for executing long-running offline [[http://en.wikipedia.org/wiki/SPARQL|SPARQL]]
queries for clients.
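   * For the indexing case, a hypothetical streaming pass over N-Triples that groups statements by subject (a simplified sketch, not Datagraph's actual code):
   {{{
#!/usr/bin/env python
# rdf_mapper.py -- keys each N-Triples statement by its subject so all
# statements about one resource meet in a single reduce group
# (illustrative only; assumes well-formed, space-separated triples).
import sys

for line in sys.stdin:
    line = line.strip()
    if not line or line.startswith("#"):
        continue
    subject, rest = line.split(" ", 1)
    print("%s\t%s" % (subject, rest))
}}}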
@@ -121, +116 @@

    * Eliminates the need for explicit data and schema mappings during database integration
  
  = E =
- 
   * [[http://www.ebay.com|EBay]]
   * 532-node cluster (8 * 532 cores, 5.3PB).
    * Heavy usage of Java MapReduce, Pig, Hive, HBase
@@ -147, +141 @@

    * Image based video copyright protection.
  
  = F =
- 
   * [[http://www.facebook.com/|Facebook]]
    * We use Hadoop to store copies of internal log and dimension data sources and use it
as a source for reporting/analytics and machine learning.
    * Currently we have 2 major clusters:
@@ -177, +170 @@

   * We also use Hadoop to analyze similarities in users' behavior.
  
  = G =
- 
   * [[http://www.google.com|Google]]
    * [[http://www.google.com/intl/en/press/pressrel/20071008_ibm_univ.html|University Initiative
to Address Internet-Scale Computing Challenges]]
  
@@ -194, +186 @@

    * Image and advertising analytics
  
  = H =
- 
   * [[http://www.hadoop.co.kr/|Hadoop Korean User Group]], a Korean Local Community Team
Page.
   * 50-node cluster in the Korea University network environment.
     * Pentium 4 PC, HDFS 4TB Storage
@@ -218, +209 @@

   * We crawl our clients' websites, and from the information we gather we fingerprint old and un-updated software packages in that shared hosting environment. After matching a signature against a database, we can then inform our clients that they are running outdated software. With that information we know which sites require patching, offered as a free courtesy service to protect the majority of users. Without the technologies of Nutch and Hadoop this would be a far harder task to accomplish.
  
  = I =
- 
   * [[http://www.ibm.com|IBM]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22613.wss|Blue Cloud Computing Clusters]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22414.wss|University Initiative to
Address Internet-Scale Computing Challenges]]
@@ -246, +236 @@

   * Using a 10-node HDFS cluster to store and process retrieved data.
  
  = J =
- 
   * [[http://joost.com|Joost]]
    * Session analysis and report generation
  
@@ -254, +243 @@

    * Using Hadoop MapReduce to analyse billions of lines of GPS data to create TrafficSpeeds,
our accurate traffic speed forecast product.
  
  = K =
- 
   * [[http://katta.wiki.sourceforge.net/|Katta]] - Katta serves large Lucene indexes in a
grid environment.
   * Uses Hadoop FileSystem, RPC and IO
  
@@ -265, +253 @@

    * Source code search engine uses Hadoop and Nutch.
  
  = L =
- 
   * [[http://www.last.fm|Last.fm]]
    * 44 nodes
    * Dual quad-core Xeon L5520 (Nehalem) @ 2.27GHz, 16GB RAM, 4TB/node storage.
    * Used for charts calculation, log analysis, A/B testing
  
   * [[http://www.legolas-media.com|Legolas Media]]
-   * 20 dual quad-core nodes, 32GB RAM , 5x1TB 
+   * 20 dual quad-core nodes, 32GB RAM, 5x1TB
   * Used for user profile analysis, statistical analysis, and cookie-level reporting tools.
-   * Some Hive but mainly automated Java MapReduce jobs that process ~150MM new events/day.

+   * Some Hive but mainly automated Java MapReduce jobs that process ~150MM new events/day.
  
   * [[https://lbg.unc.edu|Lineberger Comprehensive Cancer Center - Bioinformatics Group]]
This is the cancer center at UNC Chapel Hill. We are using Hadoop/HBase for storing and analyzing Next Generation Sequencing (NGS) data produced for the [[http://cancergenome.nih.gov/|Cancer
Genome Atlas]] (TCGA) project and other groups. This development is based on the [[http://seqware.sf.net|SeqWare]]
open source project which includes SeqWare Query Engine, a database and web service built
on top of HBase that stores sequence data types. Our prototype cluster includes:
    * 8 dual quad core nodes running CentOS
@@ -283, +270 @@

  
   * [[http://www.linkedin.com|LinkedIn]]
    * We have multiple grids divided up based upon purpose.  They are composed of the following
types of hardware:
-     * 100 Nehalem-based nodes, with 2x4 cores, 24GB RAM, 8x1TB storage using ZFS in a JBOD
configuration on Solaris.
+    * 100 Nehalem-based nodes, with 2x4 cores, 24GB RAM, 8x1TB storage using ZFS in a JBOD
configuration on Solaris.
-     * 120 Westmere-based nodes, with 2x4 cores, 24GB RAM, 6x2TB storage using ext4 in a
JBOD configuration on CentOS 5.5
+    * 120 Westmere-based nodes, with 2x4 cores, 24GB RAM, 6x2TB storage using ext4 in a JBOD
configuration on CentOS 5.5
    * We use Hadoop and Pig for discovering People You May Know and other fun facts.
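    * As a toy illustration of the friend-of-friend counting such features usually build on (not LinkedIn's actual Pig pipeline), a streaming reducer over "user TAB friend" edges with an identity mapper:
   {{{
#!/usr/bin/env python
# fof_reducer.py -- pass 1 of a toy "People You May Know": every pair
# of one user's connections shares that user as a mutual friend, so
# emit the pair with count 1. Input is assumed to be "user TAB friend"
# edges, already sorted by user (as Hadoop guarantees at reduce time).
import itertools
import sys

def edges(stream):
    for line in stream:
        user, friend = line.rstrip("\n").split("\t", 1)
        yield user, friend

for user, group in itertools.groupby(edges(sys.stdin), key=lambda e: e[0]):
    friends = sorted(set(friend for _, friend in group))
    for a, b in itertools.combinations(friends, 2):
        print("%s,%s\t1" % (a, b))
}}}
    * A second, word-count-style pass sums these pair counts to rank candidates by number of mutual connections.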
  
   * [[http://www.lookery.com|Lookery]]
@@ -295, +282 @@

    * Using Hadoop and Hbase for storage, log analysis, and pattern discovery/analysis.
  
  = M =
- 
   * [[http://www.markt24.de/|Markt24]]
   * We use Hadoop to filter user behaviour, recommendations and trends from external sites.
    * Using zkpython
@@ -333, +319 @@

   * [[http://metrixcloud.com/|MetrixCloud]] - provides commercial support, installation,
and hosting of Hadoop Clusters. [[http://metrixcloud.com/contact.php|Contact Us.]]
  
  = N =
- 
   * [[http://www.openneptune.com|Neptune]]
   * Another Bigtable cloning project using Hadoop to store large structured data sets.
   * 200 nodes (each node has 2 dual-core CPUs, 2TB storage, 4GB RAM)
@@ -354, +339 @@

    * We use commodity hardware, with 8 cores and 16 GB of RAM per machine
  
  = O =
- 
  = P =
- 
   * [[http://parc.com|PARC]] - Used Hadoop to analyze Wikipedia conflicts [[http://asc.parc.googlepages.com/2007-10-28-VAST2007-RevertGraph-Wiki.pdf|paper]].
- 
  
   * [[http://pharm2phork.org|Pharm2Phork Project]] - Agricultural Traceability
   * Using Hadoop on EC2 to process observation messages generated by RFID/barcode readers as items move through the supply chain.
@@ -383, +365 @@

   * Our cluster size varies from 5 to 10 nodes. Cluster nodes range from 2950 quad-core rack servers, with 2x6MB cache and 4x500GB SATA hard drives, to E7200/E7400 processors with 4GB RAM and 160GB HDDs.
  
  = Q =
- 
   * [[http://www.quantcast.com/|Quantcast]]
    * 3000 cores, 3500TB. 1PB+ processing each day.
    * Hadoop scheduler with fully custom data path / sorter
    * Significant contributions to KFS filesystem
  
  = R =
- 
   * [[http://www.rackspace.com/email_hosting/|Rackspace]]
   * 30-node cluster (dual-core, 4-8GB RAM, 1.5TB/node storage)
    * Parses and indexes logs from our email hosting system for search: http://blog.racklabs.com/?p=66
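     * A sketch of the grouping step in that kind of log pipeline, assuming invented Postfix-style lines keyed by queue ID (the real format and pipeline are described in the blog post above):
   {{{
#!/usr/bin/env python
# log_mapper.py -- keys each mail log line by its queue ID so the sort
# phase groups a message's whole delivery trail for indexing (the regex
# and log layout here are illustrative, not Rackspace's actual format).
import re
import sys

QUEUE_ID = re.compile(r"postfix/\w+\[\d+\]: ([0-9A-F]+):")

for line in sys.stdin:
    match = QUEUE_ID.search(line)
    if match:
        print("%s\t%s" % (match.group(1), line.rstrip("\n")))
}}}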
@@ -409, +389 @@

   * We intend to parallelize some traditional classification and clustering algorithms, like Naive Bayes, K-Means, and EM, so that they can deal with large-scale data sets, as sketched below.
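   * A common way to cast Naive Bayes training onto MapReduce is to reduce it to counting, as in this minimal hypothetical streaming mapper (the summing reducer is identical to a word count's):
   {{{
#!/usr/bin/env python
# nb_mapper.py -- reduces Naive Bayes training to counting: one count
# per class label and one per (label, feature) occurrence. Assumes
# hypothetical training lines of the form "label TAB free text".
import sys

for line in sys.stdin:
    label, _, text = line.rstrip("\n").partition("\t")
    print("%s\t1" % label)                    # class prior count
    for feature in set(text.split()):
        print("%s:%s\t1" % (label, feature))  # conditional count
}}}
   * Dividing each (label, feature) count by its label's total then gives the conditional probability estimates.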
  
  = S =
- 
   * [[http://www.sara.nl/news/recent/20101103/Hadoop_proof-of-concept.html|SARA, Netherlands]]
    * SARA has initiated a Proof-of-Concept project to evaluate the Hadoop software stack
for scientific use.
  
@@ -444, +423 @@

    * Hosted Hadoop data warehouse solution provider
  
  = T =
- 
   * [[http://www.taragana.com|Taragana]] - Web 2.0 Product development and outsourcing services
   * We are using 16 consumer-grade computers to create the cluster, connected by a 100 Mbps network.
    * Used for testing ideas for blog and other data mining.
@@ -480, +458 @@

    * We have 94 nodes (752 cores) in our clusters, as of July 2010, but the number grows
regularly.
  
  = U =
- 
   * [[http://glud.udistrital.edu.co|Universidad Distrital Francisco Jose de Caldas (Grupo GICOGE / Grupo Linux UD GLUD / Grupo GIGA)]]
-   5 node low-profile cluster. We use Hadoop to support the research project: Territorial
Intelligence System of Bogota City.
+   . 5-node low-profile cluster. We use Hadoop to support the research project: Territorial Intelligence System of Bogota City.
  
   * [[http://ir.dcs.gla.ac.uk/terrier/|University of Glasgow - Terrier Team]]
   * 30-node cluster (Xeon Quad Core 2.4GHz, 4GB RAM, 1TB/node storage).
@@ -495, +472 @@

    . We currently run one medium-sized Hadoop cluster (200TB) to store and serve up physics
data for the computing portion of the Compact Muon Solenoid (CMS) experiment. This requires
a filesystem which can download data at multiple Gbps and process data at an even higher rate
locally. Additionally, several of our students are involved in research projects on Hadoop.
  
  = V =
- 
   * [[http://www.veoh.com|Veoh]]
    * We use a small Hadoop cluster to reduce usage data for internal metrics, for search
indexing and for recommendation data.
  
@@ -506, +482 @@

   * We also use Hadoop for filtering and indexing listings, log analysis, and recommendation data.
  
  = W =
- 
+  * [[http://www.web-alliance.fr|Web Alliance]]
+   * We use Hadoop for our internal search engine optimization (SEO) tools. It allows us to store, index, and search data in a much faster way.
+   * We also use it for log analysis and trend prediction.
   * [[http://www.worldlingo.com/|WorldLingo]]
    * Hardware: 44 servers (each server has: 2 dual core CPUs, 2TB storage, 8GB RAM)
    * Each server runs Xen with one Hadoop/HBase instance and another instance with web or
application servers, giving us 88 usable virtual machines.
@@ -516, +494 @@

   * Currently we store 12 million documents, with a target of 450 million in the near future.
  
  = X =
- 
  = Y =
- 
   * [[http://www.yahoo.com/|Yahoo!]]
    * More than 100,000 CPUs in >36,000 computers running Hadoop
   * Our biggest cluster: 4000 nodes (2x4-CPU boxes with 4x1TB disks and 16GB RAM)
@@ -528, +504 @@

    * >60% of Hadoop Jobs within Yahoo are Pig jobs.
  
  = Z =
- 
   * [[http://www.zvents.com/|Zvents]]
    * 10 node cluster (Dual-Core AMD Opteron 2210, 4GB RAM, 1TB/node storage)
    * Run Naive Bayes classifiers in parallel over crawl data to discover event information
