hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "PoweredBy" by vuelos
Date Sat, 17 Oct 2009 23:19:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "PoweredBy" page has been changed by vuelos.
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=158&rev2=159

--------------------------------------------------

  Applications and organizations using Hadoop include (alphabetically):
+ 
   * [[http://a9.com/|A9.com]] - Amazon
    * We build Amazon's product search indices using the streaming API and pre-existing C++,
Perl, and Python tools.
    * We process millions of sessions daily for analytics, using both the Java and streaming
APIs.
    * Our clusters vary from 1 to 100 nodes.
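    . A minimal sketch of the streaming pattern mentioned above (an illustration under assumed
inputs, not A9's actual code): any executable that reads records from stdin and writes
tab-separated key/value pairs to stdout can serve as the mapper. The session-id field layout
below is hypothetical.
{{{
#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper: emit (session_id, 1) for each input line.
# Assumes tab-separated records whose first field is a session id (illustrative only).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if fields and fields[0]:
        # key <TAB> value on stdout is the streaming contract
        print("%s\t1" % fields[0])
}}}
    . Such a script would be submitted with the hadoop-streaming jar via its -input, -output,
-mapper and -reducer options.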
  
   * [[http://www.adobe.com|Adobe]]
-   * We use Hadoop and HBase in several areas from social services to structured data storage
and processing for internal use. 
+   * We use Hadoop and HBase in several areas from social services to structured data storage
and processing for internal use.
    * We currently have about 30 nodes running HDFS, Hadoop and HBase in clusters ranging
from 5 to 14 nodes across both production and development. We plan a deployment on an
80-node cluster.
    * We constantly write data to HBase and run MapReduce jobs to process it, then store it
back to HBase or external systems.
    * Our production cluster has been running since Oct 2008.
@@ -65, +66 @@

    * Generating web graphs on 100 nodes (dual 2.4GHz Xeon Processor, 2 GB RAM, 72GB Hard
Drive)
  
   * [[http://www.deepdyve.com|Deepdyve]]
-   * Elastic cluster with 5-80 nodes 
+   * Elastic cluster with 5-80 nodes
    * We use Hadoop to create our indexes of deep web content and to provide a
high-availability, high-bandwidth storage service for index shards for our search cluster.
  
   * [[http://search.detik.com|Detikcom]] - Indonesia's largest news portal
@@ -99, +100 @@

   * [[http://www.facebook.com/|Facebook]]
    * We use Hadoop to store copies of internal log and dimension data sources and use it
as a source for reporting/analytics and machine learning.
    * Currently have a 600-machine cluster with 4800 cores and about 2 PB of raw storage.  Each
(commodity) node has 8 cores and 4 TB of storage.
-   * We are heavy users of both streaming as well as the Java apis. We have built a higher
level data warehousing framework using these features called Hive (see the [[http://hadoop.apache.org/hive/]]).
 We have also developed a FUSE implementation over hdfs.
+   * We are heavy users of both the streaming and Java APIs. We have built a higher-level
data warehousing framework, Hive, using these features (see http://hadoop.apache.org/hive/).
 We have also developed a FUSE implementation over HDFS.
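    . As a generic illustration of the streaming usage described above (a sketch under assumed
data, not Facebook's code): a streaming reducer sees its input sorted by key, so per-key
aggregates reduce to a running total that is flushed whenever the key changes.
{{{
#!/usr/bin/env python
# Generic Hadoop Streaming reducer sketch: sums integer counts per tab-separated key.
# Relies on the framework delivering reducer input sorted by key.
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    try:
        count = int(value)
    except ValueError:
        continue  # skip malformed lines
    if key == current_key:
        total += count
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, total))
        current_key, total = key, count
if current_key is not None:
    print("%s\t%d" % (current_key, total))
}}}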
  
   * [[http://www.foxaudiencenetwork.com|FOX Audience Network]]
    * 40 machine cluster (8 cores/machine, 2TB/machine storage)
@@ -135, +136 @@

  
   * [[http://www.hadoop.tw/|Hadoop Taiwan User Group]]
  
-  * [[http://holaservers.com/|HolaServers.com]]
-   * Hosting company
-   * Use pig to provide traffic stats to users in near real time
+  * [[http://net-ngo.com|Hipotecas y euribor]]
+   * Euribor trend and current value
+   * Mortgage simulator for the economic crisis
  
   * [[http://www.hostinghabitat.com/|Hosting Habitat]]
-   * We use a customised version of Hadoop and Nutch in a currently experimental 6 node/Dual
Core cluster environment. 
+   * We use a customised version of Hadoop and Nutch in a currently experimental 6 node/Dual
Core cluster environment.
-   * What we crawl are our clients Websites and from the information we gather. We fingerprint
old and non updated software packages in that shared hosting environment. We can then inform
our clients that they have old and non updated software running after matching a signature
to a Database. With that information we know which sites would require patching as a free
and courtesy service to protect the majority of users. Without the technologies of Nutch and
Hadoop this would be a far harder to accomplish task. 
+   * We crawl our clients' websites, and from the information we gather we fingerprint
old and out-of-date software packages in that shared hosting environment. We can then inform
our clients that they are running outdated software after matching a signature against a
database. With that information we know which sites require patching, offered as a free
courtesy service to protect the majority of users. Without Nutch and Hadoop this task would
be far harder to accomplish.
  
   * [[http://www.ibm.com|IBM]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22613.wss|Blue Cloud Computing Clusters]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22414.wss|University Initiative to
Address Internet-Scale Computing Challenges]]
  
   * [[http://www.iccs.informatics.ed.ac.uk/|ICCS]]
+   * We are using Hadoop and Nutch to crawl Blog posts and later process them. Hadoop is
also beginning to be used in our teaching and general research activities on natural language
processing and machine learning.
-   * We are using Hadoop and Nutch to crawl Blog posts and later process them.
-   Hadoop is also beginning to be used in our teaching and general research
-   activities on natural language processing and machine learning.
  
   * [[http://search.iiit.ac.in/|IIIT, Hyderabad]]
    * We use Hadoop for Information Retrieval and Extraction research projects. We are also
working on MapReduce scheduling research for multi-job environments.
@@ -158, +157 @@

  
   * [[http://www.imageshack.us/|ImageShack]]
    * From [[http://www.techcrunch.com/2008/05/20/update-imageshack-ceo-hints-at-his-grander-ambitions/|TechCrunch]]:
-     Rather than put ads in or around the images it hosts, Levin is working on harnessing
all the data his
+    . Rather than put ads in or around the images it hosts, Levin is working on harnessing
all the data his
+    service generates about content consumption (perhaps to better target advertising on
ImageShack or to syndicate that targetting data to ad networks). Like Google and Yahoo, he
is deploying the open-source Hadoop software to create a massive distributed supercomputer,
but he is using it to analyze all the data he is collecting.
-     service generates about content consumption (perhaps to better target advertising on
ImageShack or to
-     syndicate that targetting data to ad networks). Like Google and Yahoo, he is deploying
the open-source
-     Hadoop software to create a massive distributed supercomputer, but he is using it to
analyze all the
-     data he is collecting.
  
   * [[http://www.isi.edu/|Information Sciences Institute (ISI)]]
    * Used Hadoop and 18 nodes/52 cores to [[http://www.isi.edu/ant/address/whole_internet/|plot
the entire internet]].
@@ -174, +170 @@

    * Session analysis and report generation
  
   * [[http://www.journeydynamics.com|Journey Dynamics]]
-   * Using Hadoop MapReduce to analyse billions of lines of GPS data to create Traffic``Speeds,
our accurate traffic speed forecast product.
+   * Using Hadoop MapReduce to analyse billions of lines of GPS data to create TrafficSpeeds,
our accurate traffic speed forecast product.
  
   * [[http://www.karmasphere.com/|Karmasphere]]
    * Distributes [[http://www.hadoopstudio.org/|Karmasphere Studio for Hadoop]], which allows
cross-version development and management of Hadoop jobs in a familiar integrated development
environment.
  
   * [[http://katta.wiki.sourceforge.net/|Katta]] - Katta serves large Lucene indexes in a
grid environment.
-    * Uses Hadoop FileSytem, RPC and IO
+   * Uses Hadoop FileSystem, RPC and IO
  
-  * [[http://www.koubei.com/|Koubei.com ]] Large local community and local search at China.
+  * [[http://www.koubei.com/|Koubei.com]] Large local community and local search site in China.
-    Using Hadoop to process apache log, analyzing user's action and click flow and the links
click with any specified page in site and more.  Using Hadoop to process whole price data
user input with map/reduce.
+   . Using Hadoop to process Apache logs, analyzing users' actions, click flow, and the links
clicked from any specified page on the site, and more.  Also using Hadoop map/reduce to process
all of the price data entered by users.
  
   * [[http://krugle.com/|Krugle]]
    * Source code search engine uses Hadoop and Nutch.
@@ -198, +194 @@

    * Our cluster runs across Amazon's EC2 web service and uses the streaming module to run
Python for most operations.
  
   * [[http://www.lotame.com|Lotame]]
-    * Using Hadoop and Hbase for storage, log analysis, and pattern discovery/analysis.
+   * Using Hadoop and Hbase for storage, log analysis, and pattern discovery/analysis.
  
   * [[http://www.mylife.com/|MyLife]]
    * 18 node cluster (Quad-Core AMD Opteron 2347, 1TB/node storage)
    * Powers data for search and aggregation
  
   * [[http://lucene.apache.org/mahout|Mahout]]
-    Another Apache project using Hadoop to build scalable machine learning   algorithms like
canopy clustering, k-means and many more to come (naive bayes classifiers, others)
+   . Another Apache project using Hadoop to build scalable machine learning algorithms
like canopy clustering, k-means, and many more to come (naive Bayes classifiers, others)
  
   * [[http://metrixcloud.com/|MetrixCloud]] - provides commercial support, installation,
and hosting of Hadoop Clusters. [[http://metrixcloud.com/contact.php|Contact Us.]]
  
   * [[http://www.openneptune.com|Neptune]]
-    * Another Bigtable cloning project using Hadoop to store large structured data set.
+   * Another Bigtable cloning project using Hadoop to store large structured data sets.
-    * 200 nodes(each node has: 2 dual core CPUs, 2TB storage, 4GB RAM)
+   * 200 nodes (each node has 2 dual-core CPUs, 2TB storage, 4GB RAM)
  
   * [[http://www.netseer.com|NetSeer]] -
    * Up to 1000 instances on [[http://www.amazon.com/b/ref=sc_fe_l_2/002-1156069-5604805?ie=UTF8&node=201590011&no=3435361&me=A36L942TSJ2AJA|Amazon
EC2]]
@@ -242, +238 @@

    * Using HDFS for large archival data storage
  
   * [[http://www.psgtech.edu/|PSG Tech, Coimbatore, India]]
-   * Multiple alignment of protein sequences helps to determine evolutionary linkages and
to predict molecular structures. The dynamic nature of the algorithm coupled with data and
compute parallelism of hadoop data grids improves the accuracy and speed of sequence alignment.
Parallelism at the sequence and block level reduces the time complexity of MSA problems. Scalable
nature of Hadoop makes it apt to solve large scale alignment problems. 
+   * Multiple alignment of protein sequences helps to determine evolutionary linkages and
to predict molecular structures. The dynamic nature of the algorithm, coupled with the data and
compute parallelism of Hadoop data grids, improves the accuracy and speed of sequence alignment.
Parallelism at the sequence and block level reduces the time complexity of MSA problems, and the
scalable nature of Hadoop makes it apt for solving large-scale alignment problems.
    * Our cluster size varies from 5 to 10 nodes. Cluster nodes range from 2950 quad-core
rack servers with 2x6MB cache and 4 x 500 GB SATA hard drives to E7200/E7400 processors
with 4 GB RAM and 160 GB HDD.
-  
+ 
   * [[http://www.quantcast.com/|Quantcast]]
    * 3000 cores, 3500TB. 1PB+ processing each day.
    * Hadoop scheduler with fully custom data path / sorter
@@ -299, +295 @@

    We use Hadoop to facilitate information retrieval research & experimentation, particularly
for TREC, using the Terrier IR platform. The open source release of [[http://ir.dcs.gla.ac.uk/terrier/|Terrier]]
includes large-scale distributed indexing using Hadoop Map Reduce.
  
   * [[http://www.umiacs.umd.edu/~jimmylin/cloud-computing/index.html|University of Maryland]]
+   . We are one of six universities participating in IBM/Google's academic cloud computing
initiative.  Ongoing research and teaching efforts include projects in machine translation,
language modeling, bioinformatics, email analysis, and image processing.
-    We are one of six universities participating in IBM/Google's academic
-    cloud computing initiative.  Ongoing research and teaching efforts
-    include projects in machine translation, language modeling,
-    bioinformatics, email analysis, and image processing.
  
   * [[http://t2.unl.edu|University of Nebraska Lincoln, Research Computing Facility]]
+   . We currently run one medium-sized Hadoop cluster (200TB) to store and serve up physics
data for the computing portion of the Compact Muon Solenoid (CMS) experiment.  This requires
a filesystem which can download data at multiple Gbps and process data at an even higher rate
locally.  Additionally, several of our students are involved in research projects on Hadoop.
-    We currently run one medium-sized Hadoop cluster (200TB) to store and serve up physics
data
-    for the computing portion of the Compact Muon Solenoid (CMS) experiment.  This requires
a filesystem
-    which can download data at multiple Gbps and process data at an even higher rate locally.
 Additionally,
-    several of our students are involved in research projects on Hadoop.
  
   * [[http://www.veoh.com|Veoh]]
    * We use a small Hadoop cluster to reduce usage data for internal metrics, for search
indexing and for recommendation data.
@@ -318, +308 @@

   * [[http://www.vksolutions.com/|VK Solutions]]
    * We use a small Hadoop cluster in the scope of our general research activities at [[http://www.vklabs.com|VK
Labs]] to get a faster data access from web applications.
    * We also use Hadoop for filtering and indexing listings, processing log analysis, and
for recommendation data.
- 
  
   * [[http://devuelosbaratos.es/|Vuelos baratos]]
    * We use a small Hadoop
@@ -334, +323 @@

   * [[http://www.yahoo.com/|Yahoo!]]
    * More than 100,000 CPUs in >25,000 computers running Hadoop
    * Our biggest cluster: 4000 nodes (2*4cpu boxes w 4*1TB disk & 16GB RAM)
-      * Used to support research for Ad Systems and Web Search
+    * Used to support research for Ad Systems and Web Search
-      * Also used to do scaling tests to support development of Hadoop on larger clusters
+    * Also used to do scaling tests to support development of Hadoop on larger clusters
    * [[http://developer.yahoo.com/blogs/hadoop|Our Blog]] - Learn more about how we use Hadoop.
    * >40% of Hadoop jobs within Yahoo are Pig jobs.
  
@@ -343, +332 @@

    * 10 node cluster (Dual-Core AMD Opteron 2210, 4GB RAM, 1TB/node storage)
    * Run Naive Bayes classifiers in parallel over crawl data to discover event information
  
- 
  ''When applicable, please include details about your cluster hardware and size.''
  
