hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "PoweredBy" by SteveLoughran
Date Tue, 06 Dec 2011 11:04:33 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "PoweredBy" page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=381&rev2=382

Comment:
rm some linkspam, review spelling and text

    * ''Our production cluster has been running since Oct 2008. ''
  
   * ''[[http://www.adyard.de|adyard]] ''
-   * ''We use Flume, Hadoop and Pig for log storage and report generation aswell as ad-Targeting.
''
+   * ''We use Flume, Hadoop and Pig for log storage and report generation as well as ad targeting.
''
    * ''We currently have 12 nodes running HDFS and Pig and plan to add more from time to
time. ''
   * ''50% of our recommender system is pure Pig because of its ease of use. ''
-   * ''Some of our more deeply-integrated tasks are using the streaming api and ruby aswell
as the excellent Wukong-Library. ''
+   * ''Some of our more deeply-integrated tasks are using the streaming API and Ruby as well
as the excellent Wukong library. ''
  
   * ''[[http://www.ablegrape.com/|Able Grape]] - Vertical search engine for trustworthy wine
information ''
-   * ''We have one of the world's smaller hadoop clusters (2 nodes @ 8 CPUs/node) ''
+   * ''We have one of the world's smaller Hadoop clusters (2 nodes @ 8 CPUs/node) ''
    * ''Hadoop and Nutch used to analyze and index textual information ''
  
   * ''[[http://adknowledge.com/|Adknowledge]] - Ad network ''
@@ -49, +49 @@

    * ''Each node has 8 cores, 16G RAM and 1.4T storage. ''
  
   * ''[[http://aol.com/|AOL]] ''
-   * ''We use hadoop for variety of things ranging from ETL style processing and statistics
generation to running advanced algorithms for doing behavioral analysis and targeting. ''
+   * ''We use Hadoop for a variety of things ranging from ETL-style processing and statistics
generation to running advanced algorithms for doing behavioral analysis and targeting. ''
   * ''The cluster that we use mainly for behavioral analysis and targeting has 150 machines,
Intel Xeon, dual processors, dual core, each with 16GB RAM and 800 GB hard disk. ''
  
   * ''[[http://www.ara.com.tr/|ARA.COM.TR]] - Ara Com Tr - Turkey's first and only search
engine ''
@@ -59, +59 @@

    * ''Our clusters vary from 10 to 100 nodes ''
  
   * ''[[http://atbrox.com/|Atbrox]] ''
-   * ''We use hadoop for information extraction & search, and data analysis consulting
''
+   * ''We use Hadoop for information extraction & search, and data analysis consulting
''
    * ''Cluster: we primarily use Amazon's Elastic MapReduce ''
  
  = B =
@@ -75, +75 @@

  
   * ''[[http://www.beebler.com|Beebler]] ''
    * ''14 node cluster (each node has: 2 dual core CPUs, 2TB storage, 8GB RAM) ''
-   * ''We use hadoop for matching dating profiles ''
+   * ''We use Hadoop for matching dating profiles ''
  
   * ''[[http://www.benipaltechnologies.com|Benipal Technologies]] - Outsourcing, Consulting,
Innovation ''
    * ''35 Node Cluster (Core2Quad Q9400 Processor, 4-8 GB RAM, 500 GB HDD) ''
@@ -150, +150 @@

  
   * ''[[http://www.deepdyve.com|Deepdyve]] ''
    * ''Elastic cluster with 5-80 nodes ''
-   * ''We use hadoop to create our indexes of deep web content and to provide a high availability
and high bandwidth storage service for index shards for our search cluster. ''
+   * ''We use Hadoop to create our indexes of deep web content and to provide a high availability
and high bandwidth storage service for index shards for our search cluster. ''
  
   * ''[[http://www.wirtschaftsdetektei-berlin.de|Detektei Berlin]] ''
    * ''We are using Hadoop in our data mining and multimedia/internet research groups. ''
    * ''3 node cluster with 48 cores in total, 4GB RAM and 1 TB storage each. ''
  
   * ''[[http://search.detik.com|Detikcom]] - Indonesia's largest news portal ''
-   * ''We use hadoop, pig and hbase to analyze search log, generate Most View News, generate
top wordcloud, and analyze all of our logs ''
+   * ''We use Hadoop, Pig and HBase to analyze search logs, generate Most Viewed News, generate
top word clouds, and analyze all of our logs ''
   * ''Currently we use 9 nodes ''
  
   * ''[[http://www.devdaily.com|devdaily.com]] ''
@@ -209, +209 @@

   * ''Currently we have 2 major clusters: ''
    * ''A 1100-machine cluster with 8800 cores and about 12 PB raw storage. ''
     * ''A 300-machine cluster with 2400 cores and about 3 PB raw storage. ''
     * ''Each (commodity) node has 8 cores and 12 TB of storage. ''
-    * ''We are heavy users of both streaming as well as the Java apis. We have built a higher
level data warehousing framework using these features called Hive (see the http://hadoop.apache.org/hive/).
We have also developed a FUSE implementation over hdfs. ''
+    * ''We are heavy users of both streaming as well as the Java APIs. We have built a higher
level data warehousing framework using these features called Hive (see the http://hadoop.apache.org/hive/).
We have also developed a FUSE implementation over HDFS. ''
  
   * ''[[http://www.foxaudiencenetwork.com|FOX Audience Network]] ''
    * ''40 machine cluster (8 cores/machine, 2TB/machine storage) ''
@@ -227, +227 @@

    * ''Machine learning ''
  
   * ''[[http://freestylers.jp/|Freestylers]] - Image retrieval engine ''
-   * ''[[http://www.kralarabaoyunlari.com|Araba oyunları]] - Araba oyunları ''
-   * [[http://www.pepe-izle.gen.tr/|Pepe izle]] - Pepe izle
-   * [[http://www.scratchcardportal.com|scratch cards]] -Scratch Cards
-   * ''We Japanese company Freestylers use Hadoop to build the image processing environment
for image-based product recommendation system mainly on Amazon EC2, from April 2009. ''
+   * ''We, the Japanese company Freestylers, use Hadoop to build the image processing environment
for an image-based product recommendation system, mainly on Amazon EC2, since April 2009. ''
    * ''Our Hadoop environment produces the original database for fast access from our web
application. ''
   * ''We also use Hadoop to analyze similarities in users' behavior. ''
  
@@ -260, +257 @@

  
  = H =
   * ''[[http://www.hadoop.co.kr/|Hadoop Korean User Group]], a Korean Local Community Team
Page. ''
-   * ''50 node cluster In the Korea university network environment.    * Pentium 4 PC, HDFS
4TB Storage ''
+   * ''50 node cluster in the Korea university network environment.
+   * Pentium 4 PC, HDFS 4TB Storage ''
  
-  * ''Used for development projects    * Retrieving and Analyzing Biomedical Knowledge ''
+  * ''Used for development projects
+   * Retrieving and Analyzing Biomedical Knowledge ''
    * ''Latent Semantic Analysis, Collaborative Filtering ''
  
   * ''[[http://www.hotelsandaccommodation.com.au/|Hotels & Accommodation]] ''
@@ -373, +372 @@

     * ''120 Nehalem-based Sun x4275, with 2x4 cores, 24GB RAM, 8x1TB SATA ''
     * ''580 Westmere-based HP SL 170x, with 2x4 cores, 24GB RAM, 6x2TB SATA ''
     * ''1200 Westmere-based SuperMicro X8DTT-H, with 2x6 cores, 24GB RAM, 6x2TB SATA ''
+    * ''Software:
-    * ''Software:     * CentOS 5.5 -> RHEL 6.1 ''
+     * CentOS 5.5 -> RHEL 6.1 ''
      * ''Sun JDK 1.6.0_14 -> Sun JDK 1.6.0_20 -> Sun JDK 1.6.0_26 ''
      * ''Apache Hadoop 0.20.2+patches -> Apache Hadoop 0.20.204+patches ''
      * ''Pig 0.9 heavily customized ''
@@ -407, +407 @@

    * ''Use a mix of Java, Pig and Hive. ''
  
   * ''[[http://www.memonews.com/en//|MeMo News - Online and Social Media Monitoring]] ''
-   * ''we use hadoop ''
+   * ''we use Hadoop ''
-    * ''as plattform for distributed crawling ''
+    * ''as a platform for distributed crawling ''
     * ''to store and process unstructured data, such as news and social media (Hadoop, PIG,
MapRed and HBase) ''
     * ''log file aggregation and processing (Flume) ''
  
   * ''[[http://www.mercadolibre.com//|Mercadolibre.com]] ''
    * ''20 nodes cluster (12 * 20 cores, 32GB, 53.3TB) ''
-   * ''Custemers log on on-line apps ''
+   * ''Customers log on on-line apps ''
    * ''Operations log processing ''
   * ''Use Java, Pig, Hive, Oozie ''
  
   * ''[[http://www.mobileanalytics.tv//|MobileAnalytic.TV]] ''
    * ''We use Hadoop to develop MapReduce algorithms: ''
-    * ''Information retrival and analytics ''
+    * ''Information retrieval and analytics ''
     * ''Machine generated content - documents, text, audio, & video ''
     * ''Natural Language Processing ''
   * ''Project portfolio includes: ''
    * ''Natural Language Processing ''
@@ -464, +464 @@

  = O =
   * ''[[http://www.optivo.com|optivo]] - Email marketing software ''
    * ''We use Hadoop to aggregate and analyse email campaigns and user interactions. ''
-   * ''Developement is based on the github repository. ''
+   * ''Development is based on the github repository. ''
  
  = P =
   * ''[[http://papertrailapp.com/|Papertrail]] - Hosted syslog and app log management ''
@@ -500, +500 @@

   * ''We use Hadoop for analyzing poker players' game history and generating gameplay-related
player statistics ''
  
   * ''[[http://www.portabilite.info|Portabilité]] ''
-   * ''50 node cluster in Colo. ''
+   * ''50 node cluster in a colocated site. ''
-   * ''Also used as a proof of concept cluster for a cloud based ERP syste. ''
+   * ''Also used as a proof of concept cluster for a cloud based ERP system. ''
  
   * ''[[http://www.psgtech.edu/|PSG Tech, Coimbatore, India]] ''
-   * ''Multiple alignment of protein sequences helps to determine evolutionary linkages and
to predict molecular structures. The dynamic nature of the algorithm coupled with data and
compute parallelism of hadoop data grids improves the accuracy and speed of sequence alignment.
Parallelism at the sequence and block level reduces the time complexity of MSA problems. Scalable
nature of Hadoop makes it apt to solve large scale alignment problems. ''
+   * ''Multiple alignment of protein sequences helps to determine evolutionary linkages and
to predict molecular structures. The dynamic nature of the algorithm coupled with data and
compute parallelism of Hadoop data grids improves the accuracy and speed of sequence alignment.
Parallelism at the sequence and block level reduces the time complexity of MSA problems. The
scalable nature of Hadoop makes it apt to solve large scale alignment problems. ''
    * ''Our cluster size varies from 5 to 10 nodes. Cluster nodes vary from 2950 Quad Core
Rack Server, with 2x6MB Cache and 4 x 500 GB SATA Hard Drive to E7200 / E7400 processors with
4 GB RAM and 160 GB HDD. ''
  
  = Q =
@@ -524, +524 @@

  
   * ''[[http://www.rapleaf.com/|Rapleaf]] ''
    * ''80 node cluster (each node has: 2 quad core CPUs, 4TB storage, 16GB RAM) ''
-   * ''We use hadoop to process data relating to people on the web ''
+   * ''We use Hadoop to process data relating to people on the web ''
   * ''We are also involved with Cascading to help simplify how our data flows through various
processing stages ''
  
   * ''[[http://www.recruit.jp/corporate/english/|Recruit]] ''
@@ -544, +544 @@

  
   * ''[[http://www.rightnow.com/|RightNow Technologies]] - Powering Great Experiences ''
    * ''16 node cluster (each node has: 2 quad core CPUs, 6TB storage, 24GB RAM) ''
-   * ''We use hadoop for log and usage analysis ''
+   * ''We use Hadoop for log and usage analysis ''
    * ''We predominantly leverage Hive and HUE for data access ''
  
   * ''[[http://www.rubbelloselotto.de/|Rubbellose]] ''
@@ -555, +555 @@

    * ''SARA has initiated a Proof-of-Concept project to evaluate the Hadoop software stack
for scientific use. ''
  
   * ''[[http://alpha.search.wikia.com|Search Wikia]] ''
-   * ''A project to help develop open source social search tools. We run a 125 node hadoop
cluster. ''
+   * ''A project to help develop open source social search tools. We run a 125 node Hadoop
cluster. ''
  
   * ''[[http://wwwse.inf.tu-dresden.de/SEDNS/SEDNS_home.html|SEDNS]] - Security Enhanced
DNS Group ''
   * ''We are gathering worldwide DNS data in order to discover content distribution networks
and configuration issues utilizing Hadoop DFS and MapRed. ''
@@ -628, +628 @@

    * ''We use Hadoop for log analysis. ''
  
   * ''[[http://www.tubemogul.com|TubeMogul]] ''
-   * ''We use Hadoop HDFS, Map/Reduce, Hive and Hbase ''
+   * ''We use Hadoop HDFS, Map/Reduce, Hive and HBase ''
  
   * ''We manage over 300 TB of HDFS data across four Amazon EC2 Availability Zones ''
  
@@ -640, +640 @@

    * ''We use both Scala and Java to access Hadoop's MapReduce APIs ''
    * ''We use Pig heavily for both scheduled and ad-hoc jobs, due to its ability to accomplish
a lot with few statements. ''
    * ''We employ committers on Pig, Avro, Hive, and Cassandra, and contribute much of our
internal Hadoop work to open source (see [[http://github.com/kevinweil/hadoop-lzo|hadoop-lzo]])
''
-   * ''For more on our use of hadoop, see the following presentations: [[http://www.slideshare.net/kevinweil/hadoop-pig-and-twitter-nosql-east-2009|Hadoop
and Pig at Twitter]] and [[http://www.slideshare.net/kevinweil/protocol-buffers-and-hadoop-at-twitter|Protocol
Buffers and Hadoop at Twitter]] ''
+   * ''For more on our use of Hadoop, see the following presentations: [[http://www.slideshare.net/kevinweil/hadoop-pig-and-twitter-nosql-east-2009|Hadoop
and Pig at Twitter]] and [[http://www.slideshare.net/kevinweil/protocol-buffers-and-hadoop-at-twitter|Protocol
Buffers and Hadoop at Twitter]] ''
  
   * ''[[http://tynt.com|Tynt]] ''
    * ''We use Hadoop to assemble web publishers' summaries of what users are copying from
their websites, and to analyze user engagement on the web. ''
@@ -686, +686 @@

   * ''[[http://www.webmastersitesi.com|Webmaster Site]] ''
   * ''We use Hadoop for our webmaster tools. It allows us to store, index, and search data in
a much faster way. We also use it for log analysis and trend prediction.''
    * ''4 node cluster (each node has: 4 core AMD CPUs, 2TB storage, 32GB RAM)''
-   * ''We use hadoop to process log data and perform on-demand analytics as well''
+   * ''We use Hadoop to process log data and perform on-demand analytics as well''
   * ''[[http://www.worldlingo.com/|WorldLingo]] ''
    * ''Hardware: 44 servers (each server has: 2 dual core CPUs, 2TB storage, 8GB RAM) ''
    * ''Each server runs Xen with one Hadoop/HBase instance and another instance with web
or application servers, giving us 88 usable virtual machines. ''
