hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/PoweredBy" by RyanLynch
Date Tue, 21 Apr 2009 19:05:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by RyanLynch:

  [http://trendmicro.com/ Trend Micro] Advanced Threats Research is running Hadoop 0.18.1
and HBase 0.18.0. Our application is a web crawling application with concurrent batch content
analysis of various kinds. All of the workflow components are implemented as subclasses of
!TableMap and/or !TableReduce on a cluster of 25 nodes. We see a constant rate of 2500 requests/sec
or greater, peaking periodically near 100K/sec when some of the batch scan tasks run.
+ [http://www.veoh.com/ Veoh Networks] uses HBase to store and process visitor (human) and
entity (non-human) profiles, which are used for behavioral targeting, demographic detection,
and personalization services. Our site reads this data in real time (heavily cached) and
submits updates via various batch map/reduce jobs. With 25 million unique visitors a month,
storing this data in a traditional RDBMS is not an option. We currently have a 24-node Hadoop/HBase
cluster, and our profiling system shares this cluster with our other Hadoop data pipeline
  [http://www.videosurf.com/ VideoSurf] - "The video search engine that has taught computers
to see". We're using HBase to persist various large graphs of data and other statistics. HBase
was a real win for us because it let us store substantially larger datasets without the need
for manually partitioning the data, and its column-oriented nature allowed us to create schemas
that were substantially more efficient for storing and retrieving data.
  [http://www.wikia.com/wiki/Wikia Wikia] hosts its user and keyword databases on a cluster
of 7 machines.
