hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/PoweredBy" by udanax
Date Wed, 16 Feb 2011 04:22:06 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/PoweredBy" page has been changed by udanax.
The comment on this change is: add more info.
http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=57&rev2=58

--------------------------------------------------

  
  [[http://www.twitter.com|Twitter]] runs HBase across its entire Hadoop cluster. HBase provides
a distributed, read/write backup of all MySQL tables in Twitter's production backend, allowing
engineers to run MapReduce jobs over the data while retaining the ability to apply periodic
row updates (something that is more difficult to do with vanilla HDFS). A number of applications,
including people search, rely on HBase internally for data generation. Additionally, the operations
team uses HBase as a time-series database for cluster-wide monitoring/performance data.
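
The MySQL-mirror pattern described above can be sketched roughly as follows. A plain Python dict stands in for the HBase table, and the table layout, column family name ("d"), and row keys are illustrative assumptions, not Twitter's actual schema:

```python
# Sketch of mirroring a MySQL table into an HBase-style layout.
# A dict stands in for the HBase table; in real HBase these would be
# Put/Get calls against a table with one column family (here "d").

table = {}  # row key -> {column: value}

def put(row_key, column, value):
    """Upsert a single cell, as HBase's Put does: later writes win."""
    table.setdefault(row_key, {})[column] = value

def get(row_key):
    """Fetch a whole row, as HBase's Get does."""
    return table.get(row_key, {})

# Mirror a MySQL row: row key derived from the source table's primary key.
put("user:12345", "d:screen_name", "jack")
put("user:12345", "d:location", "SF")

# A periodic row update touches just one cell -- cheap in HBase, but
# awkward on append-only HDFS files, as the entry notes.
put("user:12345", "d:location", "San Francisco")

print(get("user:12345")["d:location"])  # -> San Francisco
```

MapReduce jobs would then scan these rows in bulk while single-cell updates keep trickling in.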
  
- [[http://www.udanax.org|Udanax.org]] (URL shortener) use HBase cluster to store URLs, Web
Log data and response the real-time request on its Web Server. This application is now used
for some twitter clients and a number of web sites and the rows are increasing as almost 30
per second.
+ [[http://www.udanax.org|Udanax.org]] (URL shortener) uses a 10-node HBase cluster to store
URLs and web log data, and to serve real-time requests from its web server. The application
is now used by several Twitter clients and a number of web sites. Currently, API requests arrive
at almost 30 per second and web redirection requests at about 300 per second.
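
A URL-shortener row layout of the kind this entry describes might look like the sketch below. The column family name ("u"), qualifiers, and short code are hypothetical, and a dict stands in for the real table:

```python
# Sketch of a URL-shortener schema in the HBase style: row key = short
# code, one column family holding the target URL and a hit counter.
# A dict stands in for the table; real HBase offers an atomic Increment
# operation for counters like "u:hits".

table = {}

def shorten(code, long_url):
    """Store a new short code -> URL mapping."""
    table[code] = {"u:url": long_url, "u:hits": 0}

def redirect(code):
    """Resolve a short code and record the hit in the web log columns."""
    row = table[code]
    row["u:hits"] += 1
    return row["u:url"]

shorten("abc", "http://example.org/some/long/path")
print(redirect("abc"))  # -> http://example.org/some/long/path
```

Keying rows by the short code makes each redirect a single-row read, which is what lets such a service absorb hundreds of lookups per second.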
  
  [[http://www.veoh.com/|Veoh Networks]] uses HBase to store and process visitor (human) and
entity (non-human) profiles, which are used for behavioral targeting, demographic detection,
and personalization services. Our site reads this data in real time (heavily cached) and
submits updates via various batch MapReduce jobs. With 25 million unique visitors a month,
storing this data in a traditional RDBMS is not an option. We currently have a 24-node Hadoop/HBase
cluster, and our profiling system shares this cluster with our other Hadoop data pipeline
processes.
  
