hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/PoweredBy" by OtisGospodnetic
Date Fri, 25 May 2012 21:54:04 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/PoweredBy" page has been changed by OtisGospodnetic:

Removed SubRecord project - it's dead

  [[http://www.stumbleupon.com/|Stumbleupon]] and [[http://su.pr|Su.pr]] use HBase as a real
time data storage and analytics platform. Serving directly out of HBase, various site features
and statistics are kept up to date in a real time fashion. We also use HBase as a map-reduce
data source to overcome traditional query speed limits in MySQL.
- [[http://www.subrecord.org|SubRecord Project]] is an Open Source project that is using HBase
as a repository of records (persisted map-like data) for the aspects it provides like logging,
tracing or metrics. HBase and Lucene index both constitute a repo/storage for this platform.
  [[http://www.tokenizer.org|Shopping Engine at Tokenizer]] is a web crawler; it uses HBase
to store URLs and Outlinks (!AnchorText + LinkedURL): more than a billion. It was initially
designed as a Nutch-Hadoop extension, then (due to a very specific 'shopping' scenario) moved
to SOLR + MySQL(InnoDB) (ten thousands queries per second), and now to HBase. HBase is significantly
faster due to: no need for huge transaction logs, a column-oriented design that exactly matches the 'lazy'
business logic, data compression, and !MapReduce support. The number of mutable 'indexes' (a term from
RDBMS) is significantly reduced because each 'row::column' structure is physically
sorted by 'row'. The MySQL InnoDB engine is the best DB choice for highly concurrent updates. However,
the need to flush a block of data to the hard drive even if only a few bytes changed is an obvious
bottleneck. HBase greatly helps: the 'delete-insert', 'mutable primary
key', and 'natural primary key' patterns, not so popular in modern DBMS, become a big advantage with HBase.
  [[http://traackr.com/|Traackr]] uses HBase to store and serve online influencer data in
real-time. We use MapReduce to frequently re-score our entire data set as we keep updating
influencer metrics on a daily basis.
