hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/PoweredBy" by GaryHelmling
Date Tue, 08 Sep 2009 21:54:54 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by GaryHelmling:

  [http://gumgum.com GumGum] is an analytics and monetization platform for online content.
We've developed usage-based licensing models that make the best content in the world accessible
to publishers of all sizes.  We use HBase 0.20.0 on a 4-node Amazon EC2 cluster to record
visits to advertisers in our ad network. Our production cluster has been running since July
  [http://www.mahalo.com Mahalo], "...the world's first human-powered search engine". All
the markup that powers the wiki is stored in HBase. It's been in use for a few months now.
!MediaWiki - the same software that powers Wikipedia - has version/revision control. Mahalo's
in-house editors produce a lot of revisions per day, which was not working well in an RDBMS.
An HBase-based solution for this was built and tested, and the data migrated out of MySQL
and into HBase. Right now it's at something like 6 million items in HBase. The upload tool
runs every hour from a shell script to back up that data, and on 6 nodes takes about 5-10
minutes to run - and does not slow down production at all. 
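
For the curious, here is a minimal sketch of how revision storage like this can map onto the
HBase 0.20 client API, using HBase's built-in cell versioning to keep multiple revisions of an
article's markup. The table name "wiki", column family "content", and qualifier "markup" are
illustrative assumptions, not Mahalo's actual schema.

// Minimal sketch, not Mahalo's code: each save becomes a new cell version for the article row.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class WikiRevisionStore {
    private final HTable table;

    public WikiRevisionStore() throws Exception {
        // HBase 0.20-era client setup; reads hbase-site.xml from the classpath.
        table = new HTable(new HBaseConfiguration(), "wiki");
    }

    // Store a revision; HBase timestamps the cell, so older revisions remain as prior versions.
    public void saveRevision(String articleId, String markup) throws Exception {
        Put put = new Put(Bytes.toBytes(articleId));
        put.add(Bytes.toBytes("content"), Bytes.toBytes("markup"), Bytes.toBytes(markup));
        table.put(put);
    }

    // Fetch the latest revision; calling setMaxVersions(n) on the Get would return older ones too.
    public String latestRevision(String articleId) throws Exception {
        Result result = table.get(new Get(Bytes.toBytes(articleId)));
        byte[] value = result.getValue(Bytes.toBytes("content"), Bytes.toBytes("markup"));
        return value == null ? null : Bytes.toString(value);
    }
}

Keeping every revision as a cell version (the column family's VERSIONS setting controls how many
are retained) is one natural way to model revision history in HBase.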
+ [http://www.meetup.com Meetup] is on a mission to help the world’s people self-organize
into local groups.  We use Hadoop and HBase to power a site-wide, real-time activity feed
system for all of our members and groups.  Group activity is written directly to HBase, and
indexed per member, with the member's custom feed served directly from HBase for incoming
requests.  We're running HBase 0.20.0 on an 11-node cluster.
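
As a minimal sketch of the pattern described above (not Meetup's actual code or schema), group
activity can be written once to an activity table and indexed under per-member row keys, so a
member's feed is served by a short scan over that member's key prefix. The table names
"activity" and "member_feed", the "event" column family, and the key layout are assumptions for
illustration; the HBase 0.20.0 client API is used throughout.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ActivityFeed {
    private static final byte[] EVENT = Bytes.toBytes("event");
    private final HTable activity;
    private final HTable memberFeed;

    public ActivityFeed() throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        activity = new HTable(conf, "activity");
        memberFeed = new HTable(conf, "member_feed");
    }

    // Write the event once, then index it per member so each feed is a contiguous row range.
    public void record(String groupId, String eventId, byte[] payload, List<String> memberIds)
            throws Exception {
        Put event = new Put(Bytes.toBytes(groupId + "/" + eventId));
        event.add(EVENT, Bytes.toBytes("payload"), payload);
        activity.put(event);

        // Reverse timestamp in the key so the newest entries sort (and scan) first.
        long reverseTs = Long.MAX_VALUE - System.currentTimeMillis();
        for (String memberId : memberIds) {
            Put index = new Put(Bytes.toBytes(memberId + "/" + reverseTs + "/" + eventId));
            index.add(EVENT, Bytes.toBytes("payload"), payload);
            memberFeed.put(index);
        }
    }

    // Serve an incoming request by scanning the member's key prefix, newest entries first.
    public List<byte[]> feed(String memberId, int limit) throws Exception {
        // '0' is the byte after '/', so the stop row bounds the scan to this member's prefix.
        Scan scan = new Scan(Bytes.toBytes(memberId + "/"), Bytes.toBytes(memberId + "0"));
        ResultScanner scanner = memberFeed.getScanner(scan);
        List<byte[]> events = new ArrayList<byte[]>();
        try {
            for (Result r : scanner) {
                events.add(r.getValue(EVENT, Bytes.toBytes("payload")));
                if (events.size() >= limit) break;
            }
        } finally {
            scanner.close();
        }
        return events;
    }
}

In this sketch, denormalizing the payload into the per-member index trades extra storage for a
read path that never fans out at request time.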
  [http://www.openplaces.org Openplaces] is a search engine for travel that uses HBase to
store terabytes of web pages and travel-related entity records (countries, cities, hotels,
etc.). We have dozens of MapReduce jobs that crunch data on a daily basis.  We use a 20-node
cluster for development, a 40-node cluster for offline production processing and an EC2 cluster
for the live web site.
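
As a rough illustration (not one of Openplaces' actual jobs), a MapReduce job over HBase-resident
data can be wired up with the org.apache.hadoop.hbase.mapreduce helpers that shipped with HBase
0.20. The "pages" table, the "content" family, and domain-prefixed row keys below are assumptions
made for the example; the job simply counts stored pages per domain.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PagesPerDomain {

    // Emits (domain, 1) for every page row; row keys are assumed to start with "domain/".
    public static class DomainMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            String domain = Bytes.toString(row.get()).split("/")[0];
            context.write(new Text(domain), ONE);
        }
    }

    // Sums the per-domain counts on the reduce side.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text domain, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            context.write(domain, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "pages-per-domain");
        job.setJarByClass(PagesPerDomain.class);

        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("content"));  // read only the family the job needs

        // Points TableInputFormat at the "pages" table and sets the map output types.
        TableMapReduceUtil.initTableMapperJob("pages", scan, DomainMapper.class,
                Text.class, IntWritable.class, job);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));  // results land on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}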
