hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/PoweredBy" by RobertBerger
Date Wed, 28 Oct 2009 18:48:38 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/PoweredBy" page has been changed by RobertBerger.


  [[http://www.openplaces.org|Openplaces]] is a search engine for travel that uses HBase to
store terabytes of web pages and travel-related entity records (countries, cities, hotels,
etc.). We have dozens of MapReduce jobs that crunch data on a daily basis.  We use a 20-node
cluster for development, a 40-node cluster for offline production processing and an EC2 cluster
for the live web site.
  [[http://www.powerset.com/|Powerset (a Microsoft company)]] uses HBase to store raw documents.
 We have a ~110 node hadoop cluster running DFS, mapreduce, and hbase.  In our wikipedia hbase
table, we have one row for each wikipedia page (~2.5M pages and climbing).  We use this as
input to our indexing jobs, which are run in hadoop mapreduce.  Uploading the entire wikipedia
dump to our cluster takes a couple hours.  Scanning the table inside mapreduce is very fast
-- the latency is in the noise compared to everything else we do.
+ [[http://www.runa.com/|Runa Inc.]] offers a SaaS that enables online merchants to offer
dynamic per-consumer, per-product promotions embedded in their website. To implement this
we collect the click streams of all their visitors to determine along with the rules of the
merchant what promotion to offer the visitor at different points of their browsing the Merchant
website. So we have lots of data and have to do lots of off-line and real-time analytics.
HBase is the core for us. We also use Clojure and our own open sourced distributed processing
framework, Swarmiji. The HBase Community has been key to our forward movement with HBase.
We're looking for experienced developers to join us to help make things go even faster!
  [[http://www.socialmedia.com/|SocialMedia]] uses HBase to store and process user events
which allows us to provide near-realtime user metrics and reporting. HBase forms the heart
of our Advertising Network data storage and management system. We use HBase as a data source
and sink for both realtime request cycle queries and as a backend for mapreduce analysis.

View raw message