hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/PoweredBy" by StevenNoels
Date Thu, 11 Nov 2010 09:47:31 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/PoweredBy" page has been changed by StevenNoels.
http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=49&rev2=50

--------------------------------------------------

  
  [[http://www.drawntoscaleconsulting.com|Drawn to Scale Consulting]] consults on HBase, Hadoop,
Distributed Search, and Scalable architectures.
  
- [[http://www.filmweb.pl|Filmweb]] is a film web portal with a large dataset of films, persons
and movie-related entities. We have just started a small cluster of 3 HBase nodes to handle
our web cache persistency layer. We plan to increase the cluster size, and also to
start migrating some of the data from our databases which have some demanding scalability
requirements.  
+ [[http://www.filmweb.pl|Filmweb]] is a film web portal with a large dataset of films, persons
and movie-related entities. We have just started a small cluster of 3 HBase nodes to handle
our web cache persistency layer. We plan to increase the cluster size, and also to start migrating
some of the data from our databases which have some demanding scalability requirements.
  
  [[http://www.flurry.com|Flurry]] provides mobile application analytics.  We use HBase and
Hadoop for all of our analytics processing, and serve all of our live requests directly out
of HBase on our 50 node production cluster with tens of billions of rows over several tables.
  
@@ -12, +12 @@

  
  [[http://www.kalooga.com|Kalooga]] is a discovery service for image galleries. We use Hadoop,
HBase, Chukwa and Pig on a 20-node cluster for our crawling, analysis and events processing.
  
- [[http://www.lilycms.org|Lily]] is an open source content repository backed by HBase and
SOLR from Outerthought - scalable content applications.
+ [[http://www.lilyproject.org|Lily]] is an open source content repository, backed by HBase
and SOLR from Outerthought - scalable content applications.
  
  [[http://www.mahalo.com|Mahalo]], "...the world's first human-powered search engine". All
the markup that powers the wiki is stored in HBase. It's been in use for a few months now.
!MediaWiki - the same software that powers Wikipedia - has version/revision control. Mahalo's
in-house editors produce a lot of revisions per day, which was not working well in an RDBMS.
An HBase-based solution for this was built and tested, and the data was migrated out of MySQL
and into HBase. Right now it's at something like 6 million items in HBase. The upload tool
runs every hour from a shell script to back up that data, and on 6 nodes takes about 5-10
minutes to run - and does not slow down production at all.
  
@@ -24, +24 @@

  
  [[http://www.powerset.com/|Powerset (a Microsoft company)]] uses HBase to store raw documents.
 We have a ~110 node Hadoop cluster running DFS, MapReduce, and HBase.  In our Wikipedia HBase
table, we have one row for each Wikipedia page (~2.5M pages and climbing).  We use this as
input to our indexing jobs, which are run in Hadoop MapReduce.  Uploading the entire Wikipedia
dump to our cluster takes a couple of hours.  Scanning the table inside MapReduce is very fast
-- the latency is in the noise compared to everything else we do.
  
- [[http://www.readpath.com/|ReadPath]] uses HBase to store several hundred million RSS items
and dictionary for its RSS newsreader. Readpath is currently running on an 8 node cluster.

+ [[http://www.readpath.com/|ReadPath]] uses HBase to store several hundred million RSS items
and dictionary for its RSS newsreader. Readpath is currently running on an 8 node cluster.
  
  [[http://www.runa.com/|Runa Inc.]] offers a SaaS that enables online merchants to offer
dynamic per-consumer, per-product promotions embedded in their website. To implement this,
we collect the click streams of all their visitors and, together with the merchant's rules,
determine what promotion to offer each visitor at different points while browsing the merchant's
website. So we have lots of data and have to do lots of off-line and real-time analytics.
HBase is the core for us. We also use Clojure and our own open-sourced distributed processing
framework, Swarmiji. The HBase community has been key to our forward movement with HBase.
We're looking for experienced developers to join us to help make things go even faster!
  
