From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/PoweredBy" by AbeTaha
Date Tue, 20 Oct 2009 22:21:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/PoweredBy" page has been changed by AbeTaha.
http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=35&rev2=36

--------------------------------------------------

  
  [[http://www.flurry.com|Flurry]] provides mobile application analytics.  We use HBase and
Hadoop for all of our analytics processing, and serve all of our live requests directly out
of HBase on our 16-node production cluster with billions of rows over several tables.
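
Serving live requests directly out of HBase, as Flurry describes, comes down to point reads
by row key. Below is a minimal sketch against the HBase 0.20 Java client; the table, family,
and column names ("metrics", "d", "pageviews") are invented for illustration, not Flurry's
actual schema.

{{{
// Minimal point read with the HBase 0.20 Java client.
// All table/column names here are invented for illustration.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class LiveRead {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration(); // reads hbase-site.xml
    HTable table = new HTable(conf, "metrics");
    Get get = new Get(Bytes.toBytes("app123"));         // row key = app id
    get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("pageviews"));
    Result r = table.get(get);
    byte[] v = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("pageviews"));
    System.out.println(v == null ? "(no row)" : String.valueOf(Bytes.toLong(v)));
  }
}
}}}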
  
  [[http://www.drawntoscaleconsulting.com|Drawn to Scale Consulting]] consults on HBase, Hadoop,
Distributed Search, and Scalable architectures.
  
  [[http://gumgum.com|GumGum]] is an analytics and monetization platform for online content.
We've developed usage-based licensing models that make the best content in the world accessible
to publishers of all sizes.  We use HBase 0.20.0 on a 4-node Amazon EC2 cluster to record
visits to advertisers in our ad network. Our production cluster has been running since July
2009.
  
  [[http://www.mahalo.com|Mahalo]], "...the world's first human-powered search engine". All
the markup that powers the wiki is stored in HBase. It's been in use for a few months now.
!MediaWiki - the same software that powers Wikipedia - has version/revision control. Mahalo's
in-house editors produce a lot of revisions per day, which was not working well in an RDBMS.
An HBase-based solution for this was built and tested, and the data migrated out of MySQL
and into HBase. Right now there are something like 6 million items in HBase. The upload tool
runs every hour from a shell script to back up that data, and on 6 nodes takes about 5-10
minutes to run - and does not slow down production at all.
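
One natural way to model revision history like Mahalo's in HBase is the built-in cell
versioning: each save writes a new timestamped version of the same cell, and a single Get
can pull back the last N revisions. The sketch below is hypothetical - the post does not
describe Mahalo's actual schema - and all names are invented.

{{{
// Hypothetical sketch: wiki-page revisions as HBase cell versions (HBase 0.20).
// Table and column names are invented; assumes the "content" family was
// created to retain at least 10 versions.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class WikiRevisions {
  public static void main(String[] args) throws Exception {
    HTable pages = new HTable(new HBaseConfiguration(), "wiki");

    // Each save becomes a new timestamped version of the same cell.
    Put put = new Put(Bytes.toBytes("Some_Article"));
    put.add(Bytes.toBytes("content"), Bytes.toBytes("markup"),
        Bytes.toBytes("== New revision text =="));
    pages.put(put);

    // Fetch the last 10 revisions of the article in a single read.
    Get get = new Get(Bytes.toBytes("Some_Article"));
    get.addColumn(Bytes.toBytes("content"), Bytes.toBytes("markup"));
    get.setMaxVersions(10);
    Result r = pages.get(get);
    for (KeyValue kv : r.raw()) { // newest version first
      System.out.println(kv.getTimestamp() + ": " + Bytes.toString(kv.getValue()));
    }
  }
}
}}}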
  
  [[http://www.meetup.com|Meetup]] is on a mission to help the world’s people self-organize
into local groups.  We use Hadoop and HBase to power a site-wide, real-time activity feed
system for all of our members and groups.  Group activity is written directly to HBase, and
indexed per member, with the member's custom feed served directly from HBase for incoming
requests.  We're running HBase 0.20.0 on an 11-node cluster.
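
A common way to lay out a per-member feed like this is one HBase row per member and one
column per activity, with a reverse-timestamp qualifier so columns sort newest-first and a
single row read serves the whole feed. The sketch below is a guess at such a layout - Meetup's
actual schema is not described here - with invented table and family names.

{{{
// Hypothetical per-member activity feed layout (HBase 0.20 client).
// Row key = member id; one column per activity; reverse-timestamp
// qualifiers make lexicographic column order = reverse chronological.
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ActivityFeed {
  static final byte[] FEED = Bytes.toBytes("feed");

  // Write one activity into a member's feed row.
  // (A real layout would append an activity id to the qualifier to
  // avoid collisions within the same millisecond.)
  static void record(HTable t, String memberId, String activity)
      throws Exception {
    long reverseTs = Long.MAX_VALUE - System.currentTimeMillis();
    Put put = new Put(Bytes.toBytes(memberId));
    put.add(FEED, Bytes.toBytes(reverseTs), Bytes.toBytes(activity));
    t.put(put);
  }

  // Serve a member's feed with a single row read; newest items come first.
  static Result feed(HTable t, String memberId) throws Exception {
    return t.get(new Get(Bytes.toBytes(memberId)));
  }
}
}}}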
+ 
+ [[http://ning.com|Ning]] uses HBase to store and serve the results of processing user events
and log files, which allows us to provide near-real-time analytics and reporting. We use a
small cluster of commodity machines with 4 cores and 16GB of RAM per machine to handle all
our analytics and reporting needs.
  
  [[http://www.openplaces.org|Openplaces]] is a search engine for travel that uses HBase to
store terabytes of web pages and travel-related entity records (countries, cities, hotels,
etc.). We have dozens of MapReduce jobs that crunch data on a daily basis.  We use a 20-node
cluster for development, a 40-node cluster for offline production processing and an EC2 cluster
for the live web site.
  
@@ -20, +22 @@

  
  [[http://www.streamy.com/|Streamy]] is a recently launched realtime social news site.  We
use HBase for all of our data storage, query, and analysis needs, replacing an existing SQL-based
system.  This includes hundreds of millions of documents, sparse matrices, logs, and everything
else once done in the relational system.  We perform significant in-memory caching of query
results, similar to a traditional Memcached/SQL setup, and use other external components
to perform joining and sorting.  We also run thousands of daily MapReduce jobs using HBase
tables for log analysis, attention data processing, and feed crawling.  HBase has helped us
scale and distribute in ways we could not otherwise, and the community has provided consistent
and invaluable assistance.
  
  [[http://www.stumbleupon.com/|Stumbleupon]] and [[http://su.pr|Su.pr]] use HBase as a real-time
data storage and analytics platform. Serving directly out of HBase, various site features
and statistics are kept up to date in real time. We also use HBase as a MapReduce data source
to overcome traditional query speed limits in MySQL.
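
Using an HBase table as a MapReduce data source, as described here, is wired up through the
org.apache.hadoop.hbase.mapreduce package in 0.20. Below is a minimal row-counting sketch;
the table name and scan settings are placeholders, not StumbleUpon's actual setup.

{{{
// Minimal sketch: an HBase table as a MapReduce source (HBase 0.20).
// "urls" is a placeholder table name.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RowCount {
  static class CountMapper extends TableMapper<ImmutableBytesWritable, Result> {
    public void map(ImmutableBytesWritable row, Result values, Context ctx) {
      ctx.getCounter("demo", "rows").increment(1); // one tick per HBase row
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new HBaseConfiguration(), "rowcount");
    job.setJarByClass(RowCount.class);
    Scan scan = new Scan();
    scan.setCaching(500); // fetch rows in batches to keep the scan fast
    TableMapReduceUtil.initTableMapperJob("urls", scan, CountMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);                       // map-only job
    job.setOutputFormatClass(NullOutputFormat.class);
    job.waitForCompletion(true);
  }
}
}}}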
  
  [[http://www.subrecord.org|SubRecord Project]] is an Open Source project that is using HBase
as a repository of records (persisted map-like data) for the aspects it provides, such as
logging, tracing, and metrics. HBase and a Lucene index together constitute the repository/storage
for this platform.
  
@@ -32, +34 @@

  
  [[http://www.videosurf.com/|VideoSurf]] - "The video search engine that has taught computers
to see". We're using HBase to persist various large graphs of data and other statistics. HBase
was a real win for us because it let us store substantially larger datasets without the need
for manually partitioning the data, and its column-oriented nature allowed us to create schemas
that were substantially more efficient for storing and retrieving data.
  
  [[http://www.visibletechnologies.com/|Visible Technologies]] - We use Hadoop, HBase, Katta,
and more to collect, parse, store, and search hundreds of millions of pieces of Social Media
content. We get incredibly fast throughput and very low latency on commodity hardware. HBase
enables our business to exist.
  
  [[http://www.worldlingo.com/|WorldLingo]] - The !WorldLingo Multilingual Archive. We use
HBase to store millions of documents that we scan using Map/Reduce jobs to machine translate
them into all or selected target languages from our set of available machine translation languages.
We currently store 12 million documents but plan to eventually reach the 450 million mark.
HBase allows us to scale out as we need to grow our storage capacities. Combined with Hadoop
to keep the data replicated and therefore fail-safe, we have the backbone our service can
rely on now and in the future. !WorldLingo has been using HBase since December 2007 and is,
along with a few others, one of the longest-running HBase installations. Currently we are
running the latest HBase 0.20 and serving directly from it: [[http://www.worldlingo.com/ma/enwiki/en/HBase|MultilingualArchive]].
  
