hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/PoweredBy" by MichaelBieniosek
Date Wed, 01 Oct 2008 21:29:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by MichaelBieniosek:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

------------------------------------------------------------------------------
  
  [http://www.videosurf.com/ VideoSurf] - "The video search engine that has taught computers
to see". We're using Hbase to persist various large graphs of data and other statistics. Hbase
was a real win for us because it let us store substantially larger datasets without the need
for manually partitioning the data, and its column-oriented nature allowed us to create schemas
that were substantially more efficient for storing and retrieving data.
  
+ [http://www.powerset.com/ Powerset (a Microsoft company)] uses HBase to store raw documents.
 We have a ~70-node Hadoop cluster running DFS, MapReduce, and HBase.  In our Wikipedia HBase
table, we have one row for each Wikipedia page (~2.5M pages and climbing).  We use this table as
input to our indexing jobs, which run in Hadoop MapReduce.  Uploading the entire Wikipedia
dump to our cluster takes a couple of hours.  Scanning the table inside MapReduce is very fast
-- the latency is in the noise compared to everything else we do.
+ 
