hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/PerformanceEvaluation" by stack
Date Fri, 21 Dec 2007 05:20:15 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation

The comment on the change is:
Notes on mapfile numbers.

------------------------------------------------------------------------------
  I've also added numbers for sequential writes, and for random and next ('scan') reads, into
and out of a single *open* HDFS mapfile for comparison. That is, when random reading we are not
opening the file each time, and the mapfile index is already loaded into memory.  Going by
current numbers, pure mapfile writes are slower than the numbers Google posted in the initial
Bigtable paper, and reads are just a bit faster (except when scanning).  GFS must be fast.
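To make the access pattern concrete, here is a toy Python model of reading from a single open mapfile. It is loosely modeled on the idea of a sorted data file plus a sparse index of every Nth key held in memory; it is not Hadoop's `MapFile` API. The class name `SortedFileReader` and the `index_interval` parameter are hypothetical, and an in-memory list of sorted records stands in for the data file. A random read binary-searches the in-memory index and then scans forward at most one interval; a scan just calls `next()` repeatedly without touching the index.

```python
import bisect


class SortedFileReader:
    """Toy model of an open mapfile: sorted records plus a sparse in-memory index.

    Hypothetical class: a list of (key, value) pairs sorted by key stands in
    for the on-disk data file, and list positions stand in for file offsets.
    """

    def __init__(self, records, index_interval=128):
        self.records = records
        # Sparse index: every index_interval-th key and its position, held in memory.
        self.index_pos = list(range(0, len(records), index_interval))
        self.index_keys = [records[p][0] for p in self.index_pos]
        self.cursor = 0  # position of the next record returned by next()

    def get(self, key):
        """Random read: binary-search the index, then scan forward within one interval."""
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None  # key sorts before the first record
        start = self.index_pos[i]
        end = self.index_pos[i + 1] if i + 1 < len(self.index_pos) else len(self.records)
        for pos in range(start, end):  # the "seek" lands at start
            k, v = self.records[pos]
            if k == key:
                return v
            if k > key:
                break  # records are sorted, so the key is absent
        return None

    def next(self):
        """Scan: return the next record in key order, or None at end of file."""
        if self.cursor >= len(self.records):
            return None
        record = self.records[self.cursor]
        self.cursor += 1
        return record
```

With 1,000 zero-padded keys and `index_interval=128`, the index holds 8 keys, so each `get` costs one binary search over 8 in-memory entries plus at most 128 record comparisons, while `next()` never consults the index at all.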
  
  ||<rowbgcolor="#ececec">Experiment Run||HBase20070708||HBase20070916||0.15.0||20071219||mapfile||!BigTable||
- ||random reads ||68||272||264||167||1718||1212||
+ ||random reads ||68||272||264||167||685||1212||
  ||random reads (mem)||Not implemented||Not implemented||Not implemented||Not Implemented||-||10811||
  ||random writes||847||1460||1277||1400||-||8850||
  ||sequential reads||301||267||305||138||-||4425||
- ||sequential writes||850||1278||1112||1691||5761||8547||
+ ||sequential writes||850||1278||1112||1691||5494||8547||
- ||scans||3063||3692||3758||3731||28886||15385||
+ ||scans||3063||3692||3758||3731||25641||15385||
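Relative comparisons between columns of the table can be checked with simple ratios. Below is a small Python sketch with a subset of the table's figures copied in (same units as the table); the dictionary layout and the `speedup` helper name are just for illustration.

```python
# A subset of the table's figures, keyed by row and then by column.
TABLE = {
    "random reads": {"0.15.0": 264, "20071219": 167, "mapfile": 685, "BigTable": 1212},
    "sequential writes": {"0.15.0": 1112, "20071219": 1691, "mapfile": 5494, "BigTable": 8547},
    "scans": {"0.15.0": 3758, "20071219": 3731, "mapfile": 25641, "BigTable": 15385},
}


def speedup(row, a, b):
    """How many times larger column a's figure is than column b's for one row."""
    return TABLE[row][a] / TABLE[row][b]


# Compare the revised mapfile column against the BigTable column, row by row.
for row in TABLE:
    print(f"{row}: mapfile/BigTable = {speedup(row, 'mapfile', 'BigTable'):.2f}")
```

For example, `speedup("random reads", "0.15.0", "20071219")` is about 1.58. With the revised mapfile column, the loop reports mapfile at roughly 0.57x, 0.64x and 1.67x of the BigTable figures for random reads, sequential writes and scans respectively.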
  
+ Subsequently I profiled the mapfile PerformanceEvaluation.  It turns out that generating the
keys and values to insert was taking a lot of CPU time. After a fix, key and value generation
came down to between 15% and 25% of CPU time (the alternative was pregenerating all keys and
values, which would take loads of memory).  Rerunning the tests, it looks like mapfile numbers
can fluctuate quite widely between runs.  I also noticed that 0.15.x random reads seem to be
50% faster than TRUNK.  Investigate.
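The message does not say what the fix was. As one illustration of the trade-off it describes, here is a hypothetical Python sketch that keeps per-row generation cheap by producing random bytes once and handing out 1000-byte windows of that pool, rather than drawing fresh random bytes for every row or pregenerating every row up front. The 10-digit zero-padded key format, the 1000-byte value size and all names here are assumptions, not HBase's PerformanceEvaluation code.

```python
import random

KEY_LENGTH = 10      # hypothetical: zero-padded decimal row keys, e.g. b"0000000042"
VALUE_LENGTH = 1000  # hypothetical: 1000-byte values


def make_key(i):
    # Zero-padding makes byte order match numeric order.
    return b"%010d" % i


def generate_rows(n, seed=0):
    """Yield (key, value) pairs, reusing one pre-generated random pool for values."""
    rng = random.Random(seed)
    # Generate random bytes once (2x the value length) instead of once per row.
    pool_size = 2 * VALUE_LENGTH
    pool = rng.getrandbits(8 * pool_size).to_bytes(pool_size, "little")
    for i in range(n):
        offset = rng.randrange(VALUE_LENGTH + 1)
        # Each value is a VALUE_LENGTH-byte window into the pool: cheap, yet varied.
        yield make_key(i), pool[offset:offset + VALUE_LENGTH]


def precomputed_bytes(n):
    """Rough memory to hold n keys and values up front (ignores object overhead)."""
    return n * (KEY_LENGTH + VALUE_LENGTH)
```

Holding 1,048,576 rows (a hypothetical run size) up front would need `precomputed_bytes(1024 * 1024)` = 1,059,061,760 bytes, about 1 GiB before any per-object overhead, which is the memory cost that streaming rows from a generator avoids.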
+ 
