hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/NewFileFormat" by stack
Date Fri, 03 Oct 2008 19:33:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/NewFileFormat

------------------------------------------------------------------------------
  
  If the index included an offset for every key, we could use it to tell whether the file
has an entry for the queried key, and every index lookup would give us the exact offset.  But
such an index would be too large to keep in memory: if values are small, a file could have many
entries.  Files are usually about 64MB but can grow to an upper bound of about 1G, though this
limit is configurable and nothing stops it from being configured higher.
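
To make the trade-off above concrete, here is a minimal sketch of the alternative to a full index: a sparse index that records only every Nth key, so a lookup finds the nearest indexed key and then scans forward. This is an illustration only, not HBase or MapFile code. The function names, the `interval` parameter, and the use of list positions in place of byte offsets are all assumptions made for the example.

```python
import bisect

def build_sparse_index(sorted_keys, interval):
    """Record (key, position) for every `interval`-th key only.

    Positions stand in for byte offsets in a real file (assumption).
    """
    return [(k, i) for i, k in enumerate(sorted_keys) if i % interval == 0]

def lookup(sorted_keys, index, key, interval):
    """Return the position of `key`, or None if it is absent."""
    index_keys = [k for k, _ in index]
    # Find the last indexed key that is <= the queried key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first entry in the file
    start = index[slot][1]
    # Scan at most `interval` entries forward from the indexed position.
    for pos in range(start, min(start + interval, len(sorted_keys))):
        if sorted_keys[pos] == key:
            return pos
        if sorted_keys[pos] > key:
            break  # passed where the key would be; it is not in the file
    return None

keys = ["a", "c", "e", "g", "i", "k"]
index = build_sparse_index(keys, interval=3)
found = lookup(keys, index, "e", interval=3)
missing = lookup(keys, index, "d", interval=3)
```

The sparse index holds one entry per `interval` keys rather than one per key, at the cost of a short scan on each lookup and no way to answer "is this key present?" from the index alone.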
  
+ == New Format ==
+  * [https://issues.apache.org/jira/browse/HBASE-647 HBASE-647]: Have data, metadata, indices
and bloomfilters, etc., all rolled up in the one file.  Could do this with
[https://issues.apache.org/jira/browse/HADOOP-3315 TFile].
+ 
  == Other File Formats ==
  Cassandra uses a Sequence File.  It adds key/values in blocks of 128 by default.  On the
128th entry, an index for the block keys is inlined and then a new block begins.  Block offsets
are kept out in an index file as in MapFile.  Bloomfilters are on by default.
  
- == New Format ==
- Have data, metadata, indices and bloomfilters, etc., all rolled up in the one file.
- 
