cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "FileFormatDesignDoc" by StuHood
Date Mon, 02 May 2011 03:23:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FileFormatDesignDoc" page has been changed by StuHood.
The comment on this change is: Remove field-ordered section: not coming any time soon, and arguably not beneficial in a column-family oriented store.
http://wiki.apache.org/cassandra/FileFormatDesignDoc?action=diff&rev1=35&rev2=36

--------------------------------------------------

  || 4.9 || 1 ||
  || china || 0 ||
  
+ The parent change flags can be represented compactly using a bitmap, and type information can be stored using a byte per value.
- The parent change flag and type information can be represented compactly using a bitmap.
- 
- === Field reordering ===
- 
- ** NB: field reordering will likely not be implemented in initial versions of the format **
- 
- One weakness of the implementation so far is that it preserves the order of tuples within a level. This approach performs well for wide rows with high field cardinality, since adding compression is unlikely to remove data.
- 
- But since we have domain knowledge that a compression algorithm would not, it will often be more efficient to perform the reordering ourselves, particularly when a chunk has low cardinality: for example at the "name2" level above. By assigning the chunk an ordering of ''self'' (as opposed to ''parent''), we can store the fields in sorted order (rather than in ''parent''-sorted order) and remove duplicates.
- 
- || ''name2'' ||
- || flavor ||
- || origin  ||
- 
- More importantly, a ''self''-ordered chunk should influence the order of tuples in child chunks. When we encounter a ''self''-ordered chunk at level "name2", we should expect its children in level "value" to be arranged as follows:
- 
- || ''value'' || ''parent_change'' ||
- || 3.4 || 1 ||
- || 5.6 || 1 ||
- || 2.6 || 1 ||
- || 4.2 || 1 ||
- || 4.9 || 1 ||
- || || 0 ||
- || france || 1 ||
- || || 0 ||
- || || 0 ||
- || china || 1 ||
- 
- The ''parent_change'' field is now a bitmap representing nulls: it indicates that all parents have a 'flavor' tuple, but only the second and fifth parents have an 'origin' tuple. This representation is ripe for compression.
- 
- === Summary ===
- 
- A (simplified) representation of the span so far (without metadata) is:
- 
- ''(parent-ordered)''
- || ''row key'' || ''parent_change'' ||
- || cheese  || 0 ||
- || fruit   || 0 ||
- ''(parent-ordered)''
- || ''name1''  || ''parent_change'' ||
- || brie || 0 ||
- || gouda || 0 ||
- || swiss || 0 ||
- || apple || 1 ||
- || pear  || 1 ||
- ''(self-ordered)''
- || ''name2'' ||
- || flavor ||
- || origin  ||
- ''(parent-ordered)''
- || ''value'' || ''parent_change'' ||
- || 3.4 || 1 ||
- || 5.6 || 1 ||
- || 2.6 || 1 ||
- || 4.2 || 1 ||
- || 4.9 || 1 ||
- || || 0 ||
- || france || 1 ||
- || || 0 ||
- || || 0 ||
- || china || 1 ||
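
As a rough illustration of the line added in this change (parent change flags held in a bitmap, type information held as one byte per value), the following Python sketch packs a column of 0/1 flags into bytes and emits one type byte per value. The function names, the most-significant-bit-first order, and the two type codes are assumptions made for this example; the design document does not specify them.

```python
# Hypothetical type codes; the design doc only says "a byte per value".
TYPE_DOUBLE = 0
TYPE_STRING = 1


def pack_flags(flags):
    """Pack a sequence of 0/1 flags into bytes, most significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def unpack_flags(data, count):
    """Recover the first `count` flags from a packed bitmap."""
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(count)]


def encode_types(values):
    """Emit one type byte per value (string vs. anything else as double)."""
    return bytes(
        TYPE_STRING if isinstance(v, str) else TYPE_DOUBLE for v in values
    )


# Sample input: ten parent_change flags fit in two bytes instead of ten.
flags = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1]
bitmap = pack_flags(flags)
assert unpack_flags(bitmap, len(flags)) == flags
```

The number of flags has to be stored alongside the bitmap (here it is passed to `unpack_flags`), since the final byte is padded with zero bits.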
  
  == Metadata ==
  
