hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/IndexDev/Bitmap" by MarquisWang
Date Sat, 20 Nov 2010 02:43:58 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/IndexDev/Bitmap" page has been changed by MarquisWang.
http://wiki.apache.org/hadoop/Hive/IndexDev/Bitmap?action=diff&rev1=1&rev2=2

--------------------------------------------------

  
  The basic implementation's only compression is eliminating blocks where all rows are 0s.
This is unlikely to happen for larger blocks, so we need a better compression format. What
we can do is do byte-aligned bitmap compression, where the bitmap is an array of bytes, and
a byte of all 1s or all 0s implies one or more bytes where every value is 0 or 1. Then, we
would just need to add another column in the bitmap index table that is an array of Ints that
describe how long the gaps are and logic to expand the compression.
  
+ == Example ==
+ 
+ Suppose we have a bitmap index on a key where, on the first block, value "a" appears in
rows 5, 12, and 64, and value "b" appears in rows 7, 8, and 9. Then, for the preliminary implementation,
the first entry in the index table will be:
+ 
+ {{https://issues.apache.org/jira/secure/attachment/12460083/bitmap_index_1.png}}
+ 
+ The values in the array represent the bitmap for each block, where each 32-bit BigInt value
stores 32 rows.
+ 
+ For the second iteration, the first entry will be:
+ 
+ {{https://issues.apache.org/jira/secure/attachment/12460083/bitmap_index_2.png}}
+ 
+ This one uses 1-byte array entries, so each value in the array stores 8 rows. If an entry
is 0x00 or 0xFF, it represents 1 or more consecutive bytes of zeros, (in this case 5 and 4,
respectively)
+ 

Mime
View raw message