hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/IndexDev/Bitmap" by MarquisWang
Date Sun, 30 Jan 2011 23:40:16 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/IndexDev/Bitmap" page has been changed by MarquisWang.
http://wiki.apache.org/hadoop/Hive/IndexDev/Bitmap?action=diff&rev1=3&rev2=4

--------------------------------------------------

  
  This implementation confers some of the benefits of bitmap indexing and should be easy to
implement given the already existing compact index, but it does few of the optimizations such
as compression that a really good bitmap index should do.
  
- Like the complex index, this implementation uses an index table. The index table on a column
"key" has three columns, _bucketname, _offset, and _bitmaps. _bucketname is a string pointing
to the hadoop file that is storing this block in the table, _offset is the block offset of
a block, and _bitmaps is a Map where the keys are all the values of the column "key" that
exist in this block and a bitmap encoding (an Array of BigInts??) of every row in that block,
with a 1 if that row has the value, 0 if not. If a key value does not appear in a block at
all, the value is not stored in the map.
+ Like the complex index, this implementation uses an index table. The index table on a column
"key" has four or more columns: first, the columns that are being indexed, then _bucketname,
_offset, and _bitmaps. _bucketname is a string pointing to the hadoop file that is storing
this block in the table, _offset is the block offset of a block, and _bitmaps is an uncompressed
bitmap encoding (an Array of bytes) of the bitmap for this column value, bucketname, and row
offset. Each bit in the bitmap corresponds to one row in the block. The bit is 1 if that row
has the value of the values in the columns being indexed, and a 0 if not. If a key value does
not appear in a block at all, the value is not stored in the map.
  
- When querying this index, we select each filename, block pair where the _bitmaps Map has
a key that is the queried key value. If there are boolean AND or OR operations done on the
predicates with bitmap indexes, we can use bitwise operations to try to eliminate blocks as
well. We can use this data to generate the filename, array of block offsets format that the
compact index handler uses and reuse that in the bitmap index query.
+ When querying this index, if there are boolean AND or OR operations done on the predicates
with bitmap indexes, we can use bitwise operations to try to eliminate blocks as well. We
can then eliminate blocks that do not contain the value combinations we are interested in.
We can use this data to generate the filename, array of block offsets format that the compact
index handler uses and reuse that in the bitmap index query.
  
  === Second iteration ===
  

Mime
View raw message