hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15554) StoreFile$Writer.appendGeneralBloomFilter generates extra KV
Date Fri, 05 Aug 2016 11:34:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409344#comment-15409344 ]

Anoop Sam John commented on HBASE-15554:
----------------------------------------

Based on the discussion around the functions to be added to Hash:
We had hash(byte[]), and the patch adds
 - hash(BB) to handle the ROW bloom type for off-heap cells
 - hash(Cell) to handle the ROW_COL bloom type. This avoids the need to copy the row and
qualifier to recreate a Cell with a null CF.

The concern is that adding these variants adds a lot of duplicate code, and it looks a bit
ugly. So can we have only one function, hash(Cell)? That removes the duplicate code paths, but
it would be very ugly to pass a Cell and a bloom type to the hash functions: the hash function
would then have to know which Cell components each bloom type uses!

Thinking about this, here is one idea I am proposing.
Have only one function in Hash: hash(HashKey).
Let HashKey be something like an Iterable, so that we can iterate over the bytes corresponding
to the HashKey. Let there be two impls of HashKey, one for the ROW type and another for the
ROW_COL type. Each impl keeps a ref to the Cell. The iterator impl knows which bytes of the
Cell are to be considered: the ROW type takes the row bytes only (either from byte[] or BB),
and the ROW_COL type takes the row bytes first, followed by the qualifier bytes. This alone is
a considerable amount of change and is worth doing as a sub-task. If you want, I can do it as
a PoC first. I have not checked this against the code at all, but it looks like we can do it
in a clean way.
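
To make the idea concrete, here is a rough sketch of what a single hash(HashKey) entry point
could look like. Everything in it is an assumption for illustration and not the patch's code:
the names HashKeySketch, RowHashKey and RowColHashKey are hypothetical, BB is taken to mean
java.nio.ByteBuffer, the row and qualifier are passed as plain buffers instead of a real Cell,
indexed access (length/get) stands in for the Iterable mentioned above, and FNV-1a stands in
for whichever hash function is actually in use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the hash function sees only a HashKey, never a bloom type.
final class HashKeySketch {
    // 32-bit FNV-1a, used here only as a simple stand-in hash.
    static int hash(HashKey key) {
        int h = 0x811c9dc5;
        for (int i = 0; i < key.length(); i++) {
            h ^= (key.get(i) & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    public static void main(String[] args) {
        ByteBuffer row = ByteBuffer.wrap("row1".getBytes(StandardCharsets.UTF_8));
        ByteBuffer qual = ByteBuffer.wrap("q".getBytes(StandardCharsets.UTF_8));
        ByteBuffer joined = ByteBuffer.wrap("row1q".getBytes(StandardCharsets.UTF_8));
        // Row + qualifier read in place hash the same as a copied, concatenated key.
        System.out.println(hash(new RowColHashKey(row, qual)) == hash(new RowHashKey(joined)));
    }
}

// Hypothetical view over the bytes that feed the hash.
interface HashKey {
    int length();
    byte get(int pos);
}

// ROW type: row bytes only, read in place from a heap or direct (off-heap) buffer.
final class RowHashKey implements HashKey {
    private final ByteBuffer row;

    RowHashKey(ByteBuffer row) {
        this.row = row;
    }

    public int length() {
        return row.remaining();
    }

    public byte get(int pos) {
        // Absolute get: does not move the buffer's position.
        return row.get(row.position() + pos);
    }
}

// ROW_COL type: row bytes first, followed by qualifier bytes, with no copying.
final class RowColHashKey implements HashKey {
    private final ByteBuffer row;
    private final ByteBuffer qualifier;

    RowColHashKey(ByteBuffer row, ByteBuffer qualifier) {
        this.row = row;
        this.qualifier = qualifier;
    }

    public int length() {
        return row.remaining() + qualifier.remaining();
    }

    public byte get(int pos) {
        int rowLen = row.remaining();
        return pos < rowLen
            ? row.get(row.position() + pos)
            : qualifier.get(qualifier.position() + pos - rowLen);
    }
}
```

The point of the sketch is that knowledge of which Cell components matter lives in the key
impls, so the hash function itself stays free of any bloom-type logic and of duplicate
byte[]/BB code paths.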

> StoreFile$Writer.appendGeneralBloomFilter generates extra KV
> ------------------------------------------------------------
>
>                 Key: HBASE-15554
>                 URL: https://issues.apache.org/jira/browse/HBASE-15554
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Vladimir Rodionov
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15554.patch, HBASE-15554_3.patch, HBASE-15554_4.patch, HBASE-15554_6.patch, HBASE-15554_7.patch
>
>
> Accounts for 10% memory allocation in compaction thread when BloomFilterType is ROWCOL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
