hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15554) StoreFile$Writer.appendGeneralBloomFilter generates extra KV
Date Mon, 08 Aug 2016 18:36:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412234#comment-15412234 ]

Anoop Sam John commented on HBASE-15554:
----------------------------------------

Sorry if I was not clear. I don't mean that the patch still has duplication. What I meant
by an iterator-based HashKey is that I wanted it to be the single structure we use with
Hash, rather than byte[]/BB/Cell. But if the algorithm demands an offset-based byte getter,
I am fine with that.
bq. one is that whatever the cell format, we should finally assume the back end is KV format
key only, because the offset and length that we pass to the hash algo assume that it
is contiguous
Why do we need to pass an offset to the hash() function? We need to pass a HashKey. Internally,
the HashKey impl has to know which byte to return when its getters are called. If you don't
have the iterator model, you will have a get(int) that returns a byte. So the Hash functions
have to call get() with relative offsets, e.g. get(0), get(1), etc., not offset+1, offset+2
as they do currently. When the impl gets these calls, it has to convert them into absolute
offsets. That is not so simple in the ROW_COL case: there, based on the incoming offset, you
also have to map it to the area of the Cell it belongs to. That is what I was trying to say.
When get(0) or get(1) is called, those fall in the rkLen part. get(2) through get(<rkLen>+1)
belong to the rk bytes. So we will have to do some math. That way you really don't have to
assume that the Cell is in KV serialization. Whichever bytes of the Cell were used in the
past, continue to use those. Am I making it clear now? It would be good if we could remove
every KV assumption from the code path. I think this Bloom area is the only place where it
remains.
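
The relative-offset idea described above can be sketched roughly as follows. This is only an
illustration, not code from the patch. The HashKey interface shape, the class names, and the
simplified key layout (2-byte row key length, then row bytes, then qualifier bytes) are all
assumptions; the real ROW_COL bloom key carries more fields. FNV-1a stands in for whatever
hash HBase actually uses, simply because it reads one byte at a time.

```java
// Hypothetical sketch, not the HBASE-15554 patch. All names here are illustrative.

/** A key that hands out its bytes by relative position, 0 .. length()-1. */
interface HashKey {
    byte get(int pos);
    int length();
}

/** Key backed by one contiguous array region: relative pos maps to offset + pos. */
final class ByteArrayHashKey implements HashKey {
    private final byte[] buf;
    private final int offset;
    private final int length;

    ByteArrayHashKey(byte[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    @Override public byte get(int pos) { return buf[offset + pos]; }
    @Override public int length() { return length; }
}

/** ROW_COL-style key whose parts live in separate arrays (assumed, simplified layout). */
final class RowColHashKey implements HashKey {
    private static final int RK_LEN_SIZE = 2; // 2-byte row key length prefix
    private final byte[] row;
    private final byte[] qualifier;

    RowColHashKey(byte[] row, byte[] qualifier) {
        if (row.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("row key too long");
        }
        this.row = row;
        this.qualifier = qualifier;
    }

    @Override public int length() {
        return RK_LEN_SIZE + row.length + qualifier.length;
    }

    @Override public byte get(int pos) {
        if (pos < 0 || pos >= length()) {
            throw new IndexOutOfBoundsException("pos " + pos);
        }
        if (pos == 0) return (byte) (row.length >>> 8); // high byte of rkLen
        if (pos == 1) return (byte) row.length;         // low byte of rkLen
        int p = pos - RK_LEN_SIZE;
        if (p < row.length) return row[p];              // row key bytes
        return qualifier[p - row.length];               // qualifier bytes
    }
}

/** FNV-1a (32-bit), a stand-in hash that only ever calls get(i) with relative i. */
final class Fnv1a {
    private Fnv1a() {}

    static int hash(HashKey key) {
        int h = 0x811c9dc5;
        for (int i = 0; i < key.length(); i++) {
            h ^= key.get(i) & 0xff;
            h *= 0x01000193;
        }
        return h;
    }
}
```

Under these assumptions, hashing a RowColHashKey gives the same value as hashing the equivalent
contiguous bytes through ByteArrayHashKey, without ever copying the row and qualifier into one
KV-style array.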

> StoreFile$Writer.appendGeneralBloomFilter generates extra KV
> ------------------------------------------------------------
>
>                 Key: HBASE-15554
>                 URL: https://issues.apache.org/jira/browse/HBASE-15554
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Vladimir Rodionov
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15554.patch, HBASE-15554_10.patch, HBASE-15554_3.patch, HBASE-15554_4.patch, HBASE-15554_6.patch, HBASE-15554_7.patch, HBASE-15554_9.patch
>
>
> Accounts for 10% memory allocation in compaction thread when BloomFilterType is ROWCOL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
