hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
Date Tue, 10 Mar 2015 17:48:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355297#comment-14355297 ]

stack commented on HBASE-11425:

Thanks for the writeup. It makes it easier to discuss this new development.

"Typical used value for max heap size is 32-48 GB."

This ain't right, is it? Usually we have folks hover just below 32G so the JVM can use compressed pointers.

"Each bucket’s size is fixed to 4KB."

Should bucket size be same as the hfile block size?

Can MBB be developed in isolation with tests and refcounting tests apart from main code base?
Is that being done?

High-level, general question: So eviction was easy before. Under memory pressure, just evict
until the needed memory is made available. Eviction is now more complicated because we have
to check for a non-zero refcount? And what if we can't find the necessary memory? What happens?
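
To make the eviction question concrete, here is a minimal single-threaded sketch of LRU eviction that has to skip blocks with a non-zero refcount. It is not the design from the writeup and not HBase's bucket cache; the class and method names (RefCountedCacheSketch, acquire, release, evict) are hypothetical and exist only for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of refcount-aware eviction; not thread-safe, not HBase code. */
final class RefCountedCacheSketch {

    /** A cached block plus the number of readers currently holding it. */
    static final class Block {
        final int size;
        final AtomicInteger refCount = new AtomicInteger();

        Block(int size) {
            this.size = size;
        }
    }

    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, Block> blocks =
        new LinkedHashMap<>(16, 0.75f, true);

    void put(String key, int size) {
        blocks.put(key, new Block(size));
    }

    /** Looks up a block and pins it by incrementing its refcount. */
    Block acquire(String key) {
        Block b = blocks.get(key);
        if (b != null) {
            b.refCount.incrementAndGet();
        }
        return b;
    }

    /** Unpins a block so it becomes evictable again once the count reaches zero. */
    void release(Block b) {
        b.refCount.decrementAndGet();
    }

    boolean contains(String key) {
        return blocks.containsKey(key);
    }

    /**
     * Evicts unreferenced blocks in LRU order until bytesNeeded is freed or
     * the cache has been fully scanned. Returns the bytes actually freed,
     * which may be less than requested when the remaining blocks are pinned.
     */
    long evict(long bytesNeeded) {
        long freed = 0;
        Iterator<Map.Entry<String, Block>> it = blocks.entrySet().iterator();
        while (freed < bytesNeeded && it.hasNext()) {
            Block b = it.next().getValue();
            if (b.refCount.get() == 0) {
                it.remove();
                freed += b.size;
            }
        }
        return freed;
    }
}
```

In this sketch evict() can return less than was asked for, which is exactly the "what if we can't find the necessary memory" case: the caller has to decide whether to wait for readers to release blocks, refuse to cache the new block, or fall back to some other allocation.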

"Note that the LRU Cache does not have this block reference counting happening as that does
not deal with BBs and deals with the HFileblock objects directly."

Why not? We copy from the LRU blocks to Cell arrays? Couldn't Cells go against the LRU blocks
directly too? Or I have it wrong?

I don't see it listed as a downside that we'll be doubling the objects made when reading offheap.
Is that right?

"Please note that the Cells in the memstore are still KV based (byte [] backed)" ... this
is because you are only doing the read path in this JIRA, right? Then again, when reading, we have
to read from the MemStore, so this means that the read path can be a mix of onheap and offheap.

On adding new methods to Cell, are there 'holes'? We talked about this in the past and it
seemed like there could be strange areas in the Cell API if you did certain calls. If you
don't know what I am on about, I'll dig up the old discussion (I think it was on mailing list...
Ram you asked for input).

... or maybe the holes have been plugged by 'Using getXXXArray() would throw UnSupportedOperationException.'?
And....
"This will make so many short living objects creation also. That is why we decided to go with
usage of getXXXOffset() and getXXXLength() API usage also along with buffer based APIs"

So, you might want to underline this point. It's a BB but WE are managing the position and length
to save on object creation and to bypass BB range checking, etc.

What does that mean for the 'client'?  When you give out a BB, its position, etc., are not
to be relied upon.  That will be disorientating.  Pity you couldn't throw UnsupportedOperationException
if they tried to use position, etc. So you need the BB AND the Cell to get at content. BB for the
array and then Cell for the offset and length...
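
A small sketch of what "BB for the bytes, Cell for the offset and length" looks like from the caller's side. ValueRef is a hypothetical stand-in for a buffer-backed Cell, not the proposed Cell API; the only real calls used are java.nio.ByteBuffer methods. The read uses absolute get(int), so the buffer's own position is never consulted or changed.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: the buffer holds the bytes, the holder supplies offset and length. */
final class OffsetLengthSketch {

    /** Illustrative stand-in for a buffer-backed Cell value; names are assumptions. */
    static final class ValueRef {
        final ByteBuffer buf;
        final int offset;
        final int length;

        ValueRef(ByteBuffer buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Copies the value out using absolute reads only; buf.position() is untouched. */
    static byte[] copyValue(ValueRef ref) {
        byte[] out = new byte[ref.length];
        for (int i = 0; i < ref.length; i++) {
            out[i] = ref.buf.get(ref.offset + i);
        }
        return out;
    }
}
```

A caller who ignores the offset and length and instead reads from position() to limit() would get the whole shared block rather than the one value, which is the confusion raised above.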

So, this API is for users on client-side? It is going to confuse them when they have a BB
but the position and limit are duds. In client, when would they be doing BB? Never? Client
won't be offheaping? If so, could the BB APIs be mixed in to Cell on the server only?

So, why have the switch at all? The hasArray switch? Why not BB it all the time? It would
simplify the read path.  Disadvantage would be it'd be extra objects?

When you say this: "Note that even if the HFileBlock is on heap BB we do not support getXXXArray()
APIs." This is only if hasArray returns false, right?
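
For reference, a sketch of the kind of hasArray switch being questioned, written against plain java.nio.ByteBuffer. The class and method names are hypothetical and this is not code from the patch. One detail worth keeping in mind: hasArray() returns false for direct buffers and also for read-only heap buffers, so "on heap" does not by itself guarantee the array path is available.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a hasArray() switch; not taken from the HBase patch. */
final class HasArraySwitchSketch {

    /** Compares length bytes at offset in buf against expected, without moving position. */
    static boolean matches(ByteBuffer buf, int offset, int length, byte[] expected) {
        if (length != expected.length) {
            return false;
        }
        if (buf.hasArray()) {
            // Array path: read the backing byte[] directly.
            byte[] arr = buf.array();
            int base = buf.arrayOffset() + offset;
            for (int i = 0; i < length; i++) {
                if (arr[base + i] != expected[i]) {
                    return false;
                }
            }
            return true;
        }
        // Buffer path: direct or read-only buffers, absolute reads only.
        for (int i = 0; i < length; i++) {
            if (buf.get(offset + i) != expected[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Dropping the switch would mean always taking the second branch, which is simpler to reason about but gives up direct array access for buffers that do expose one.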

Yeah, looks like 2.0.

Tell us more about the unsafe manipulation of BBs? How's that work?

Nice writeup.

> Cell/DBB end-to-end on the read-path
> ------------------------------------
>                 Key: HBASE-11425
>                 URL: https://issues.apache.org/jira/browse/HBASE-11425
>             Project: HBase
>          Issue Type: Umbrella
>          Components: regionserver, Scanners
>    Affects Versions: 0.99.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>         Attachments: Offheap reads in HBase using BBs_final.pdf
> Umbrella jira to make sure we can have blocks cached in offheap backed cache. In the
> entire read path, we can refer to this offheap buffer and avoid onheap copying.
> The high level items I can identify as of now are
> 1. Avoid the array() call on BB in read path. (This is there in many classes. We can
> handle class by class)
> 2. Support Buffer based getter APIs in cell.  In read path we will create a new Cell
> backed by BB. Will need in CellComparator, Filter (like SCVF), CPs etc.
> 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy.
> 4. Remove all CP hooks (which are already deprecated) which deal with KVs.  (In read
> Will add subtasks under this.

This message was sent by Atlassian JIRA
