hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: ByteBuffer Backed Cell - New APIs (HBASE-12358)
Date Fri, 05 Dec 2014 03:54:35 GMT
Thanks for the writeup, Ram.

This feature is targeting the 2.0 release, right?

bq. If one sees hasArray() as false (a DBB backed Cell) and uses the
getXXXArray() API along with offset and length

Is there an example of the above usage pattern? Within HBase core, we can
make sure the above pattern doesn't exist, right?


On Thu, Dec 4, 2014 at 7:24 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi Devs
> This write-up gives a brief idea of why we need a BB-backed Cell and
> which items we need to take care of before introducing new BB-backed
> APIs in Cell.
> Please also refer to https://issues.apache.org/jira/browse/HBASE-12358 and
> its parent JIRA https://issues.apache.org/jira/browse/HBASE-11425 for the
> history.
> Coming back to the new APIs: this discussion is based on supporting BB in
> the read path (the write path is not targeted now) so that we can also
> work with offheap BBs. This would avoid copying data from the BlockCache
> to the read path ByteBuffer.
> Assuming we will be working with BBs in the read path, we will need to
> introduce *getXXXBuffer()* APIs and also *hasArray()* directly in Cell
> itself.
> If we extend Cell or create a new Cell type, then *everywhere we would
> need an instanceof check or a type conversion*, and that is why adding
> the new APIs to the Cell interface itself makes sense.
> The plan is to use this *getXXXBuffer()* API throughout the read path
> *instead of getXXXArray()*.
> Now there are two ways to use it:
> 1) Use getXXXBuffer() along with getXXXOffset() and getXXXLength(), the
> same way we use the getXXXArray() APIs with offset and length now. Doing
> so would ensure that everywhere in the filters and CPs one just has to
> replace getXXXArray() with getXXXBuffer() and continue to use
> getXXXOffset() and getXXXLength(). We would wrap the byte[] with a BB in
> the case of KeyValue type cells so that getXXXBuffer() along with offset
> and length holds true everywhere. Note that if hasArray() is true (the
> KV case) then getXXXArray() would also work.
> 2) The other way is to use only the getXXXBuffer() API and ensure that
> the BB is always duplicated/sliced, so that only the portion of the total
> BB that represents the individual component of the Cell is returned.
> In this case getXXXOffset() is of no use (it is always going to be 0) and
> getXXXLength() is anyway going to be the sliced BB's limit.
> But with the 2nd approach we may end up creating a lot of small objects,
> even while doing comparisons.
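
To make the trade-off between (1) and (2) concrete, here is a minimal sketch using only java.nio. The class BufferAccessSketch, its two helper methods and the "row1cf1qual" layout are made up for illustration; none of this is the proposed Cell API or HBase code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not HBase code.
class BufferAccessSketch {
    // Approach (1): shared buffer plus explicit offset and length.
    // Absolute get(int) never moves the buffer's position, so no new object is needed.
    static boolean matchesWithOffset(ByteBuffer buf, int offset, int length, byte[] expected) {
        if (length != expected.length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (buf.get(offset + i) != expected[i]) {
                return false;
            }
        }
        return true;
    }

    // Approach (2): return a view that covers only the component.
    // duplicate() and slice() each allocate a small ByteBuffer object (the bytes are shared).
    static ByteBuffer sliceComponent(ByteBuffer buf, int offset, int length) {
        ByteBuffer dup = buf.duplicate();
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice(); // position 0, limit == length
    }

    public static void main(String[] args) {
        // Assumed layout: row "row1" at 0, family "cf1" at 4, qualifier "qual" at 7.
        byte[] cellBytes = "row1cf1qual".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer offheap = ByteBuffer.allocateDirect(cellBytes.length);
        offheap.put(cellBytes);
        offheap.flip();

        byte[] family = "cf1".getBytes(StandardCharsets.US_ASCII);
        System.out.println(matchesWithOffset(offheap, 4, 3, family));

        ByteBuffer slice = sliceComponent(offheap, 4, 3);
        System.out.println(slice.position() + " " + slice.limit());
        System.out.println(slice.equals(ByteBuffer.wrap(family)));
    }
}
```

With (1) a comparison touches only the shared buffer; with (2) every component access goes through two short-lived buffer objects, which is the small-object concern raised above.
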
> The next problem is what to do with the getXXXArray() APIs.
> If one sees hasArray() as false (a DBB backed Cell) and uses the
> getXXXArray() API along with offset and length, what should we do? Should
> we create a byte[] from the DBB and return it? In that case, what should
> *getXXXOffset() return for a getXXXBuffer() or a getXXXArray()?*
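
A small sketch of why copying out of the DBB makes the offset ambiguous. CopyOutSketch and copyComponent are hypothetical names and the byte layout is assumed; this is not HBase code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not HBase code.
class CopyOutSketch {
    // Copies one component out of a (possibly direct) buffer into a fresh byte[].
    static byte[] copyComponent(ByteBuffer buf, int offset, int length) {
        byte[] copy = new byte[length];
        ByteBuffer dup = buf.duplicate(); // leave the caller's position untouched
        dup.position(offset);
        dup.get(copy, 0, length);
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer offheap = ByteBuffer.allocateDirect(11);
        offheap.put("row1cf1qual".getBytes(StandardCharsets.US_ASCII));
        offheap.flip();

        int bufferOffset = 4; // where the family starts inside the shared buffer
        byte[] family = copyComponent(offheap, bufferOffset, 3);

        // In the copy the same component starts at index 0, not at bufferOffset.
        System.out.println(new String(family, StandardCharsets.US_ASCII));
        System.out.println(offheap.get(bufferOffset) == family[0]);
    }
}
```

The component sits at index 4 in the buffer but at index 0 in the copied array, so a single getXXXOffset() value cannot be correct for both accessors at once.
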
> If we go with the 2nd approach then getXXXBuffer() should be clearly
> documented as something to be used without getXXXOffset() and
> getXXXLength(), with getXXXOffset() and getXXXLength() used only with
> getXXXArray().
> Now if a Cell is backed by an on-heap BB then we could definitely support
> getXXXArray() also, but what to return from getXXXOffset() would be
> determined by which approach, (1) or (2), is used for getXXXBuffer().
> We wanted to open up this topic now to get some feedback on what
> the options could be here. Since it is a user-facing interface we need to
> be careful with this.
> I would suggest that whenever a Cell is *BB backed* (onheap or offheap),
> *hasArray() would always be false* in that Cell impl.
> Everywhere we would use getXXXBuffer() along with getXXXOffset() and
> getXXXLength(). Even in the case of KV we could wrap the byte[] with a BB
> so that we have uniformity throughout the read code and we don't have too
> many 'if' 'else' conditions.
> Whenever *hasArray() is false*, using the getXXXArray() API would throw
> *UnsupportedOperationException*.
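
A rough sketch of how this suggestion could look, which also shows the usage pattern asked about at the top of this mail. SketchCell and both implementations are hypothetical stand-ins that model only the row component; they are not the real Cell interface, and getRowBuffer() and hasArray() are used here only as the names proposed in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not HBase code.
class HasArraySketch {
    // Trimmed-down stand-in for Cell: only the row component is modeled.
    interface SketchCell {
        boolean hasArray();
        byte[] getRowArray();      // usable only when hasArray() is true
        ByteBuffer getRowBuffer(); // usable for every cell
        int getRowOffset();
        int getRowLength();
    }

    // KeyValue-like cell: byte[] backed, with the array wrapped on demand.
    static final class ArrayBackedCell implements SketchCell {
        private final byte[] bytes;
        private final int offset;
        private final int length;

        ArrayBackedCell(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        public boolean hasArray() { return true; }
        public byte[] getRowArray() { return bytes; }
        // Wrapping the whole array keeps getRowOffset() valid for the buffer too.
        public ByteBuffer getRowBuffer() { return ByteBuffer.wrap(bytes); }
        public int getRowOffset() { return offset; }
        public int getRowLength() { return length; }
    }

    // BB-backed cell (on-heap or off-heap): hasArray() is always false.
    static final class BufferBackedCell implements SketchCell {
        private final ByteBuffer buffer;
        private final int offset;
        private final int length;

        BufferBackedCell(ByteBuffer buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        public boolean hasArray() { return false; }
        public byte[] getRowArray() {
            throw new UnsupportedOperationException("BB backed cell, use getRowBuffer()");
        }
        public ByteBuffer getRowBuffer() { return buffer; }
        public int getRowOffset() { return offset; }
        public int getRowLength() { return length; }
    }

    // Read-path style accessor: one code path for both kinds of cell.
    static String rowAsString(SketchCell cell) {
        ByteBuffer buf = cell.getRowBuffer();
        byte[] out = new byte[cell.getRowLength()];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.get(cell.getRowOffset() + i);
        }
        return new String(out, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = "xxrow1yy".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer direct = ByteBuffer.allocateDirect(raw.length);
        direct.put(raw);
        direct.flip();

        SketchCell kv = new ArrayBackedCell(raw, 2, 4);
        SketchCell bb = new BufferBackedCell(direct, 2, 4);
        System.out.println(rowAsString(kv));
        System.out.println(rowAsString(bb));
        try {
            bb.getRowArray(); // the pattern the read path must avoid
        } catch (UnsupportedOperationException e) {
            System.out.println("getRowArray() rejected: " + e.getMessage());
        }
    }
}
```

Callers that stick to the buffer accessor plus offset and length never need to branch on hasArray(); only code that insists on the array accessor has to check it first.
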
> As said if we want *getXXXArray()* to be supported as per the existing way
> then getXXXBuffer() and getXXXOffset(), getXXXLength() should be clearly
> documented.
> Thoughts!!!
> Regards
> Ram & Anoop
