hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Hsieh <...@cloudera.com>
Subject Re: ByteBuffer Backed Cell - New APIs (HBASE-12358)
Date Fri, 05 Dec 2014 15:04:42 GMT
I think of it from a compatibility point of view - each method has its own
semantics and the semantics should not morph between versions.

With the repurposing the old apis approach, someone who ports to the new
version without porting to the api will likely run into some nasty
surprises.  Worse yet, they'd need to modify their code and understand the
details of what the semantics changes are.

Having completely new apis makes it easier to see when deprecated
behavior/methods are removed and doesn't require code inspection to verify.
(can rely on compiler warnings)  Also, old code won't be broken (just less
efficient).

Jon.



On Thu, Dec 4, 2014 at 10:41 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> We thought about this suggestion too but this comes with a lot of more
> confusion (?) - what do you think?
> Adding new APIs but similar in behaviour could lead to lot more confusion?
> Internally it is more easy.
>
>
> On Fri, Dec 5, 2014 at 11:10 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
>
> > Ram,
> >
> > Can we essentially do both by creating the new getXxBuffer, and then also
> > creating new offset and len apis -- getXxx(Bb|Buffer|Buf|B)Offset and
> > getXxx(Bb|Buffer|Buf|B)Length?
> >
> > The old getXxxArray could use the old getXxxOffset and getXxxLength
> calls.
> > Also we'd deprecate all of these and provide an sliced and diced version
> > that would have offset always 0.
> >
> > This way we wouldn't conflate the Bb and byte[] offsets and lengths.
> Also
> > we could behind the scenes convert Bbs to byte[] arrays and convert
> > byte[]'s in to Bbs while maintaining the same interface.
> >
> > Jon.
> >
> >
> >
> > On Thu, Dec 4, 2014 at 7:24 PM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > Hi Devs
> > >
> > > This write up is to provide a brief idea  on why we need a BB backed
> cell
> > > and what are the items that we need to take care before introducing new
> > > APIs in Cell that are BB backed.
> > >
> > > Pls refer to https://issues.apache.org/jira/browse/HBASE-12358 also
> and
> > > its
> > > parent JIRA https://issues.apache.org/jira/browse/HBASE-11425 for the
> > > history.
> > >
> > > Coming back to the discussion on new APIs, this discussion is based on
> > > supporting BB in the read path (write path is not targeted now) so that
> > we
> > > could work with offheap BBs also. This would avoid copying of data from
> > > BlockCache to the read path ByteBuffer.
> > >
> > > Assume we will be working with BBs in the read path, We will need to
> > >  introduce *getXXXBuffer() *APIs and also *hasArray()* in Cell itself
> > > directly.
> > > If we try to extend the cell or create a new Cell then *everywhere we
> > need
> > > to do instanceOf check or do type conversion *and that is why adding
> new
> > > APIs to Cell interface itself makes sense.
> > >
> > > Plan is to use this *getXXXBuffer()* API through out the read path
> > *instead
> > > of getXXXArray()*.
> > >
> > > Now there are two ways to use it
> > >
> > > 1) Use getXXXBuffer() along with getXXXOffset(), getXXXLength() like
> how
> > we
> > > use now for getXXXArray() APIs with the offset and length. Doing so
> would
> > > ensure that every where in the filters and CP one has to just replace
> the
> > > getXXXArray() with getXXXBuffer() and continue to use getXXXOffset()
> and
> > > getXXXLength(). We would do some wrapping of the byte[] with a BB
> incase
> > of
> > > KeyValue type of cells so that getXXXBuffer along with offset and
> length
> > > holds true everywhere. Note that here if hasArray is true(for KV case)
> > then
> > > getXXXArray() would also work.
> > >
> > > 2)The other way of using this is that use only getXXXBuffer() API and
> > > ensure that the BB is always duplicated/sliced and only the portion of
> > the
> > > total BB is returned which represents the individual component of the
> > Cell.
> > > In this case there is no use of getXXXOffset() (as it is going to be 0)
> > and
> > > getXXXLength() is any way going to be the sliced BB's limit.
> > >
> > > But in the 2nd approach we may end up in creating lot of small objects
> > even
> > > while doing comparison.
> > >
> > > Now the next problem that comes is what to do with the getXXXArray()
> > APIs.
> > > If one sees hasArray() as false (a DBB backed Cell) and uses the
> > > getXXXArray() API along with offset and length - what should we do.
> > Should
> > > we create a byte[] from the DBB and return it? Then in that case what
> > would
> > > should the *getXXXOffset() return for a getXXXBuffer or getXXXArray()?*
> > >
> > > If we go with the 2nd approach then getXXXBuffer() should be clearly
> > > documented saying that it has to be used without getXXXOffset() and
> > > getXXXLength() and use getXXXOffset() and getXXXLength() only with
> > > getXXXArray().
> > >
> > > Now if a Cell is backed by on heap BB then we could definitely return
> > > getXXXArray() also - but what to return in the getXXXOffset() would be
> > > determined by what approach to use for getXXXBuffer(). (based on (1)
> and
> > > (2)).
> > >
> > > We wanted to open up this topic now so that to get some feedback on
> what
> > > could be an option here. Since it is an user facing Interface we need
> to
> > be
> > > careful with this.
> > >
> > > I would suggest that whenever a Cell is *BB backed*(Onheap or offheap)
> > > always *hasArray() would be false* in that Cell impl.
> > >
> > > Every where we would use getXXXBuffer() along with getXXXOffest() and
> > > getXXXLength(). Even in case of KV we could wrap the byte[] with BB so
> > that
> > > we have uniformity through the read code and we don't have too many
> 'if'
> > > else conditions.
> > >
> > > When ever *hasArray() is false* - using getXXXArray() API would throw
> > > *UnSupportedOperation
> > > Exception*.
> > >
> > > As said if we want *getXXXArray()* to be supported as per the existing
> > way
> > > then getXXXBuffer() and getXXXOffset(), getXXXLength() should be
> clearly
> > > documented.
> > >
> > > Thoughts!!!
> > >
> > > Regards
> > > Ram & Anoop
> > >
> >
> >
> >
> > --
> > // Jonathan Hsieh (shay)
> > // HBase Tech Lead, Software Engineer, Cloudera
> > // jon@cloudera.com // @jmhsieh
> >
>



-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// jon@cloudera.com // @jmhsieh

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message