hbase-dev mailing list archives

From "Chia-Ping Tsai"<chia7...@apache.org>
Subject Re: [DISCUSS] Move Type out of KeyValue
Date Sun, 01 Oct 2017 04:28:23 GMT
There was never a "custom cell type" in the story. (Sorry for misleading you.)

Here is the story. I add some custom cells (to save memory) to a Put via Put#add(Cell). The
pseudocode of the custom cell is shown below.

{code}
class MyObject {
  Cell toCell() {
    return CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY)
        .setRow(sharedBuffer, myRowOffset, myRowLength)
        // We have to call the IA.Private KeyValue.Type to get the valid code of Put
        .setType(KeyValue.Type.Put.getCode())
        // set other fields
        .build();
  }
}

put.add(myObject.toCell());
{code}
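
The shallow-copy idea in the pseudocode above can be sketched without the HBase API. This is only an illustration under stated assumptions: SharedBufferCell and SharedBufferCellDemo are hypothetical names, not HBase classes, and the constant 4 is hard-coded to mirror the value that KeyValue.Type.Put.getCode() returns. The point is that each cell stores only an offset and a length into one shared byte[] instead of owning a copy of the row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a cell backed by a shared buffer; not HBase's Cell. */
final class SharedBufferCell {
    /** Assumed to mirror KeyValue.Type.Put.getCode(); hard-coded for this sketch. */
    static final byte PUT_TYPE_CODE = 4;

    private final byte[] buffer; // shared with other cells, never copied
    private final int rowOffset;
    private final int rowLength;
    private final byte typeCode;

    SharedBufferCell(byte[] buffer, int rowOffset, int rowLength, byte typeCode) {
        if (rowOffset < 0 || rowLength < 0 || rowLength > buffer.length - rowOffset) {
            throw new IllegalArgumentException("row range is outside the shared buffer");
        }
        this.buffer = buffer;
        this.rowOffset = rowOffset;
        this.rowLength = rowLength;
        this.typeCode = typeCode;
    }

    byte[] getRowArray() { return buffer; }
    int getRowOffset() { return rowOffset; }
    int getRowLength() { return rowLength; }
    byte getTypeByte() { return typeCode; }

    /** Copies the row out of the shared buffer; meant only for display or tests. */
    byte[] copyRow() {
        return Arrays.copyOfRange(buffer, rowOffset, rowOffset + rowLength);
    }
}

final class SharedBufferCellDemo {
    public static void main(String[] args) {
        // Two rows packed back to back in a single shared array.
        byte[] shared = "row-arow-b".getBytes(StandardCharsets.US_ASCII);
        SharedBufferCell a = new SharedBufferCell(shared, 0, 5, SharedBufferCell.PUT_TYPE_CODE);
        SharedBufferCell b = new SharedBufferCell(shared, 5, 5, SharedBufferCell.PUT_TYPE_CODE);
        System.out.println(new String(a.copyRow(), StandardCharsets.US_ASCII));
        System.out.println(new String(b.copyRow(), StandardCharsets.US_ASCII));
    }
}
```

Both cells return the same array from getRowArray(), so creating a cell allocates only the small wrapper object, which is the memory saving the story is about.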

Then I noticed that Put#add is not optimized for our heavy tables (a large chunk of cells in a single
row), so I also extended Put with some additional #add methods that avoid resizing the collection.
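
The resizing cost mentioned above can be illustrated with a small, self-contained sketch. It does not show the actual Put extension, which is not given in this message: BulkRowMutation is a hypothetical stand-in for a single-row mutation, and it assumes the caller knows roughly how many cells the row will hold.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a single-row mutation; not HBase's Put. */
final class BulkRowMutation<C> {
    private final ArrayList<C> cells;

    /** Pre-sizes the backing list so that adding up to expectedCells never regrows it. */
    BulkRowMutation(int expectedCells) {
        this.cells = new ArrayList<>(expectedCells);
    }

    /** Adds one cell; the list regrows only if the expected count is exceeded. */
    void add(C cell) {
        cells.add(cell);
    }

    /** Adds a whole batch, reserving the space once up front. */
    void addAll(Collection<? extends C> batch) {
        cells.ensureCapacity(cells.size() + batch.size());
        cells.addAll(batch);
    }

    int size() {
        return cells.size();
    }

    /** Read-only view of the accumulated cells. */
    List<C> cells() {
        return Collections.unmodifiableList(cells);
    }
}
```

With a row that carries thousands of cells, sizing the list once avoids the repeated array copies that a default-sized list performs as it grows.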

That is the whole story: I am trying to reduce the cost of converting our objects to Put/Cell. The other
story I mentioned, building a custom write path via an Endpoint, is unrelated to this
topic.

All the classes we use are listed below:
1) Cell -> IA.Public
2) CellBuilder -> IA.Public
3) CellBuilderFactory -> IA.Public
4) Put -> IA.Public
5) Put#add(Cell) -> IA.Public
6) KeyValue#Type -> IA.Private

That is why I want to make KeyValue#Type IA.Public.

--
Chia-Ping

On 2017-10-01 00:34, Andrew Purtell <andrew.purtell@gmail.com> wrote: 
> Thanks for sharing these details. They are intriguing. If possible could you explain
> why the custom type is needed?
> 
> Something has to be deployed on the server or the custom cell type isn’t guaranteed
> to be handled correctly. It may work now by accident. I’m a little surprised a custom
> cell type doesn’t cause an abort. Did you patch the code to handle it?
> 
> 
> > On Sep 30, 2017, at 1:06 AM, Chia-Ping Tsai <chia7712@apache.org> wrote:
> > 
> > Thanks for the nice suggestions, Andrew. Sorry for the delayed response; busy today.
> > 
> > The root reason we must build our own Cell on the client side is that the data are located
> > in shared memory, which is similar to MSLAB.
> > 
> > You are right. We could use an attribute to carry our data, but a byte[] is not acceptable
> > because we can’t assign the offset and length. In fact, the endpoint is a better way
> > for our case because our object can be directly converted to a PB object. It is also easy to
> > apply shared memory to manage our object. However, it is easier and more readable to
> > follow the regular Put operation. All we have to do is build our own cell and extend Put. Nothing
> > has to be deployed on the server.
> > 
> > I agree that the custom cell is a low-level thing and should be used by advanced users.
> > My concern is that the classes related to custom Cells have different IA declarations. I’m
> > fine with making them IA.Private, but building a custom cell may be a common case.
> > 
> > --
> > Chia-Ping
> > 
> >> On 2017-09-30 06:05, Andrew Purtell <apurtell@apache.org> wrote: 
> >> Construct a normal put or delete or batch mutation, add whatever extra
> >> state you need in one or more operation attributes, and use a
> >> regionobserver to extend normal processing to handle the extra state. I'm
> >> curious what dispatching to extension code because of a custom cell type
> >> buys you over dispatching to extension code because of the presence of an
> >> attribute (or cell tag). For example, in security coprocessors we take
> >> attribute data and attach it to the cell using cell tags. Later we check
> >> for cell tag(s) to determine if we have to take special action when the
> >> cell is accessed by a scanner, or during some operations (e.g. appends or
> >> increments have to do extra handling for cell security tags).
> >> 
> >> 
> >> On Fri, Sep 29, 2017 at 2:43 PM, Chia-Ping Tsai <chia7712@apache.org> wrote:
> >> 
> >>>> Instead of a custom cell, could you use a regular cell with a custom
> >>>> operation attribute (see OperationWithAttributes).
> >>> Pardon me, I didn't get what you said.
> >>> 
> >>> 
> >>> 
> >>>> On 2017-09-30 04:31, Andrew Purtell <apurtell@apache.org> wrote:
> >>>> Instead of a custom cell, could you use a regular cell with a custom
> >>>> operation attribute (see OperationWithAttributes).
> >>>> 
> >>>> On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai <chia7712@apache.org> wrote:
> >>>> 
> >>>>> The custom cell helps us reduce memory consumption. We don't have our own
> >>>>> serialization/deserialization mechanism, hence transferring data from
> >>>>> client to server needs many conversion phases (user data -> Put/Cell -> pb
> >>>>> object). The cost of conversion is large when transferring bulk data. In
> >>>>> fact, we also have a custom mutation to manage the memory usage of the inner
> >>>>> cell collection.
> >>>>> 
> >>>>>> On 2017-09-30 02:43, Andrew Purtell <apurtell@apache.org> wrote:
> >>>>>> What are the use cases for a custom cell? It seems a dangerously low level
> >>>>>> thing to attempt and perhaps we should unwind support for it. But perhaps
> >>>>>> there is a compelling justification.
> >>>>>> 
> >>>>>> 
> >>>>>> On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai <chia7712@apache.org> wrote:
> >>>>>> 
> >>>>>>> Thanks for all the comments.
> >>>>>>>
> >>>>>>> The problem I want to resolve is that the valid code should be exposed as
> >>>>>>> IA.Public. Otherwise, end users have to access an IA.Private class to build
> >>>>>>> a custom cell.
> >>>>>>>
> >>>>>>> For example, I have a use case that plays a streaming role in our
> >>>>>>> application. It applies the CellBuilder (HBASE-18519) to build custom cells.
> >>>>>>> These cells have many identical fields, so they are put in shared memory to
> >>>>>>> avoid GC pauses. Everything is wonderful. However, we have to access the
> >>>>>>> IA.Private class KeyValue#Type to get the valid code of Put.
> >>>>>>>
> >>>>>>> I believe there are many use cases for custom cells, and consequently it is
> >>>>>>> worth adding a way to get the valid type via an IA.Public class. Otherwise, it
> >>>>>>> may imply that the custom cell is built on an unstable foundation, because the
> >>>>>>> related code can be changed at any time.
> >>>>>>> --
> >>>>>>> Chia-Ping
> >>>>>>> 
> >>>>>>>> On 2017-09-29 00:49, Andrew Purtell <apurtell@apache.org> wrote:
> >>>>>>>> I agree with Stack. Was typing up a reply to Anoop but let me move it down
> >>>>>>>> here.
> >>>>>>>>
> >>>>>>>> The type code exposes some low level details of how our current stores are
> >>>>>>>> architected. But what if in the future you could swap out HStore implements
> >>>>>>>> Store with PStore implements Store, where HStore is backed by HFiles and
> >>>>>>>> PStore is backed by Parquet? Just as a hypothetical example. I know there
> >>>>>>>> would be larger issues if this were actually attempted. Bear with me. You
> >>>>>>>> can imagine some different new Store implementation that has some
> >>>>>>>> advantages but is not a design derived from the log structured merge tree
> >>>>>>>> if you like. Most values from a new Cell.Type based on KeyValue.Type
> >>>>>>>> wouldn't apply to cells from such a thing because they are particular to
> >>>>>>>> how LSMs work. I'm sure such a project if attempted would make a number of
> >>>>>>>> changes requiring a major version increment and low level details could be
> >>>>>>>> unwound from Cell then, but if we could avoid doing it in the first place,
> >>>>>>>> I think it would be better for maintainability.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On Thu, Sep 28, 2017 at 9:39 AM, Stack <stack@duboce.net> wrote:
> >>>>>>>>>
> >>>>>>>>> On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai <chia7712@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>> 
> >>>>>>>>>> hi folks,
> >>>>>>>>>> 
> >>>>>>>>>> Users are allowed to create custom cells, but the valid type code,
> >>>>>>>>>> KeyValue#Type, is declared as IA.Private. As I see it, we should expose
> >>>>>>>>>> KeyValue#Type as Public Client. Three possible ways are shown below:
> >>>>>>>>>> 1) Change the declaration of KeyValue#Type from IA.Private to IA.Public
> >>>>>>>>>> 2) Move KeyValue#Type into Cell.
> >>>>>>>>>> 3) Move KeyValue#Type to upper level
> >>>>>>>>>>
> >>>>>>>>>> Any suggestions?
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>> What is the problem that we are trying to solve Chia-Ping? You want to make
> >>>>>>>>> Cells of a new Type?
> >>>>>>>>>
> >>>>>>>>> My first reaction is that KV#Type is particular to the KV implementation.
> >>>>>>>>> Any new Cell implementation should not have to adopt the KeyValue typing
> >>>>>>>>> mechanism.
> >>>>>>>>> 
> >>>>>>>>> S
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> --
> >>>>>>>>>> Chia-Ping
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> --
> >>>>>>>> Best regards,
> >>>>>>>> Andrew
> >>>>>>>> 
> >>>>>>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> >>>>>>>> decrepit hands
> >>>>>>>>   - A23, Crosstalk
> >>>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> --
> >>>>>> Best regards,
> >>>>>> Andrew
> >>>>>> 
> >>>>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> >>>>>> decrepit hands
> >>>>>>   - A23, Crosstalk
> >>>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> --
> >>>> Best regards,
> >>>> Andrew
> >>>> 
> >>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> >>>> decrepit hands
> >>>>   - A23, Crosstalk
> >>>> 
> >>> 
> >> 
> >> 
> >> 
> >> -- 
> >> Best regards,
> >> Andrew
> >> 
> >> Words like orphans lost among the crosstalk, meaning torn from truth's
> >> decrepit hands
> >>   - A23, Crosstalk
> >> 
> 
