hbase-dev mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: Review request for HBASE-7692: Ordered byte[] serialization
Date Sat, 23 Feb 2013 00:48:58 GMT
I agree with Jonathan that ideally this would not depend on hbase or
hadoop.  Could we just replace Hadoop's BytesWritable with a new class that
does the same thing?
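As a rough illustration of Matt's question above, here is a minimal sketch of such a class. It assumes the replacement only needs to wrap a byte[] and compare contents as unsigned bytes in lexicographic order, which is how HBase orders row keys. The class name `SimpleBytes` is hypothetical and not part of HBase or Hadoop. The sketch drops BytesWritable's mutability and capacity management, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for Hadoop's BytesWritable with no Hadoop or HBase
 * dependency: an immutable wrapper over a byte[] that compares by unsigned,
 * lexicographic byte order.
 */
final class SimpleBytes implements Comparable<SimpleBytes> {
    private final byte[] bytes;

    SimpleBytes(byte[] bytes) {
        // Defensive copy so later changes to the caller's array are not visible.
        this.bytes = Arrays.copyOf(bytes, bytes.length);
    }

    int getLength() {
        return bytes.length;
    }

    byte[] copyBytes() {
        return Arrays.copyOf(bytes, bytes.length);
    }

    @Override
    public int compareTo(SimpleBytes other) {
        // Each byte is treated as 0..255; a strict prefix sorts before the longer array.
        return Arrays.compareUnsigned(bytes, other.bytes);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SimpleBytes && Arrays.equals(bytes, ((SimpleBytes) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

Comparing as unsigned matters because Java's `byte` is signed. A naive signed comparison would place 0xFF before 0x01.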

I also have a concern about the way it builds the multi-field byte[] by
allocating somewhat expensive Builder objects, etc.  It's suitable for
application level code, but most of the innards of hbase regionserver
should be using tighter code for best performance and less garbage.
 Perhaps in a future issue we can separate the builder wrappers from their
internal byte converters so that hbase-server can use the lower-level byte
converters without the builder overhead.
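The separation Matt describes could look roughly like the following sketch. It assumes the lower-level converters are static methods that write into a caller-supplied buffer and return the next offset, so a multi-field key needs no builder or wrapper objects. The class `OrderedCodec` and its methods are hypothetical, not Orderly's or HBase's API. The sign-bit flip is one common way to make signed values sort correctly and is not necessarily Orderly's exact format.

```java
/**
 * Hypothetical low-level converters: static methods that write directly into a
 * caller-supplied buffer and return the next offset, so a multi-field key can be
 * built without allocating builder objects.
 */
final class OrderedCodec {
    private OrderedCodec() {}

    /** Writes v as 8 big-endian bytes with the sign bit flipped; returns off + 8. */
    static int writeLong(byte[] buf, int off, long v) {
        long u = v ^ Long.MIN_VALUE; // flip sign bit so negatives sort before positives
        for (int i = 7; i >= 0; i--) {
            buf[off + i] = (byte) u; // lowest remaining byte goes to the rightmost slot
            u >>>= 8;
        }
        return off + 8;
    }

    /** Inverse of writeLong. */
    static long readLong(byte[] buf, int off) {
        long u = 0;
        for (int i = 0; i < 8; i++) {
            u = (u << 8) | (buf[off + i] & 0xFFL);
        }
        return u ^ Long.MIN_VALUE;
    }

    /** Writes v as 4 big-endian bytes with the sign bit flipped; returns off + 4. */
    static int writeInt(byte[] buf, int off, int v) {
        int u = v ^ Integer.MIN_VALUE;
        buf[off] = (byte) (u >>> 24);
        buf[off + 1] = (byte) (u >>> 16);
        buf[off + 2] = (byte) (u >>> 8);
        buf[off + 3] = (byte) u;
        return off + 4;
    }

    /** Inverse of writeInt. */
    static int readInt(byte[] buf, int off) {
        int u = ((buf[off] & 0xFF) << 24) | ((buf[off + 1] & 0xFF) << 16)
                | ((buf[off + 2] & 0xFF) << 8) | (buf[off + 3] & 0xFF);
        return u ^ Integer.MIN_VALUE;
    }
}
```

For example, a caller building a (long, int) key would allocate one 12-byte array, call `writeLong(key, 0, ...)` and then `writeInt(key, 8, ...)`, and could reuse that array across rows. A builder-style API could then be layered on top of these converters for application code.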


On Fri, Feb 22, 2013 at 4:33 PM, Jonathan Hsieh <jon@cloudera.com> wrote:

> I think I misspoke slightly but basically agree with Matt's notion that
> this would end up being the place to pick up the orderly jar and that
> ideally it has no hbase-* dependencies.
>
> I actually feel that the hbase-orderly module is a sibling to hbase-common
> and hbase-client. My initial thought is that this is ideally not depended
> upon by the hbase-client.  An app would use hbase-orderly and hbase-client.
>
>
> A simplified module dependency graph (excluding some details) would be
> (where -> means "depends on"):
>
> app -> hbase-client, hbase-orderly
> hbase-client -> hbase-protocol, hbase-common, *-compat
> hbase-common -> none of the hbase-*
> hbase-orderly -> none of the hbase-*
>
> I don't quite understand what the multiple patches for the module work
> are for (or is this follow-on stuff that uses this)? Can you explain what
> the breakdown would be? Since it isn't committed yet and should be
> self-contained, why not just do the big import as a single patch?
>
> Thanks for bringing this up for discussion, Nick.
>
> Jon.
>
> On Fri, Feb 22, 2013 at 3:13 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>
> > On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
> >
> > > To nitpick a little, it wouldn't quite be a sibling of hbase-client
> > > because hbase-client depends on hbase-common and hbase-protocol.
> > >
> >
> > Actually, quite the contrary. I don't see this as being an external
> > module as much as integral to the client's use of HBase (read "client"
> > as "application consuming HBase", not "the HBase RPC client
> > implementation"). Further, once HBase provides a suitable serialization
> > format for primitives, why not push them into the client API? IMHO, HBase
> > really should provide basic types for users at the Mutation layer. That,
> > however, belongs in an entirely separate ticket.
> >
> > On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <eclark@apache.org> wrote:
> > >
> > > > Yep, the client will be fully separated as soon as the RPC changes
> > > > are stabilized. Until then, keeping up the move patch was just too
> > > > onerous.
> > > >
> > > >
> > > > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > > > >
> > > > > Nick,
> > > > >
> > > > > I'm +1 for it having its own module, and being a sibling of
> > > > > hbase-client. I'm assuming the client stuff will happen before we
> > > > > release 0.96, since it has been started.
> > > > >
> > > > > Jon.
> > > > >
> > > > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > > > >
> > > > > > You're absolutely correct: this library introduces client-side
> > > > > > conventions and is not needed from within the HMaster or
> > > > > > RegionServer. Is the consensus that it should reside in its own
> > > > > > module or be a sibling to the o.a.h.hbase.client source tree? I'm
> > > > > > a little confused by the current state of the modules;
> > > > > > hbase-client looks empty while o.a.h.hbase.client sits under
> > > > > > hbase-server.
> > > > > >
> > > > > > Thanks,
> > > > > > Nick
> > > > > >
> > > > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > > > > > >
> > > > > > > So I buy the argument about this being included in hbase, but
> > > > > > > several of the questions still stand:
> > > > > > >
> > > > > > > Why is this part of hbase-common? Shouldn't this just be a
> > > > > > > dependency of the hbase-client module? Does the hbase-server
> > > > > > > side need to depend on this?
> > > > > > >
> > > > > > > Since this is a large import of a currently isolated library,
> > > > > > > why not make it a separate module instead of part of
> > > > > > > hbase-common? This would enforce a boundary that will prevent
> > > > > > > pollution from circular dependencies.
> > > > > > >
> > > > > > > Jon.
> > > > > > >
> > > > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <enis@apache.org> wrote:
> > > > > > > >
> > > > > > > > I think this belongs in core HBase, as a replacement for
> > > > > > > > Bytes, which should be deprecated eventually. We have a Bytes
> > > > > > > > utility which is supposed to convert basic Java types to
> > > > > > > > byte[]s, but it does not work for signed numbers (their
> > > > > > > > encodings do not sort in numeric order).
> > > > > > > >
> > > > > > > > We already know that all of the clients (Hive, Pig, Phoenix)
> > > > > > > > have to have at least Java type -> byte[] conversion
> > > > > > > > utilities, and I think it is HBase's job to supply one so that
> > > > > > > > different clients can interoperate. Since internally we are
> > > > > > > > also relying on serializing Java types, we need that library
> > > > > > > > in the core.
> > > > > > > >
> > > > > > > > BTW, I also think that we need to have a SQL type to Java type
> > > > > > > > to byte[] layer, but that is another discussion.
> > > > > > > >
> > > > > > > > Enis
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > > > > > > > >
> > > > > > > > > Nick,
> > > > > > > > >
> > > > > > > > > While I believe having an order-preserving canonical
> > > > > > > > > serialization is a good idea, from a read of the mail and a
> > > > > > > > > skim of the JIRA it is not clear to me why this is inside
> > > > > > > > > hbase as part of hbase-common.
> > > > > > > > >
> > > > > > > > > Why isn't this part of a library on top of hbase (a
> > > > > > > > > dependency for Pig/Hive) instead of "inside" hbase?
> > > > > > > > > Can't this functionality be done just from the client level?
> > > > > > > > > What's the end goal here? Is the goal to replace the
> > > > > > > > > Bytes.toBytes(*) methods to enforce the ordering?
> > > > > > > > > If HBase has two mutually incompatible encodings "built-in",
> > > > > > > > > how does a dev know to use one or the other later on?
> > > > > > > > > If this is essentially a mega import of a library (300k...
> > > > > > > > > yikes), why not make it a separate module instead of part of
> > > > > > > > > common?
> > > > > > > > >
> > > > > > > > > Jon.
> > > > > > > > >
> > > > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > I'm of the opinion that HBase should provide a mechanism for
> > > > > > > > > > serializing common Java types such that the serialized
> > > > > > > > > > format sorts according to the natural ordering of the type.
> > > > > > > > > > I think many application efforts end up building a custom,
> > > > > > > > > > partial implementation of this kind of functionality on
> > > > > > > > > > their own. I think HBase should provide a canonical
> > > > > > > > > > implementation of such a serialization format so that third
> > > > > > > > > > parties can reliably build on top of HBase. Not just user
> > > > > > > > > > applications, but other tools like Pig and Hive are also
> > > > > > > > > > enabled. Implementations for
> > > > > > > > > > HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> > > > > > > > > > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>,
> > > > > > > > > > or HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903>
> > > > > > > > > > could be compatible with similar features in Pig.
> > > > > > > > > >
> > > > > > > > > > After implementing something similar on multiple occasions,
> > > > > > > > > > I stumbled across the Orderly
> > > > > > > > > > <https://github.com/ndimiduk/orderly> library. It also
> > > > > > > > > > appears to have been adopted by other large projects,
> > > > > > > > > > including Lily <https://github.com/NGDATA/orderly>. I've
> > > > > > > > > > engaged the library's author for some improvements, only to
> > > > > > > > > > find out he's now at Google and will no longer be
> > > > > > > > > > maintaining it. Thus, I propose we take it into HBase.
> > > > > > > > > >
> > > > > > > > > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692>
> > > > > > > > > > includes a patch that introduces Orderly into hbase-common
> > > > > > > > > > under the orderly namespace. I have an associated branch on
> > > > > > > > > > GitHub
> > > > > > > > > > <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>
> > > > > > > > > > wherein I've broken the patch out into multiple commits to
> > > > > > > > > > ease review. Please take a few minutes to give it a look.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Nick
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > // Jonathan Hsieh (shay)
> > > > > > > > > // Software Engineer, Cloudera
> > > > > > > > > // jon@cloudera.com
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
>
>
>
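Enis's point in the quoted thread, that Bytes "does not work for signed numbers", can be made concrete with a small sketch. HBase's `Bytes.toBytes(long)` writes big-endian two's complement. To stay self-contained, the sketch uses `ByteBuffer.putLong`, which produces the same big-endian layout. Under unsigned lexicographic comparison (how HBase orders keys), negative values then sort after positive ones. Flipping the sign bit fixes this for integers. For doubles, a common trick flips every bit of negative values and only the sign bit of non-negative ones. These are widely used techniques shown for illustration, not necessarily Orderly's exact encodings, and the class name `SignedOrderDemo` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SignedOrderDemo {
    /** Plain big-endian two's-complement bytes (same layout as Bytes.toBytes(long)). */
    static byte[] plain(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    /** Order-preserving variant: flip the sign bit so unsigned byte order matches numeric order. */
    static byte[] ordered(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    /**
     * Order-preserving encoding for doubles: non-negative values get only the sign
     * bit flipped; negative values get every bit flipped, which reverses their order.
     * -0.0 sorts just before 0.0, and NaN sorts after +Infinity.
     */
    static byte[] ordered(double d) {
        long bits = Double.doubleToLongBits(d); // canonicalizes all NaNs to one bit pattern
        bits ^= (bits < 0) ? -1L : Long.MIN_VALUE;
        return ByteBuffer.allocate(8).putLong(bits).array();
    }

    public static void main(String[] args) {
        // Plain layout: -1 (0xFF...FF) sorts after 1 (0x00...01), which is wrong.
        System.out.println(Arrays.compareUnsigned(plain(-1L), plain(1L)) > 0);
        // Sign bit flipped: -1 now sorts before 1.
        System.out.println(Arrays.compareUnsigned(ordered(-1L), ordered(1L)) < 0);
        // Doubles: -2.5 sorts before 0.5.
        System.out.println(Arrays.compareUnsigned(ordered(-2.5), ordered(0.5)) < 0);
    }
}
```

Running `main` prints `true` three times. Both encodings are fixed width and invertible, so they can be decoded by applying the same XOR in reverse.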
