hbase-dev mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: Review request for HBASE-7692: Ordered byte[] serialization
Date Fri, 22 Feb 2013 18:48:05 GMT
On Fri, Feb 22, 2013 at 10:14 AM, Matt Corgan <mcorgan@hotpads.com> wrote:

> >
> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> > hbase-common.
>
> Oh, interesting.  Could we inline the code from Bytes.java and somehow get
> rid of the ImmutableBytesWritable.  Like calling packages can add
> ImmutableBytesWritable functionality on top if they want to?


I'll need to do a more thorough evaluation, but a cursory glance indicates
use of Bytes could be replaced by arraycopy. ImmutableBytesWritable is used
mostly as a convenient wrapper over byte[], and may well
be replaceable with Hadoop's BytesWritable.
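
To make the arraycopy point concrete, here is a minimal sketch, assuming the
Bytes usage in question amounts to offset-based copies and concatenation. The
class and method names (PlainByteOps, putBytes, concat) are hypothetical and
exist only for illustration; nothing here is taken from the patch itself.

```java
import java.util.Arrays;

// Hypothetical stand-in for the small subset of byte[] helpers assumed above.
// It depends on the JDK only.
final class PlainByteOps {
    private PlainByteOps() {}

    /** Copies len bytes from src into dst and returns the next free offset in dst. */
    static int putBytes(byte[] dst, int dstOffset, byte[] src, int srcOffset, int len) {
        System.arraycopy(src, srcOffset, dst, dstOffset, len);
        return dstOffset + len;
    }

    /** Returns a new array holding the bytes of a followed by the bytes of b. */
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }
}
```

If the real usage goes beyond copies like these (comparators, for instance),
the replacement would need more than arraycopy.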

> Seems like something as low level as rearranging bytes should be dependency
> free.
>

The implementation makes heavy use of Hadoop Writables, but the
dependencies on HBase instances are mostly convenience.

> On Fri, Feb 22, 2013 at 10:04 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>
> > Inline.
> >
> > On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
> >
> > > To nitpick a little it wouldn't quite be a sibling of hbase-client because
> > > hbase-client depends on hbase-common and hbase-protocol while this new one
> > > will not depend on anything.  Would hbase-server be able to see it?  Would
> > > it basically be a standalone module being maintained by HBase?
> > >
> >
> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> > hbase-common.
> >
> > > Also, assuming the original Orderly library goes unmaintained and we want
> > > people to use it, this will be the primary place to get it.  Having no
> > > dependencies on other hbase modules is important for people who want to use
> > > the Orderly library for something unrelated to hbase.  For example, a web
> > > application that logs data in this format but not directly to hbase.
> > >
> >
> > Orderly has gone unmaintained. The only fork with any activity that I'm
> > aware of is my own. I'd much rather see it gain publicity, additional
> > scrutiny, and wider adoption than continue as a pet project.
> >
> > > On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <eclark@apache.org> wrote:
> > >
> > > > Yep the client will be fully separated as soon as rpc changes
> > > > are stabilized.  Until then keeping up the move patch was just too
> > > > onerous.
> > > >
> > > >
> > > > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > > >
> > > > > Nick,
> > > > >
> > > > > I'm +1 for it having its own module, and being a sibling of hbase-client.
> > > > >  I'm assuming the client stuff will happen before we release 0.96 since it
> > > > > has been started.
> > > > >
> > > > > Jon.
> > > > >
> > > > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > > > >
> > > > > > You're absolutely correct: this library introduces client-side conventions
> > > > > > and is not needed from within the HMaster or RegionServer. Is
> > > > > > the consensus that it should reside in its own module or be a sibling to
> > > > > > the o.a.h.hbase.client source tree? I'm a little confused by the current
> > > > > > state of the modules; hbase-client looks empty while o.a.h.hbase.client
> > > > > > sits under hbase-server.
> > > > > >
> > > > > > Thanks,
> > > > > > Nick
> > > > > >
> > > > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > > > > >
> > > > > > > So I buy the argument about this being included in hbase, but several of
> > > > > > > the questions still stand --
> > > > > > >
> > > > > > > Why is this part of hbase-common?  shouldn't this be just a dependency of
> > > > > > > hbase-client module?  Does the hbase-server side need to depend on this?
> > > > > > >
> > > > > > > Since this is a large import of a currently isolated library, why not make
> > > > > > > it a separate module instead of part of hbase-common?  This would enforce a
> > > > > > > boundary that will prevent pollution from circular dependencies.
> > > > > > >
> > > > > > > Jon.
> > > > > > >
> > > > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <enis@apache.org> wrote:
> > > > > > >
> > > > > > > > I think this belongs in core HBase, as a replacement to Bytes, which should
> > > > > > > > be deprecated eventually. We have a Bytes utility which is supposed to
> > > > > > > > convert basic java types to byte[]'s, but it does not work for signed
> > > > > > > > numbers.
> > > > > > > >
> > > > > > > > We already know that all of the clients, Hive, Pig, Phoenix, have to have
> > > > > > > > at least java type -> byte[] conversion utilities, and I think it is
> > > > > > > > HBase's job to supply one so that different clients can interoperate. Since
> > > > > > > > internally we are also relying on serializing java types, we need that
> > > > > > > > library in the core.
> > > > > > > >
> > > > > > > > BTW, I also think that we need to have a SQL-type to java type to byte[]
> > > > > > > > layer, but that is another discussion.
> > > > > > > >
> > > > > > > > Enis
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > > > > > > >
> > > > > > > > > Nick,
> > > > > > > > >
> > > > > > > > > While I believe having an order-preserving canonical serialization is a
> > > > > > > > > good idea,  from doing a read of the mail and a skim of the jira it is not
> > > > > > > > > clear to me why this is inside hbase as part of hbase-common.
> > > > > > > > >
> > > > > > > > > Why isn't this part of a library on top of hbase (a dependency for
> > > > > > > > > Pig/Hive) instead of "inside" hbase?
> > > > > > > > > Can't this functionality be done just from the client level?
> > > > > > > > > What's the end goal here? Is the goal here to replace the Bytes.toBytes(*)
> > > > > > > > > methods to enforce the ordering?
> > > > > > > > > If HBase has two mutually incompatible encodings "built-in", how does a
> > > > > > > > > dev know to use one or the other later on?
> > > > > > > > > If this is essentially a mega import of a library (300k.. yikes), why not
> > > > > > > > > make it a separate module instead of part of common?
> > > > > > > > >
> > > > > > > > > Jon.
> > > > > > > > >
> > > > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > I'm of the opinion that HBase should provide a mechanism for serializing
> > > > > > > > > > common java types such that the serialized format sorts according to
> > > > > > > > > > the natural ordering of the type. I think many application efforts end up
> > > > > > > > > > building a custom, partial implementation of this kind of functionality on
> > > > > > > > > > their own. I think HBase should provide a canonical implementation of such
> > > > > > > > > > a serialization format so that third-parties can reliably build on top of
> > > > > > > > > > HBase. Not just user applications, but other tools like Pig and Hive are
> > > > > > > > > > also enabled. Implementations for
> > > > > > > > > > HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> > > > > > > > > > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
> > > > > > > > > > HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could be
> > > > > > > > > > compatible with similar features in Pig.
> > > > > > > > > >
> > > > > > > > > > After implementing something similar on multiple occasions, I stumbled across
> > > > > > > > > > the Orderly <https://github.com/ndimiduk/orderly> library. It
> > > > > > > > > > also appears to have been adopted by other large projects, including
> > > > > > > > > > Lily <https://github.com/NGDATA/orderly>.
> > > > > > > > > > I've engaged the library's author for some improvements only to find out
> > > > > > > > > > he's now at Google and will no longer be maintaining it. Thus, I propose we
> > > > > > > > > > take it into HBase.
> > > > > > > > > >
> > > > > > > > > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692> includes a
> > > > > > > > > > patch that introduces Orderly into hbase-common under the orderly
> > > > > > > > > > namespace. I have an associated branch on
> > > > > > > > > > GitHub <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>
> > > > > > > > > > wherein I've broken the patch out into multiple commits to ease review.
> > > > > > > > > > Please take a few minutes to give it a look.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Nick
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > // Jonathan Hsieh (shay)
> > > > > > > > > // Software Engineer, Cloudera
> > > > > > > > > // jon@cloudera.com
> > > > > > > > >
>
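
For anyone following the signed-number point quoted above, here is a minimal
sketch of why a plain big-endian two's-complement int does not sort correctly
under unsigned byte comparison, and how flipping the sign bit restores the
natural order. This is a common technique and is NOT claimed to be Orderly's
actual wire format; the class and method names (OrderedIntSketch, plain,
ordered, decodeOrdered, compareUnsigned) are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustration only: an assumed sign-bit-flip encoding, not Orderly's format.
final class OrderedIntSketch {
    private OrderedIntSketch() {}

    /** Plain big-endian two's-complement encoding of v. */
    static byte[] plain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    /** Flips the sign bit so negative values sort before non-negative ones. */
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    /** Inverse of ordered(): flips the sign bit back. */
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With the plain encoding, -1 (FF FF FF FF) compares greater than 1
(00 00 00 01); with the sign bit flipped, -1 becomes 7F FF FF FF and 1
becomes 80 00 00 01, so the byte order matches the numeric order.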
