hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Review request for HBASE-7692: Ordered byte[] serialization
Date Fri, 22 Feb 2013 21:14:56 GMT
Thanks Nick for carrying this through.

My plea to reviewers: if you disagree with putting Orderly in its own
module, please say so now.

On Fri, Feb 22, 2013 at 11:37 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> I'm working through the code that will produce a patch placing orderly in
> its own module. A question to reviewers: would you prefer I create separate
> JIRA/tasks for each of the individual patches? Will that be easier to
> review than dumping my squashed patch onto this ticket and asking you to
> look at github? Having this broken out into multiple tickets, I would feel
> better about using review board to aggregate comments.
>
> Please advise.
> Nick
>
> On Fri, Feb 22, 2013 at 10:48 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>
> > On Fri, Feb 22, 2013 at 10:14 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
> >
> >> >
> >> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> >> > hbase-common.
> >>
> >> Oh, interesting.  Could we inline the code from Bytes.java and somehow get
> >> rid of the ImmutableBytesWritable?  Calling packages could then add
> >> ImmutableBytesWritable functionality on top if they want to.
> >
> >
> > I'll need to do a more thorough evaluation, but a cursory glance indicates
> > use of Bytes could be replaced by arraycopy. ImmutableBytesWritable is used
> > mostly as a convenient wrapper over byte[], and may well
> > be replaceable with Hadoop's BytesWritable.
> >
> >> Seems like something as low level as rearranging bytes should be
> >> dependency free.
> >>
> >
> > The implementation makes heavy use of Hadoop Writables, but the
> > dependencies on HBase instances are mostly convenience.
> >
> >> On Fri, Feb 22, 2013 at 10:04 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >>
> >> > Inline.
> >> >
> >> > On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
> >> >
> >> > > To nitpick a little it wouldn't quite be a sibling of hbase-client
> >> > > because hbase-client depends on hbase-common and hbase-protocol while
> >> > > this new one will not depend on anything.  Would hbase-server be able
> >> > > to see it?  Would it basically be a standalone module being maintained
> >> > > by HBase?
> >> > >
> >> >
> >> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> >> > hbase-common.
> >> >
> >> > > Also, assuming the original Orderly library goes unmaintained and we
> >> > > want people to use it, this will be the primary place to get it.
> >> > > Having no dependencies on other hbase modules is important for people
> >> > > who want to use the Orderly library for something unrelated to hbase.
> >> > > For example, a web application that logs data in this format but not
> >> > > directly to hbase.
> >> > >
> >> >
> >> > Orderly has gone unmaintained. The only fork with any activity that I'm
> >> > aware of is my own. I'd much rather see it gain publicity,
> >> > additional scrutiny, and wider adoption than continue as a pet project.
> >> >
> >> > > On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <eclark@apache.org> wrote:
> >> > >
> >> > > > Yep the client will be fully separated as soon as rpc changes
> >> > > > are stabilized.  Until then keeping up the move patch was just too
> >> > > > onerous.
> >> > > >
> >> > > >
> >> > > > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >> > > >
> >> > > > > Nick,
> >> > > > >
> >> > > > > I'm +1 for it having its own module, and being a sibling of
> >> > > > > hbase-client.  I'm assuming the client stuff will happen before we
> >> > > > > release 0.96 since it has been started.
> >> > > > >
> >> > > > > Jon.
> >> > > > >
> >> > > > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >> > > > >
> >> > > > > > You're absolutely correct: this library introduces client-side
> >> > > > > > conventions and is not needed from within the HMaster or
> >> > > > > > RegionServer. Is the consensus that it should reside in its own
> >> > > > > > module or be a sibling to the o.a.h.hbase.client source tree? I'm a
> >> > > > > > little confused by the current state of the modules; hbase-client
> >> > > > > > looks empty while o.a.h.hbase.client sits under hbase-server.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Nick
> >> > > > > >
> >> > > > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >> > > > > >
> >> > > > > > > So I buy the argument about this being included in hbase, but
> >> > > > > > > several of the questions still stand --
> >> > > > > > >
> >> > > > > > > Why is this part of hbase-common?  shouldn't this be just a
> >> > > > > > > dependency of hbase-client module?  Does the hbase-server side
> >> > > > > > > need to depend on this?
> >> > > > > > >
> >> > > > > > > Since this is a large import of a currently isolated library, why
> >> > > > > > > not make it a separate module instead of part of hbase-common?
> >> > > > > > > This would enforce a boundary that will prevent pollution from
> >> > > > > > > circular dependencies.
> >> > > > > > >
> >> > > > > > > Jon.
> >> > > > > > >
> >> > > > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <enis@apache.org> wrote:
> >> > > > > > >
> >> > > > > > > > I think this belongs in core HBase, as a replacement to Bytes,
> >> > > > > > > > which should be deprecated eventually. We have a Bytes utility
> >> > > > > > > > which is supposed to convert basic java types to byte[]'s, but it
> >> > > > > > > > does not work for signed numbers.
> >> > > > > > > >
> >> > > > > > > > We already know that all of the clients, Hive, Pig, Phoenix, have
> >> > > > > > > > to have at least java type -> byte[] conversion utilities, and I
> >> > > > > > > > think it is HBase's job to supply one so that different clients
> >> > > > > > > > can interoperate. Since internally we are also relying on
> >> > > > > > > > serializing java types, we need that library in the core.
> >> > > > > > > >
> >> > > > > > > > BTW, I also think that we need to have a SQL-type to java type to
> >> > > > > > > > byte[] layer, but that is another discussion.
> >> > > > > > > >
> >> > > > > > > > Enis
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >> > > > > > > >
> >> > > > > > > > > Nick,
> >> > > > > > > > >
> >> > > > > > > > > While I believe having an order-preserving canonical
> >> > > > > > > > > serialization is a good idea, from doing a read of the mail and a
> >> > > > > > > > > skim of the jira it is not clear to me why this is inside hbase
> >> > > > > > > > > as part of hbase-common.
> >> > > > > > > > >
> >> > > > > > > > > Why isn't this part of a library on top of hbase (a dependency
> >> > > > > > > > > for Pig/Hive) instead of "inside" hbase?
> >> > > > > > > > > Can't this functionality be done just from the client level?
> >> > > > > > > > > What's the end goal here? Is the goal to replace the
> >> > > > > > > > > Bytes.toBytes(*) methods to enforce the ordering?
> >> > > > > > > > > If HBase has two mutually incompatible encodings "built-in", how
> >> > > > > > > > > does a dev know to use one or the other later on?
> >> > > > > > > > > If this is essentially a mega import of a library (300k..
> >> > > > > > > > > yikes), why not make it a separate module instead of part of
> >> > > > > > > > > common?
> >> > > > > > > > >
> >> > > > > > > > > Jon.
> >> > > > > > > > >
> >> > > > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi everyone,
> >> > > > > > > > > >
> >> > > > > > > > > > I'm of the opinion that HBase should provide a mechanism for
> >> > > > > > > > > > serializing common java types such that the serialized format
> >> > > > > > > > > > sorts according to the natural ordering of the type. I think many
> >> > > > > > > > > > application efforts end up building a custom, partial
> >> > > > > > > > > > implementation of this kind of functionality on their own. I
> >> > > > > > > > > > think HBase should provide a canonical implementation of such a
> >> > > > > > > > > > serialization format so that third-parties can reliably build on
> >> > > > > > > > > > top of HBase. Not just user applications, but other tools like
> >> > > > > > > > > > Pig and Hive are also enabled. Implementations for
> >> > > > > > > > > > HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> >> > > > > > > > > > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
> >> > > > > > > > > > HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could
> >> > > > > > > > > > be compatible with similar features in Pig.
> >> > > > > > > > > >
> >> > > > > > > > > > After implementing something similar on multiple occasions, I
> >> > > > > > > > > > stumbled across the Orderly <https://github.com/ndimiduk/orderly>
> >> > > > > > > > > > library. It also appears to have been adopted by other large
> >> > > > > > > > > > projects, including Lily <https://github.com/NGDATA/orderly>.
> >> > > > > > > > > > I've engaged the library's author for some improvements only to
> >> > > > > > > > > > find out he's now at Google and will no longer be maintaining it.
> >> > > > > > > > > > Thus, I propose we take it into HBase.
> >> > > > > > > > > >
> >> > > > > > > > > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692>
> >> > > > > > > > > > includes a patch that introduces Orderly into hbase-common under
> >> > > > > > > > > > the orderly namespace. I have an associated branch on
> >> > > > > > > > > > github <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>
> >> > > > > > > > > > wherein I've broken the patch out into multiple commits to ease
> >> > > > > > > > > > review. Please take a few minutes to give it a look.
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks,
> >> > > > > > > > > > Nick
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > > // Jonathan Hsieh (shay)
> >> > > > > > > > > // Software Engineer, Cloudera
> >> > > > > > > > > // jon@cloudera.com
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > > // Jonathan Hsieh (shay)
> >> > > > > > > // Software Engineer, Cloudera
> >> > > > > > > // jon@cloudera.com
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > // Jonathan Hsieh (shay)
> >> > > > > // Software Engineer, Cloudera
> >> > > > > // jon@cloudera.com
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
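
The signed-number concern raised in the quoted discussion can be illustrated with a minimal sketch. It assumes keys are compared as unsigned bytes in lexicographic order, which is how HBase orders row keys. The class and method names below are hypothetical, and the sign-bit flip is a common order-preserving trick used here for illustration only; it is not necessarily the encoding Orderly uses.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of HBase or Orderly.
class OrderedIntSketch {

    // Plain big-endian two's-complement encoding of an int.
    static byte[] encodeRaw(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Order-preserving variant: flipping the sign bit makes negative
    // values sort before non-negative ones under unsigned comparison.
    static byte[] encodeOrdered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ 0x80000000).array();
    }

    // Inverse of encodeOrdered.
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ 0x80000000;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Raw encoding: -1 (FF FF FF FF) sorts after 1 (00 00 00 01).
        System.out.println(compareUnsigned(encodeRaw(-1), encodeRaw(1)) > 0);
        // Sign-flipped encoding: -1 sorts before 1, matching numeric order.
        System.out.println(compareUnsigned(encodeOrdered(-1), encodeOrdered(1)) < 0);
    }
}
```

With the raw encoding, a scan over keys built from signed ints would return all non-negative values before any negative ones. The sign-flipped form keeps byte order and numeric order in agreement, which is the property the thread asks a canonical serialization to provide.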
