pig-dev mailing list archives

From Michael Dalton <mwdal...@gmail.com>
Subject Re: reading/writing HBase in Pig
Date Tue, 19 Jan 2010 06:14:52 GMT
I took a look at the load-store branch, and that definitely seems like the
right place to do this. So the right thing to do would be to open a JIRA
and then post a patch against the load-store rewrite tree, correct?
Also, it seems to me that there's no existing support for row keys, which
should also be fixed. The current HBaseStorage assumes that the user passes
a list of columns (i.e. column family/qualifier pairs). However, users may
encode data in the HBase row key as well. Empty row keys are forbidden, so
there is definitely data there.

Any StoreFunc implementation for HBase will require row key support, as
each Put must have a row key. So my plan is to first modify HBaseStorage's
LoadFunc support to handle row keys in addition to the existing support for
column values, and then add StoreFunc support (with row keys) to
HBaseStorage. Just wanted to make sure this sounds good. Thanks
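
To make the row-key point concrete, here is a rough sketch of the mapping
in both directions. It deliberately avoids the HBase and Pig APIs: the
names RowKeySketch, PendingPut, toPut and fromRow are hypothetical, and
the convention that the first tuple field carries the row key is only an
assumption for illustration, not something HBaseStorage does today.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, HBase-free model of mapping a tuple onto one row write and back. */
public class RowKeySketch {

    /** Stand-in for an HBase Put: one row key plus "family:qualifier" -> value cells. */
    public static final class PendingPut {
        public final byte[] rowKey;
        public final Map<String, byte[]> cells = new LinkedHashMap<>();

        public PendingPut(byte[] rowKey) {
            this.rowKey = rowKey;
        }
    }

    /**
     * Store direction. Assumed convention: fields.get(0) is the row key and
     * the remaining fields line up with the configured column list.
     */
    public static PendingPut toPut(List<String> fields, List<String> columns) {
        if (fields.size() != columns.size() + 1) {
            throw new IllegalArgumentException("expected row key plus one field per column");
        }
        String key = fields.get(0);
        if (key == null || key.isEmpty()) {
            // Empty row keys are forbidden, so reject them before writing.
            throw new IllegalArgumentException("row key must not be empty");
        }
        PendingPut put = new PendingPut(key.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < columns.size(); i++) {
            String value = fields.get(i + 1);
            if (value != null) { // a null field simply writes no cell
                put.cells.put(columns.get(i), value.getBytes(StandardCharsets.UTF_8));
            }
        }
        return put;
    }

    /** Load direction: emit the row key as the first field, then one field per column. */
    public static List<String> fromRow(PendingPut row, List<String> columns) {
        List<String> fields = new ArrayList<>();
        fields.add(new String(row.rowKey, StandardCharsets.UTF_8));
        for (String column : columns) {
            byte[] value = row.cells.get(column);
            fields.add(value == null ? null : new String(value, StandardCharsets.UTF_8));
        }
        return fields;
    }
}
```

With the columns info:name and info:email, the tuple (user42, Ada,
ada@example.com) would become a write keyed on user42 with two cells, and
loading that row back would yield the same three fields in the same order.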

Best regards,


On Thu, Jan 14, 2010 at 10:40 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:

> Hi Mike,
> It would be great to have a StoreFunc for HBase!
> There is a rewrite underway for the Load/Store stuff that will make
> that a lot easier -- see https://issues.apache.org/jira/browse/PIG-966.
> You may want to consider writing it for the load-store redesign
> branch. This is what's probably going to be in 0.7. The first step
> would be to open a jira and look at the existing StoreFunc
> implementations.
> -D
> On Thu, Jan 14, 2010 at 9:59 PM, Michael Dalton <mwdalton@gmail.com>
> wrote:
> > Hi all,
> >
> > I was looking at the current Pig code in SVN, and it seems like HBase is
> > supported for loading, but not for storing. If this is the case, I'd like
> > to add support for writing to HBase to Pig. Is there anyone else working
> > on this, and if not is this something that you'd like contributed? Based
> > on a cursory evaluation of the StoreFunc interface, it looks like the APIs
> > there are pretty file-centric and may need to be modified to accommodate
> > HBase's table-based design. For example, you aren't going to be
> > serializing your output to an OutputStream object in all likelihood.
> >
> > I haven't contributed to Pig before, and I wanted to see if this is
> > something that would be beneficial to the rest of the Pig community, and
> > if so what next steps I should take (like starting a JIRA) to get the ball
> > rolling. Thanks
> >
> > Best regards,
> >
> > Mike
> >
