hbase-user mailing list archives

From Nick Pinckernell <...@illx.org>
Subject Re: Pig load to HBase not invoking coprocessor
Date Sun, 25 Mar 2012 08:35:11 GMT
Thank you!  That got me in the right direction.  Yes, my region observer
overrides prePut().

Here is what I found out through debugging the region server:
When I use the HBase client API, the Put carries the correct KeyValue
timestamp (it matches its Mutation 'ts').  When I load through Pig, the
timestamps do not match, so the Put.has() method [line 255] does not
return true at line 273, where it performs the following check:

        if (Arrays.equals(kv.getFamily(), family) &&
            Arrays.equals(kv.getQualifier(), qualifier)
            && kv.getTimestamp() == ts) {

The check fails on 'kv.getTimestamp() == ts'.

I'm not yet sure why the KeyValue timestamp (as returned by
KeyValue.getTimestamp()) is being set incorrectly by the Pig load.
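
To illustrate the comparison above, here is a minimal, self-contained
sketch.  It does not use the real HBase classes: PutHasSketch, Cell, has()
and hasIgnoringTimestamp() are hypothetical stand-ins written only for this
example, and the timestamp values are made up.  It shows how a check that
includes the timestamp returns false when the cell's timestamp differs from
the one passed in, even though family and qualifier match.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in classes; NOT the HBase Put/KeyValue implementation.
final class PutHasSketch {

    // Simplified cell holding only the fields the check looks at.
    static final class Cell {
        final byte[] family;
        final byte[] qualifier;
        final long timestamp;

        Cell(byte[] family, byte[] qualifier, long timestamp) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Mirrors the shape of the quoted check: family, qualifier AND timestamp must match.
    static boolean has(List<Cell> cells, byte[] family, byte[] qualifier, long ts) {
        for (Cell kv : cells) {
            if (Arrays.equals(kv.family, family)
                    && Arrays.equals(kv.qualifier, qualifier)
                    && kv.timestamp == ts) {
                return true;
            }
        }
        return false;
    }

    // Variant that ignores the timestamp and compares only family and qualifier.
    static boolean hasIgnoringTimestamp(List<Cell> cells, byte[] family, byte[] qualifier) {
        for (Cell kv : cells) {
            if (Arrays.equals(kv.family, family) && Arrays.equals(kv.qualifier, qualifier)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] f = "f".getBytes(StandardCharsets.UTF_8);
        byte[] t = "t".getBytes(StandardCharsets.UTF_8);

        // Made-up values: the cell was stamped with 200, the caller asks with 100.
        List<Cell> cells = Collections.singletonList(new Cell(f, t, 200L));

        System.out.println("has(ts=100): " + has(cells, f, t, 100L));
        System.out.println("has(ts=200): " + has(cells, f, t, 200L));
        System.out.println("hasIgnoringTimestamp: " + hasIgnoringTimestamp(cells, f, t));
    }
}
```

If a coprocessor only needs to know whether a column is present, a check that
leaves the timestamp out, like the second method in the sketch, would not be
affected by this mismatch.  Whether that is appropriate depends on what the
observer is meant to enforce.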

On Sat, Mar 24, 2012 at 3:37 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> hbase.mapreduce.TableOutputFormat is used by HBaseStorage
> The Put reaches region server and ends up in HRegion.doMiniBatchPut() where
> I see:
>
>    if (coprocessorHost != null) {
>      for (int i = 0; i < batchOp.operations.length; i++) {
>        Pair<Put, Integer> nextPair = batchOp.operations[i];
>        Put put = nextPair.getFirst();
>        if (coprocessorHost.prePut(put, walEdit, put.getWriteToWAL())) {
>
> Was your code in prePut() ?
>
> Cheers
>
> On Sat, Mar 24, 2012 at 11:19 AM, Nick Pinckernell <nap@illx.org> wrote:
>
> > Hi, I posted this over at the pig forums and Dmitriy suggested I ask on the
> > hbase list as well (original post here:
> > http://mail-archives.apache.org/mod_mbox/pig-user/201203.mbox/ajax/%3CCABsY1jQFaiw%3Dbirw3ZukmdwKmY6EV9z75%2BxSTU_%2BmZsyBwsB2A%40mail.gmail.com%3E
> > )
> >
> > I'm having a possible issue with a simple pig load that writes to an HBase
> > table.  The issue is that when I run the test pig script it does not invoke
> > the region observer coprocessor on the table.  I have verified that my
> > coprocessor executes when I use the HBase client API to do a simple put().
> >
> > Simple pig script is as follows (test.pig):
> > register /dev/hbase-0.92.0/hbase-0.92.0.jar;
> > register /dev/hbase-0.92.0/lib/zookeeper-3.4.2.jar;
> > register /dev/hbase-0.92.0/lib/guava-r09.jar;
> > A = load '/tmp/testdata.csv' using PigStorage(',');
> > store A into 'hbase://test' using
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage ('f:t');
> >
> > Using the following environment variables and command:
> > export HADOOP_HOME=/dev/hadoop-1.0.0
> > export PIG_CLASSPATH=/dev/hadoop-1.0.0/conf
> > export HBASE_HOME=/dev/hbase-0.92.0/
> > export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"
> > /dev/pig-0.9.2/bin/pig -x local -f test.pig
> >
> > I have also tried 'pig -x mapreduce' and it still does not seem to invoke
> > the coprocessor.  After looking through the HBaseStorage class it appears
> > that the RecordWriter is getting HBase Put objects and that ultimately
> > those are getting flushed so I'm not sure why the coprocessor is not
> > executing.
> >
> > Is this by design, or am I missing something about how the output from the
> > pig job is being loaded into the HBase table?
> > Thank you
> >
>
