Date: Wed, 16 Mar 2011 12:59:46 -0400
Subject: Re: Problem with Hive HBase Integration - Running Mapper task
From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@hive.apache.org
Cc: Abhijit Sharma

On Wed, Mar 16, 2011 at 12:51 PM, Abhijit Sharma wrote:
> Hi,
> I am trying to connect the Hive shell running on my laptop to a remote
> Hadoop/HBase cluster and test out the HBase/Hive integration. I manage to
> connect and create the table in HBase from the remote Hive shell. I am also
> passing the auxpath parameter to the shell (specifying the Hive/HBase
> integration related jars). In addition I have copied these files to HDFS as
> well (I am using the user name hadoop, so the jars are stored in HDFS under
> /user/hadoop).
> However when I fire a query on the HBase table - select * from h1 where
> key=12; - the map/reduce job launches but the map task fails with the
> following error:
> ----
> java.io.IOException: Cannot create an instance of InputSplit class =
> org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:143)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:333)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> ----
> This basically indicates that the mapper task is unable to locate the
> Hive/HBase storage handler that it requires when running. This happens even
> though it has been specified in the auxpath and uploaded to HDFS.
> Any ideas/pointers/debug options on what I might be doing wrong? Any help is
> much appreciated.
> p.s. the exploded jars do get copied under the taskTracker directory on
> the cluster node
> Thanks

I have seen this error. It comes from a mismatch between the Hadoop, Hive, and map/reduce classpaths. This is what I do:

mkdir hive_home/auxlib

Copy all the Hive and HBase jars there, and also copy the HBase storage handler jar to auxlib. auxlib gets pushed out by the distributed cache for each job, so you do not need to use ADD JAR XXXX;

But that is not enough! DOH! Planning the job and computing the splits happen before the map tasks are launched. For that I drop all the HBase libs into hadoop_home/lib, but only on the machine that is launching the job. You can also fiddle around with HADOOP_CLASSPATH and achieve similar results.

Good luck.
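
The jar staging described above can be sketched as a few shell commands. This is only a sketch of the idea, not an exact recipe: `HIVE_HOME` and `HBASE_HOME` are assumed to point at local installs, and the jar names/versions are illustrative and depend on your Hive and HBase releases.

```shell
# 1. Stage the storage handler and HBase jars in auxlib, which Hive ships
#    to the cluster via the distributed cache with each job.
mkdir -p "$HIVE_HOME/auxlib"
cp "$HIVE_HOME"/lib/hive-hbase-handler-*.jar "$HIVE_HOME/auxlib/"
cp "$HBASE_HOME"/hbase-*.jar "$HIVE_HOME/auxlib/"
cp "$HBASE_HOME"/lib/zookeeper-*.jar "$HIVE_HOME/auxlib/"

# 2. Job planning (split computation) runs on the submitting machine before
#    any map task starts, so the same classes must also be visible there:
#    either drop the jars into $HADOOP_HOME/lib on that one machine, or
#    extend HADOOP_CLASSPATH before launching the Hive shell.
export HADOOP_CLASSPATH="$HIVE_HOME/auxlib/*:$HADOOP_CLASSPATH"
```

With the jars staged this way they ride along with every job automatically, which sidesteps per-session `ADD JAR` statements on the client.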