hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Pastebin page - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop
Date Mon, 20 Sep 2010 21:10:51 GMT
OK, that looks good.  Sometimes when you successively build and chain
classpaths you can accidentally overwrite the previous ones.  But we are
looking fine here.
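
If you want to double check that on each node, here is a minimal sketch that reports classpath entries that do not exist on the local machine. The class name ClasspathCheck is made up for illustration, and it assumes plain file or directory entries (no wildcards):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Lists entries of a classpath string that do not exist on this machine. */
final class ClasspathCheck {

    /** Returns the classpath entries that are missing from the local filesystem. */
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<String>();
        for (String entry : classpath.split(File.pathSeparator)) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty segments such as "a::b"
            }
            if (!new File(trimmed).exists()) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Use the first argument if given, otherwise the HADOOP_CLASSPATH variable.
        String classpath = args.length > 0 ? args[0] : System.getenv("HADOOP_CLASSPATH");
        if (classpath == null) {
            System.out.println("HADOOP_CLASSPATH is not set in this environment");
            return;
        }
        for (String entry : missingEntries(classpath)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

It only checks that each entry exists on the machine where it runs, so it would have to be run on every node, and it says nothing about whether a later assignment replaced the variable.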

What version of Java is Hadoop running under?  We are compiling our
HBase jars using Java 6, so that is another source of potential
incompatibilities...
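
A rough sketch of one way to check this: print the JVM's version properties and read the class-file major version from a class header (major version 50 corresponds to Java 6). The class name ClassVersionCheck is hypothetical, and the sketch inspects its own bytecode as a stand-in for a class extracted from the HBase jar:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/** Prints the running JVM version and reads a class-file major version. */
final class ClassVersionCheck {

    /** Returns the major version from the first 8 bytes of a class file (50 = Java 6). */
    static int classFileMajor(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, as in class files
        if (header.length < 8 || buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Java class file header");
        }
        buf.getShort();                 // minor version, unused here
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.version       = " + System.getProperty("java.version"));
        System.out.println("java.class.version = " + System.getProperty("java.class.version"));

        // Read this class's own header; a class from another jar could be read the same way.
        InputStream in = ClassVersionCheck.class.getResourceAsStream("ClassVersionCheck.class");
        if (in == null) {
            System.out.println("could not read own class file");
            return;
        }
        try {
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            System.out.println("compiled as class-file major version " + classFileMajor(header));
        } finally {
            in.close();
        }
    }
}
```

If the major version of a class is higher than the one the JVM reports in java.class.version, that JVM cannot load the class.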

Do you have any custom changes to any of the bin/* scripts in hadoop?

What else can you tell us about your environment?
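
One thing that might help narrow it down, as a sketch only: a tiny program that tries to load a class by name and prints where it came from. ClassLocator is a made-up name, it uses only standard reflection calls, and it only tells you about the JVM it runs in, not about the task JVMs on the other nodes:

```java
import java.security.CodeSource;

/** Reports whether a class can be loaded and, if so, from which jar or directory. */
final class ClassLocator {

    /** Returns the load location of the named class, or a short note if unavailable. */
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return source == null
                    ? "(bootstrap or unknown source)"
                    : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        } catch (LinkageError e) {
            // Covers cases such as a missing dependency or an unsupported class version.
            return "FOUND BUT NOT LOADABLE: " + e;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.hbase.mapreduce.TableOutputFormat";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Run with the same classpath the job sees, a "NOT FOUND" result points at the classpath itself, while a linkage error points at a dependency or Java version problem.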


On Mon, Sep 20, 2010 at 2:00 PM, Taylor, Ronald C <ronald.taylor@pnl.gov> wrote:
>
>
> Found it -
> http://pastebin.com/SfFYSLJy
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Monday, September 20, 2010 1:50 PM
> To: Taylor, Ronald C
> Cc: hbase-user@hadoop.apache.org; user@hbase.apache.org; buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
> Subject: Re: Guava*.jar use - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop
>
> Hey,
>
> yes, the symlink is a pretty good way to be able to upgrade in place easily.  But still, normally those other jars are in another subdir, so their full path should be:
> /home/hbase/hbase/lib/log4j-1.2.16.jar
>
> the hbase scripts rely on those paths to build the classpath, so don't rearrange the dir layout too much.
>
> As for the pastebin, you will need to send us your direct link. So many people post there, and there isn't really a good search system, so it's generally preferred to send the direct link to your pastebin.  If you ever interact with us on IRC, this is also how we get big dumps done.
>
> Thanks!
> -ryan
>
> On Mon, Sep 20, 2010 at 1:38 PM, Taylor, Ronald C <ronald.taylor@pnl.gov> wrote:
>> Ryan,
>>
>> The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, that is a symbolic link on all the nodes (as you can see below), but that should not matter, right?
>>
>> [rtaylor@h01 hbase]$ pwd
>> /home/hbase
>> [rtaylor@h01 hbase]$ ls -l
>> lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase -> hbase-0.89.20100726
>> drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54 hbase-0.89.20100726
>> [rtaylor@h01 hbase]$
>>
>>
>> Anyhoo, I just put the hbase-env.sh file on pastebin.com. Please take a look. I posted it under the title:
>>
>> "Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"
>>
>> This is the first time I've used pastebin.com, so hopefully I uploaded properly. Please let me know if not.
>>
>> I don't think I misspelled anything on the HADOOP_CLASSPATH line (I just verified file existence based on those spellings, see the "ls" listings below), but I'm very happy to have an expert take a look.
>>  Ron
>>
>>
>> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>>
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/conf
>> hadoop-metrics.properties  hbase-default_with_RT_mods.xml  hbase-env.sh    hbase-site.xml.psuedo-distributed.template  regionservers
>> hbase-default_ORIG.xml     hbase-default.xml               hbase-site.xml  log4j.properties                            tohtml.xsl
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
>> /home/hbase/hbase/hbase-0.89.20100726.jar
>> [rtaylor@h01 conf]$
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
>> /home/hbase/hbase/log4j-1.2.16.jar
>> [rtaylor@h01 conf]$
>>
>> [rtaylor@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
>> /home/hbase/hbase/zookeeper-3.3.1.jar
>> [rtaylor@h01 conf]$
>>
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>> Sent: Monday, September 20, 2010 1:17 PM
>> To: Taylor, Ronald C
>> Cc: user@hbase.apache.org; hbase-user@hadoop.apache.org;
>> buttler1@llnl.gov; Ronald Taylor; Witteveen, Tim
>> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
>> the hbase*.jar classes apparently not being found by Hadoop
>>
>> Hey,
>>
>> If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com?  That would help... sometimes I have made errors in the bash shell trickery, and it probably would help to get more eyes checking it out.
>>
>> Normally in the stock hbase distro the HBase JAR is in the root hbase dir, and the other jars are in the lib/ subdirectory. Am I correct to assume you've moved the jars around a bit?
>>
>> Good luck,
>> -ryan
>>
>> On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <ronald.taylor@pnl.gov> wrote:
>>>
>>> Hello Ryan, Dave, other developers,
>>>
>>> Have not fixed the problem. Here's where things stand:
>>>
>>> 1) As Ryan suggested, we have checked all the nodes to make sure that we copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set like so:
>>>
>>> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>>>
>>> Answer: yep, that was OK, the files are there. We also restarted Hadoop and Hbase again. No change - the program still fails because it cannot find the TableOutputFormat class.
>>>
>>> 2) Following Dave's advice of avoiding the problem by not using TableOutputFormat (by skipping the Reducer stage), I tried a variant of that. I kept the Reducer stage in, but changed it to output to a file, instead of an Hbase table.
>>>
>>> That did not work either. I tried running the new program from the hadoop acct and now get a msg (from the Mapper stage, I believe) saying that the hbase.mapreduce.TableMapper class cannot be found. So - it is not just TableOutputFormat class - it is all the classes in the hbase*.jar file that are not being found.
>>>
>>> Does this have anything to do with the guava*.jar file that Ryan mentioned, which (as far as I can tell) we don't have installed?
>>>
>>> Obviously, we need more help.
>>>
>>> In the meantime, as a stop-gap, I'm planning on writing our analysis programs this way:
>>>
>>> 1) extract data from the source Hbase table and store in an HDFS
>>> file, all data needed for analysis contained independently on each
>>> row - this task to be done by a non-MapReduce class that can access
>>> Hbase tables
>>>
>>> 2) call a MapReduce class that will process the file in parallel and
>>> return a new file (well, a directory of files which I'll combine
>>> into one) as output
>>>
>>> 3) write the contents of the new results file back into an Hbase
>>> table using another non-MapReduce class
>>>
>>> I presume this will work, but again, obviously, it's not optimal and we need to resolve this issue so MapReduce classes can access Hbase tables directly on our cluster.
>>>
>>> Does anybody have any advice?
>>>  Cheers,
>>>   Ron
>>>
>>> ___________________________________________
>>> Ronald Taylor, Ph.D.
>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>> National Laboratory
>>> 902 Battelle Boulevard
>>> P.O. Box 999, Mail Stop J4-33
>>> Richland, WA  99352 USA
>>> Office:  509-372-6568
>>> Email: ronald.taylor@pnl.gov
>>>
>>>
>>> -----Original Message-----
>>> From: Buttler, David [mailto:buttler1@llnl.gov]
>>> Sent: Monday, September 20, 2010 10:17 AM
>>> To: user@hbase.apache.org; 'hbase-user@hadoop.apache.org'
>>> Subject: RE: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> I find it is often faster to skip the reduce phase when updating rows in hbase. (A trick I picked up from Ryan) Essentially, you read a row from hbase, do your processing, and write the row back to hbase.
>>> The only time you would want to do the reduce phase is if there is some aggregation that you need, or if there is some output you want to skip (e.g. you have a zipfian distribution and you want to ignore the low count occurrences).
>>>
>>> Dave
>>>
>>> -----Original Message-----
>>> From: Taylor, Ronald C
>>> Sent: Sunday, September 19, 2010 9:59 PM
>>> To: 'Ryan Rawson'; user@hbase.apache.org;
>>> hbase-user@hadoop.apache.org
>>> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
>>> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
>>> help, a class is apparently not being found by Hadoop
>>>
>>>
>>> Ryan,
>>>
>>> Thanks for the quick feedback. I will check the other nodes on the cluster to see if they have been properly updated.
>>>
>>> However, I am now really confused as to use of the guava*.jar file that you talk about. This is the first time I've heard about this. I presume we are talking about a jar file packaging the guava libraries from Google?
>>>
>>> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory or in the /home/hbase/hbase directories, where the Hadoop and Hbase installs place the other *.jar files. I'm afraid that I don't even know where we should have downloaded it. Does it come with Hbase, or with Hadoop? Where should it have been placed, after installation? Should I now download it - since we appear to be missing it - from here?
>>>  http://code.google.com/p/guava-libraries/downloads/list
>>>
>>> I Googled and found issue HBASE-2714 (Remove Guava as a client
>>> dependency, June 11 2010) here
>>>
>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>> (see below, where I've included the text)
>>>
>>> which appears to say that Hbase (at least *some* release of Hbase -
>>> does this include 0.89?) has a dependency on Guava, in order to run a
>>> MapReduce job over Hbase. But nothing on Guava is mentioned at
>>>
>>>
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>>>
>>> (I cannot find anything in the Hbase 0.89 online documents on Guava
>>> or in how to set CLASSPATH or in what *.jar files to include so I can
>>> use MapReduce with Hbase; the best guidance I can find is in this
>>> earlier
>>> document.)
>>>
>>> So - I could really use further clarification in regard to Guava as to what I should be doing to set up Hbase-MapReduce work.
>>>
>>>  Regards,
>>>   Ron
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>> From
>>>
>>> http://www.mail-archive.com/issues@hbase.apache.org/msg00950.html
>>>
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Why not?
>>>
>>> In theory, the new TableMapReduceUtil.addDependencyJars should take care of shipping it in the distributedcache. Apparently it's not working?
>>>
>>> ryan rawson commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> not everyone uses that mechanism to run map reduce jobs on hbase.  The standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to the hadoop classpath, thus not requiring every job include the hbase jars.
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Does this mean in general that we can't add more dependencies to the
>>> hbase client? I think instead we should make it easier to run hbase
>>> MR jobs *without* touching the Hadoop config (eg right now you have
>>> to restart MR to upgrade hbase, that's not going to fly for a lot of
>>> clusters)
>>>
>>> stack commented on HBASE-2714:
>>> ------------------------------
>>>
>>> So, we need to change our recommendations here:
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>>>
>>>
>>>> Remove Guava as a client dependency
>>>> -----------------------------------
>>>>
>>>>                 Key: HBASE-2714
>>>>                 URL:
>>>> https://issues.apache.org/jira/browse/HBASE-2714
>>>>             Project: HBase
>>>>          Issue Type: Improvement
>>>>          Components: client
>>>>            Reporter: Jeff Hammerbacher
>>>>
>>>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>>> Sent: Sunday, September 19, 2010 12:45 AM
>>> To: user@hbase.apache.org
>>> Cc: hbase-user@hadoop.apache.org; Taylor, Ronald C
>>> Subject: Re: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> hey,
>>>
>>> looks like you've done all the right things... you might want to double check that all the 'slave' machines have the updated hadoop-env.sh and that the path referenced therein is present _on all the machines_.
>>>
>>> You also need to include the guava*.jar as well.  the log4j is already included by mapred by default, so no need there.
>>>
>>> -ryan
>>>
>>>
>>>
>>> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <ronald.taylor@pnl.gov> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> Got a problem in basic Hadoop-Hbase communication. My small test
>>>> program ProteinCounter1.java - shown in full below - reports out
>>>> this error
>>>>
>>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>>>
>>>> The full invocation and error msgs are shown at bottom.
>>>>
>>>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop and Hbase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into Hbase tables and manipulate such. Both types of programs have gone quite smoothly.
>>>>
>>>> Now I want to combine the two - use MapReduce programs on data drawn from an Hbase table, with results placed back into an Hbase table.
>>>>
>>>> But my test program for such, as you see from the error msg, is not
>>>> working. Apparently the
>>>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>  class is not found.
>>>>
>>>> However, I have added these paths, including the relevant Hbase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
>>>>
>>>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>>>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>>>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>>>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>>>
>>>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>>>
>>>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>>>> and
>>>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>>>  is indeed present in that Hbase *.jar file.
>>>>
>>>> Also, I have restarted both Hbase and Hadoop after making this change.
>>>>
>>>> Don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.
>>>>
>>>>   Regards,
>>>>     Ron T.
>>>>
>>>> ___________________________________________
>>>> Ronald Taylor, Ph.D.
>>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>>> National Laboratory
>>>> 902 Battelle Boulevard
>>>> P.O. Box 999, Mail Stop J4-33
>>>> Richland, WA  99352 USA
>>>> Office:  509-372-6568
>>>> Email: ronald.taylor@pnl.gov
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>> %
>>>> %
>>>> %%%%%%%%%%%%
>>>>
>>>> contents of the "ProteinCounter1.java" file:
>>>>
>>>>
>>>>
>>>> // to compile
>>>> //   javac ProteinCounter1.java
>>>> //   jar cf ProteinCounterTest.jar *.class
>>>>
>>>> // to run
>>>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>>>
>>>>
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>>>> import org.apache.hadoop.mapreduce.Job;
>>>> import org.apache.hadoop.io.IntWritable;
>>>>
>>>> import java.util.*;
>>>> import java.io.*;
>>>> import org.apache.hadoop.hbase.*;
>>>> import org.apache.hadoop.hbase.client.*;
>>>> import org.apache.hadoop.hbase.io.*;
>>>> import org.apache.hadoop.hbase.util.*;
>>>> import org.apache.hadoop.hbase.mapreduce.*;
>>>>
>>>>
>>>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>>>> /**
>>>>  * Counts the number of times each protein appears in the proteinTable.
>>>>  */
>>>> public class ProteinCounter1 {
>>>>
>>>>     static class ProteinMapper1 extends TableMapper<ImmutableBytesWritable, IntWritable> {
>>>>
>>>>         private int numRecords = 0;
>>>>         private static final IntWritable one = new IntWritable(1);
>>>>
>>>>         @Override
>>>>         public void map(ImmutableBytesWritable row, Result values, Context context)
>>>>                 throws IOException {
>>>>
>>>>             // retrieve the value of proteinID, which is the row key
>>>>             // for each protein in the proteinTable
>>>>             ImmutableBytesWritable proteinID_Key = new ImmutableBytesWritable(row.get());
>>>>             try {
>>>>                 context.write(proteinID_Key, one);
>>>>             } catch (InterruptedException e) {
>>>>                 throw new IOException(e);
>>>>             }
>>>>             numRecords++;
>>>>             if ((numRecords % 100) == 0) {
>>>>                 context.setStatus("mapper processed " + numRecords
>>>>                         + " proteinTable records so far");
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>>     public static class ProteinReducer1
>>>>             extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {
>>>>
>>>>         public void reduce(ImmutableBytesWritable proteinID_key,
>>>>                            Iterable<IntWritable> values,
>>>>                            Context context)
>>>>                 throws IOException, InterruptedException {
>>>>             int sum = 0;
>>>>             for (IntWritable val : values) {
>>>>                 sum += val.get();
>>>>             }
>>>>
>>>>             Put put = new Put(proteinID_key.get());
>>>>             put.add(Bytes.toBytes("resultFields"), Bytes.toBytes("total"), Bytes.toBytes(sum));
>>>>             System.out.println(String.format("stats : proteinID_key : %d, count : %d",
>>>>                     Bytes.toInt(proteinID_key.get()), sum));
>>>>             context.write(proteinID_key, put);
>>>>         }
>>>>     }
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>
>>>>         Configuration conf = HBaseConfiguration.create();
>>>>
>>>>         Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>>>         job.setJarByClass(ProteinCounter1.class);
>>>>
>>>>         Scan scan = new Scan();
>>>>
>>>>         String colFamilyToUse = "proteinFields";
>>>>         String fieldToUse = "Protein_Ref_ID";
>>>>
>>>>         // retrieve this one column from the specified family
>>>>         scan.addColumn(Bytes.toBytes(colFamilyToUse), Bytes.toBytes(fieldToUse));
>>>>
>>>>         FirstKeyOnlyFilter filterToUse = new FirstKeyOnlyFilter();
>>>>         scan.setFilter(filterToUse);
>>>>
>>>>         TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
>>>>                 ProteinMapper1.class,
>>>>                 ImmutableBytesWritable.class,
>>>>                 IntWritable.class,
>>>>                 job);
>>>>         TableMapReduceUtil.initTableReducerJob("testTable", ProteinReducer1.class, job);
>>>>         System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>>     }
>>>> }
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>
