Mailing-List: contact hadoop-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: local policy)
Message-ID: <47349718.30407@duboce.net>
Date: Fri, 09 Nov 2007 09:21:28 -0800
From: Michael Stack <stack@duboce.net>
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
MIME-Version: 1.0
To: hadoop-user@lucene.apache.org
Subject: Re: Excpetion when combining Hadoop MapReduce with HBase
References: <7c962aed0711082056o7178863qdec2f649042e84c@mail.gmail.com>
 <47347D5F.5010007@gmail.com>
In-Reply-To: <47347D5F.5010007@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Lets use HADOOP-2179 for figuring out whats going on Holger.  Would you 
mind uploading your configuration files -- both hadoop and hbase -- and 
put a link there to the data you are using so I can duplicate your setup 
locally?

Thanks,
St.Ack


Holger Stenzhorn wrote:
> Hi,
>
> During my latest "experiments" I was always using the local mode for 
> Hadoop/Hbase.
> In the following I just give you the respective Reducer classes that I 
> use for testing.
> Employing the...
> - "TestFileReducer" the MapReduce job completes flawlessly.
> - "TestBaseReducer" the job crashes (see log below and attached with 
> DEBUG turned on)
>
> Cheers,
> Holger
>
> Code:
> ------
>
>  public static class TestFileReducer extends MapReduceBase
>    implements Reducer<Text, Text, Text, Text> {
>      public void reduce(Text key, Iterator<Text> values,
>                       OutputCollector<Text, Text> output,
>                       Reporter reporter) throws IOException {        
> StringBuilder builder = new StringBuilder();
>      while (values.hasNext()) {
>        builder.append(values.next() + "\n");
>      }
>      output.collect(key, new Text(builder.toString()));
>    }
>  }
>
>  public static class TestBaseReducer extends MapReduceBase
>    implements Reducer<Text, Text, Text, MapWritable> {
>      public void reduce(Text key, Iterator<Text> values,
>                       OutputCollector<Text, MapWritable> output,
>                       Reporter reporter) throws IOException {
>      StringBuilder builder = new StringBuilder();
>      while (values.hasNext()) {
>        builder.append(values.next() + "\n");
>      }
>      MapWritable value = new MapWritable();
>      value.put(new Text("triples:" + string.hashCode()), new 
> ImmutableBytesWritable(builder.toString().getBytes()));          
> output.collect(key, value);
>    }
>  }
>
>
>
> Test case log:
> --------------
>
> 07/11/09 15:17:22 INFO mapred.JobClient:  map 100% reduce 83%
> 07/11/09 15:17:25 INFO mapred.LocalJobRunner: reduce > reduce
> 07/11/09 15:17:28 INFO mapred.LocalJobRunner: reduce > reduce
> 07/11/09 15:17:31 INFO mapred.LocalJobRunner: reduce > reduce
> 07/11/09 15:17:34 INFO mapred.LocalJobRunner: reduce > reduce
> 07/11/09 15:17:37 INFO mapred.LocalJobRunner: reduce > reduce
> 07/11/09 15:22:24 WARN mapred.LocalJobRunner: job_local_1
> java.net.SocketTimeoutException: timed out waiting for rpc response
>        at org.apache.hadoop.ipc.Client.call(Client.java:484)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>        at $Proxy1.batchUpdate(Unknown Source)
>        at org.apache.hadoop.hbase.HTable.commit(HTable.java:724)
>        at org.apache.hadoop.hbase.HTable.commit(HTable.java:701)
>        at 
> org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:89) 
>
>        at 
> org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:63) 
>
>        at 
> org.apache.hadoop.mapred.ReduceTask$2.collect(ReduceTask.java:308)
>        at TriplesTest$TestBaseReducer.reduce(TriplesTest.java:78)
>        at TriplesTest$TestBaseReducer.reduce(TriplesTest.java:52)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:326)
>        at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:164)
> Exception in thread "main" java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
>        at TriplesTest.run(TriplesTest.java:181)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at TriplesTest.main(TriplesTest.java:187)
>
>
> Master log:
> -----------
>
> Attached as GZIP...
>
> Cheers,
> Holger
>
> stack wrote:
>> Holger Stenzhorn wrote:
>>  
>>> Since I am just a beginner at Hadoop and Hbase I cannot really tell 
>>> whether an exception is a "real" one or just a "hint" - but 
>>> exceptions always look scary... :-)
>>>     
>> Yeah.  You might add your POV to the issue.
>>
>>  
>>> Anyways, when I did cut the allowed heap for the server and the test 
>>> class both down to 2GB. In this case I just get following (not too 
>>> optimistic) results...
>>> Well, I post all the log I think that might be necessary since I 
>>> cannot say which exception is important and which one not.
>>>     
>> Looks like your poor old mapreduce job failed when it tried to write a
>> record to hbase.
>>
>> Looking at the master log, whats odd is that the catalog .META. table
>> has no mention of the 'triples' table.  Its been created?  (It may not
>> be showing because you are not running with logging at DEBUG level.
>> As of yesterday or so you can set DEBUG level from the UI by browsing
>> to 'Log Level' servlet at http://MASTER_HOST:PORT -- usually
>> http://localhost:60000/ -- and set the package
>> 'org.apache.hadoop.hbase' to DEBUG).
>>
>> But then you OOME.
>>
>> You might try outputting back to local filesystem rather than to HDFS
>> -- use something like TextOutputFormat instead of TableOutputFormat.
>> If that works, then there is an issue w/ writing output to hbase.
>> Please open a JIRA, paste your MR program and lets figure a way to get
>> the data file across.
>>
>> Thanks for your patience H,
>> St.Ack
>>   
>