hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase always corrupted
Date Wed, 07 Apr 2010 16:57:44 GMT
At StumbleUpon we have north of 20 billion rows, each 100-200 bytes.

Look in your datanode log for the error described here:

http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5

or this one:

http://wiki.apache.org/hadoop/Hbase/FAQ#A6

J-D

On Wed, Apr 7, 2010 at 9:55 AM, Geoff Hendrey <ghendrey@decarta.com> wrote:
> Hi,
>
> I am running an HBase instance in a pseudocluster mode, on top of a
> pseudoclustered HDFS, on a single machine. I have a 10 node map/reduce
> cluster that is using a TableMapper to drive a map/reduce job. In the
> map phase, two Gets are executed against HBase. The map phase
> generates two orders of magnitude more data than was pumped in, and in
> the reduce phase we do some consolidation of the generated data, then
> execute a Put into HBase with autocommit=false and the batch size set to
> 100,000 (I tried 1,000 and 10,000 as well and found 100,000 worked best). I am
> using 32 reducers, and reduce seems to run 1000x slower than mapping.
>
> Unfortunately, the job consistently crashes around 85% reduce
> completion, with HDFS-related errors from the HBase machine:
>
> java.io.IOException: java.io.IOException: All datanodes 127.0.0.1:50010 are bad. Aborting...
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2525)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2078)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2241)
>
> So I am clearly aware of the mismatch between the big map/reduce
> cluster and the wimpy HBase installation, but why am I seeing
> consistent crashes? Shouldn't the HBase cluster just be slower, not
> unreliable?
>
> Here is my main question: should I expect that running a "real" HBase
> cluster will solve my problems, and does anyone have experience with a
> map/reduce job that pumps several billion rows into HBase?
> -geoff
>
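
Editor's note: the quoted message describes reducers that buffer Puts client-side and send them in batches of a fixed size. The sketch below models only that buffering pattern in plain Python, to make the trade-off concrete: larger batches mean fewer round trips but bigger bursts hitting the single HBase/HDFS node. It is not the HBase client API. BufferedWriter, write_all, the (row key, value) tuple, and the count-based threshold are all hypothetical names and assumptions chosen for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for an HBase Put: (row key, value).
Row = Tuple[str, bytes]


class BufferedWriter:
    """Collects rows client-side and hands them to `sink` in batches."""

    def __init__(self, sink: Callable[[List[Row]], None], batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._sink = sink
        self._batch_size = batch_size
        self._buffer: List[Row] = []

    def put(self, row: Row) -> None:
        # Buffer the row; send a batch once the threshold is reached.
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Send a copy first and clear only afterwards, so that a failing
        # sink leaves the rows in the buffer for a later retry.
        if self._buffer:
            self._sink(list(self._buffer))
            self._buffer.clear()


def write_all(rows: Iterable[Row], batch_size: int) -> List[int]:
    """Write every row through a BufferedWriter; return the batch sizes sent."""
    batches: List[List[Row]] = []
    writer = BufferedWriter(batches.append, batch_size)
    for row in rows:
        writer.put(row)
    # The final flush matters: without it the last partial batch is never sent.
    writer.flush()
    return [len(batch) for batch in batches]
```

In a reducer, the equivalent of the final flush would have to run when the task finishes. With 32 reducers each holding up to 100,000 buffered rows, the single node can receive several million rows in a short burst, which is one reason to check the datanode log as suggested above.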
