hbase-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject Re: HBase Failing on Large Loads
Date Tue, 09 Jun 2009 18:51:12 GMT
I sort of need the reduce since I'm combining primary keys from a CSV
file. Although I guess I could just use the combiner class... hrm.

How do I decrease the batch size?
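(For reference, the general idea behind "smaller batches" is just to cap how many row edits go out per commit, so each commit returns quickly. The sketch below is plain Java with a generic chunking helper, not the HBase 0.19 client API; the edit type and the commit step are placeholders.)

```java
import java.util.ArrayList;
import java.util.List;

// Generic sketch: split a large list of row edits into fixed-size batches
// so each commit returns quickly and the task can report progress between
// batches. The element type and the commit call are placeholders, not HBase API.
public class BatchChunker {
    public static <T> List<List<T>> chunk(List<T> edits, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < edits.size(); i += batchSize) {
            // copy the sublist so each batch is independent of the source list
            batches.add(new ArrayList<>(
                edits.subList(i, Math.min(i + batchSize, edits.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> edits = new ArrayList<>();
        for (int i = 0; i < 10; i++) edits.add(i);
        // 10 edits in batches of 3 -> batches of sizes 3, 3, 3, 1
        List<List<Integer>> batches = chunk(edits, 3);
        System.out.println(batches.size());
    }
}
```

Between batches the task gets a chance to report status, which is exactly what the "failed to report status for 603 seconds" timeout is about.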

Also, I tried to make a map-only task that used ImmutableBytesWritable
and BatchUpdate as the output K and V, and TableOutputFormat as the
OutputFormat -- the job fails, saying that "HbaseMapWritable cannot be
cast to org.apache.hadoop.hbase.io.BatchUpdate". I've checked my
Mapper multiple times, it's definitely outputting a BatchUpdate.
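(For what it's worth, that ClassCastException usually means the job's configured output value class doesn't match what the mapper actually emits, so the output format receives the default class instead of BatchUpdate. Below is a minimal map-only setup sketch against the 0.19-era `org.apache.hadoop.hbase.mapred` API; the `MyLoader`/`MyUploadMapper` class names and the table name are assumptions, and this is a config fragment, not a complete program.)

```java
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Inside your job-setup method (sketch only):
JobConf job = new JobConf(MyLoader.class);        // MyLoader: your driver class (assumed name)
job.setMapperClass(MyUploadMapper.class);         // emits <ImmutableBytesWritable, BatchUpdate>
job.setNumReduceTasks(0);                         // map-only: no shuffle, no sort, no reduce
job.setOutputFormat(TableOutputFormat.class);
job.set(TableOutputFormat.OUTPUT_TABLE, "joinedcontent");  // target table (assumed)
// These must match what the mapper emits; if they are left at their
// defaults, the framework hands the output format the wrong classes
// and you get a ClassCastException like the one above:
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(BatchUpdate.class);
```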

On Tue, Jun 9, 2009 at 10:43 AM, stack<stack@duboce.net> wrote:
> On Tue, Jun 9, 2009 at 10:13 AM, Bradford Stephens <
> bradfordstephens@gmail.com> wrote:
>
>
>> Hey rock stars,
>>
>
>
> Flattery makes us perk up for sure.
>
>
>
>>
>> I'm having problems loading large amounts of data into a table (about
>> 120 GB, 250million rows). My Map task runs fine, but when it comes to
>> reducing, things start burning. 'top' indicates that I only have ~
>> 100M of RAM free on my datanodes, and every process starts thrashing
>> ... even ssh and ping. Then I start to get errors like:
>>
>> "org.apache.hadoop.hbase.client.RegionOfflineException: region
>> offline: joinedcontent,,1244513452487"
>>
>
> See if said region is actually offline?  Try getting a row from it in the shell.
>
>
>
>>
>> and:
>>
>> "Task attempt_200906082135_0001_r_000002_0 failed to report status for
>> 603 seconds. Killing!"
>
>
>
> Sounds like the nodes are heavily loaded.. so loaded that either the task
> can't report in... or it's stuck on an HBase update so long that it's taking
> ten minutes or more to return.
>
> One thing to look at is disabling batching or making batches smaller.   When
> the batch is big, it can take a while under high load for all row edits to go
> in; the HBase client will not return till all row commits have succeeded.
> Smaller batches mean the client is more likely to return in time, so the task
> isn't killed for taking longer than the report period to check in.
>
>
> What's your MR job like?  You're updating HBase in the reduce phase, I presume
> (TableOutputFormat?).  Do you need the reduce?  Can you update HBase in the
> map step?   That saves on the sort the MR framework is doing -- a sort that is
> unnecessary given that HBase orders on insertion.
>
>
> Can you try with a lighter load?  Maybe a couple of smaller MR jobs rather
> than one big one?
>
> St.Ack
>
>
>>
>>
>> I'm running Hadoop 0.19.1 and HBase 0.19.3, with 1 master/name node and
>> 8 regionservers. 2 x Dual Core Intel 3.2 GHz procs, 4 GB of RAM. 16
>> map tasks, 8 reducers. I've set the MAX_HEAP in hadoop-env to 768, and
>> the one in hbase-env is at its default of 1000. I've also done all
>> the performance enhancements in the Wiki with the file handles, the
>> garbage collection, and the epoll limits.
>>
>> What am I missing? :)
>>
>> Cheers,
>> Bradford
>>
>
