hbase-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject Re: HBase Failing on Large Loads
Date Wed, 10 Jun 2009 21:50:29 GMT
OK, I've tried all the optimizations you've suggested (still running
with an M/R job). Still having problems like this:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server 192.168.18.15:60020 for region
joinedcontent,242FEB3ED9BE0D8EF3856E9C4251464C,1244666594390, row
'291DB5C7440B0A5BDB0C12501308C55B', but failed after 10 attempts.
Exceptions:
java.io.IOException: Call to /192.168.18.15:60020 failed on local
exception: java.io.EOFException
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /192.168.18.15:60020 failed on
connection exception: java.net.ConnectException: Connection refused

On Wed, Jun 10, 2009 at 12:40 AM, stack<stack@duboce.net> wrote:
> On Tue, Jun 9, 2009 at 11:51 AM, Bradford Stephens <
> bradfordstephens@gmail.com> wrote:
>
>> I sort of need the reduce since I'm combining primary keys from a CSV
>> file. Although I guess I could just use the combiner class... hrm.
>>
>> How do I decrease the batch size?
>
>
>
> Below is from hbase-default.xml:
>
>  <property>
>    <name>hbase.client.write.buffer</name>
>    <value>2097152</value>
>    <description>Size of the write buffer in bytes. A bigger buffer takes
> more
>    memory -- on both the client and server side since server instantiates
>    the passed write buffer to process it -- but reduces the number of RPC.
>    For an estimate of server-side memory-used, evaluate
>    hbase.client.write.buffer * hbase.regionserver.handler.count
>    </description>
>  </property>
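>
> To shrink it (and with it the size of the batches the client sends),
> override it with a smaller value in hbase-site.xml on the client side --
> the value below is just an example:
>
>  <property>
>    <name>hbase.client.write.buffer</name>
>    <value>524288</value>
>  </property>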
>
>
> You upped xceivers on your datanodes and you set your
> dfs.datanode.socket.write.timeout = 0?
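>
> For reference, those go on the datanode side, e.g. in hadoop-site.xml on
> 0.19 -- the values here are just examples:
>
>  <property>
>    <name>dfs.datanode.max.xcievers</name>
>    <value>4096</value>
>  </property>
>  <property>
>    <name>dfs.datanode.socket.write.timeout</name>
>    <value>0</value>
>  </property>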
>
>
>
>> Also, I tried to make a map-only task that used ImmutableBytesWritable
>> and BatchUpdate as the output K and V, and TableOutputFormat as the
>> OutputFormat -- the job fails, saying that "HbaseMapWritable cannot be
>> cast to org.apache.hadoop.hbase.io.BatchUpdate". I've checked my
>> Mapper multiple times, it's definitely outputting a BatchUpdate.
>>
>
>
> You are using TOF as the map output?  Paste the exception.  You could try
> making an HTable instance in your configure call and then doing
> t.commit(BatchUpdate) in your map.  Emit nothing, or something simple like
> an integer, so the counters make some kind of sense when the job is done.
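>
> Something like this, very roughly (untested sketch against the 0.19 API;
> the class name, column name, and CSV layout below are made up):
>
> import java.io.IOException;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.io.BatchUpdate;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.*;
>
> public class UploadMapper extends MapReduceBase
>     implements Mapper<LongWritable, Text, NullWritable, NullWritable> {
>   private HTable table;
>
>   public void configure(JobConf job) {
>     try {
>       // Open the table once per task.
>       table = new HTable(new HBaseConfiguration(job), "joinedcontent");
>     } catch (IOException e) {
>       throw new RuntimeException(e);
>     }
>   }
>
>   public void map(LongWritable key, Text line,
>       OutputCollector<NullWritable, NullWritable> out, Reporter reporter)
>       throws IOException {
>     String[] fields = line.toString().split(",");  // one CSV record
>     BatchUpdate bu = new BatchUpdate(fields[0]);    // first field as row key
>     bu.put("content:raw", fields[1].getBytes());    // "family:qualifier"
>     table.commit(bu);                               // write straight to hbase
>     reporter.progress();   // keep the task alive through slow commits
>   }
> }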
>
> Tell us something about your schema.  How many column families and columns?
>
> St.Ack
>
>
>>
>> On Tue, Jun 9, 2009 at 10:43 AM, stack<stack@duboce.net> wrote:
>> > On Tue, Jun 9, 2009 at 10:13 AM, Bradford Stephens <
>> > bradfordstephens@gmail.com> wrote:
>> >
>> >
>> >> Hey rock stars,
>> >>
>> >
>> >
>> > Flattery makes us perk up for sure.
>> >
>> >
>> >
>> >>
>> >> I'm having problems loading large amounts of data into a table (about
>> >> 120 GB, 250 million rows). My Map task runs fine, but when it comes to
>> >> reducing, things start burning. 'top' indicates that I only have ~
>> >> 100M of RAM free on my datanodes, and every process starts thrashing
>> >> ... even ssh and ping. Then I start to get errors like:
>> >>
>> >> "org.apache.hadoop.hbase.client.RegionOfflineException: region
>> >> offline: joinedcontent,,1244513452487"
>> >>
>> >
>> > See if said region is actually offline?  Try getting a row from it in
>> > shell.
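>> >
>> > e.g. from the hbase shell, with some row key you know falls in that
>> > region's range:
>> >
>> >   get 'joinedcontent', 'SOME-ROW-KEY'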
>> >
>> >
>> >
>> >>
>> >> and:
>> >>
>> >> "Task attempt_200906082135_0001_r_000002_0 failed to report status for
>> >> 603 seconds. Killing!"
>> >
>> >
>> >
>> > Sounds like the nodes are heavily loaded... so loaded that either the
>> > task can't report in, or it's stuck on an hbase update so long it's
>> > taking ten minutes or more to return.
>> >
>> > One thing to look at is disabling batching or making batches smaller.
>> > When the batch is big, it can take a while under high load for all row
>> > edits to go in.  The HBase client will not return till all row commits
>> > have succeeded.  Smaller batches are more likely to return before the
>> > task gets killed for taking longer than the report period to check in.
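>> >
>> > (As a stopgap you can also give tasks longer to check in -- the 600
>> > seconds above is mapred.task.timeout, 600000 ms by default -- e.g. in
>> > your JobConf:
>> >
>> >   conf.setLong("mapred.task.timeout", 1800000);  // 30 minutes
>> >
>> > but smaller batches are the better fix.)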
>> >
>> >
>> > What's your MR job like?  You're updating hbase in the reduce phase, I
>> > presume (TableOutputFormat?).  Do you need the reduce?  Can you update
>> > hbase in the map step?  That saves the sort the MR framework is doing --
>> > a sort that is unnecessary since hbase orders on insertion.
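>> >
>> > Roughly, the map-only version would be set up something like this
>> > (untested sketch, 0.19 mapred API; substitute your own job and mapper
>> > classes):
>> >
>> >   JobConf job = new JobConf(conf, MyUpload.class);
>> >   job.setMapperClass(MyUploadMapper.class);
>> >   job.setNumReduceTasks(0);  // map-only: skips the sort/shuffle entirely
>> >   job.setOutputFormat(TableOutputFormat.class);
>> >   job.set(TableOutputFormat.OUTPUT_TABLE, "joinedcontent");
>> >   job.setOutputKeyClass(ImmutableBytesWritable.class);
>> >   job.setOutputValueClass(BatchUpdate.class);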
>> >
>> >
>> > Can you try with a lighter load?  Maybe a couple of smaller MR jobs
>> > rather than one big one?
>> >
>> > St.Ack
>> >
>> >
>> >>
>> >>
>> >> I'm running Hadoop 0.19.1 and HBase 0.19.3, with 1 master/name node and
>> >> 8 regionservers. 2 x Dual Core Intel 3.2 GHz procs, 4 GB of RAM. 16
>> >> map tasks, 8 reducers. I've set the MAX_HEAP in hadoop-env to 768, and
>> >> the one in hbase-env is at its default with 1000. I've also done all
>> >> the performance enhancements in the Wiki with the file handlers, the
>> >> garbage collection, and the epoll limits.
>> >>
>> >> What am I missing? :)
>> >>
>> >> Cheers,
>> >> Bradford
>> >>
>> >
>>
>
