hbase-user mailing list archives

From Mingjie Lai <mjla...@gmail.com>
Subject Re: Socket Timeout Exception when executing coprocessor
Date Thu, 30 Jun 2011 06:57:30 GMT
Nichole,

Himanshu is right. Your coprocessor took too long to complete on some 
regions, which caused the timeout.

Can you profile the problematic region? Adding a few log messages to 
the coprocessor may be enough.
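
A minimal sketch of that kind of timing log, in plain Java with no HBase 
dependency. The names RegionTiming and timed, and the region label passed 
in, are hypothetical and not part of HBase; in a real coprocessor you would 
wrap the body of your endpoint method and use whatever logger the region 
server already configures.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical helper (not an HBase class): logs how long one unit of work takes. */
final class RegionTiming {
    private static final Logger LOG = Logger.getLogger(RegionTiming.class.getName());

    private RegionTiming() {}

    /**
     * Runs the given work, logs the elapsed wall-clock time under the given
     * region label, and returns the work's result unchanged.
     */
    static <T> T timed(String regionLabel, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Logged even if the work throws, so slow failures show up too.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            LOG.info("region=" + regionLabel + " elapsedMs=" + elapsedMs);
        }
    }
}
```

Comparing the logged times per region against the 60000 ms in the exception 
should show which regions run past the timeout.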

If all the coprocessors (one per region) on a RS perform a full scan 
in parallel, they can consume a lot of IO and cause a timeout. You may 
need to do some optimization. Would you mind posting the code for further 
investigation?

Thanks,
Mingjie

On 06/29/2011 04:08 PM, Himanshu Vashishtha wrote:
> Hey Nichole,
> you are doing something compute intensive at the region level in your
> coprocessors and it is not able to respond back within the default client
> timeout of 60 sec. The Aggregate Coprocessor just does the row count and comes
> back from each region. You can see corresponding exceptions in the region
> server logs too.
>
> Your use case is motivating me to restart work on the streaming result thing
> in a cursor like fashion (3607) from coprocessor result sets; now I have got
> one real user who is facing it! Can you take a look at it and comment if
> that would be of some help in your case?
> I got similar exceptions in my experiments when I did a row count on a 100
> million record dataset on a 3 RS node cluster. Applying the patch from 2077 might
> help; I haven't tried it yet.
>
> Himanshu
>
> On Wed, Jun 29, 2011 at 4:58 PM, Nichole Treadway <kntreadway@gmail.com> wrote:
>
>> Yes...each region has roughly 300,000 rows, 60 million rows total. The rows
>> are large, ~ 20,000 bytes.
>>
>> But in my tests, I'm setting filters such that only 1 KV should be returned
>> to the client. And I'm seeing these issues even when I set start and stop
>> rows on the scanner so that all regions won't be scanned. If I set start and
>> stop rows to cover from 1 to 20 regions, it works fine. Any more regions
>> than that and I see the socket exceptions.
>>
>>
>>
>> On Wed, Jun 29, 2011 at 6:47 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> Can you tell us roughly how large the 200 regions are?
>>> You can run the rowcounter util.
>>>
>>> Without seeing your (GetListClient) code, I am not able to tell why
>>> AggregationClient didn't have the following issue.
>>>
>>> On Wed, Jun 29, 2011 at 3:24 PM, Nichole Treadway <kntreadway@gmail.com> wrote:
>>>
>>>> I've implemented my own coprocessor client, protocol and implementation that
>>>> returns back to the user a List of KeyValues with values that match some
>>>> criteria. I've tested this on a small table with just a few regions and it
>>>> works fine. I'm running into issues when I execute my code on a table with
>>>> 200 regions, and I'm not really sure how to resolve the issue. I'm getting a
>>>> SocketTimeoutException shown below.
>>>>
>>>> I'm able to run the AggregationClient coprocessor without seeing these
>>>> issues. It might be something I'm doing in my code, but if anybody has any
>>>> ideas why the request seems to be timing out or what I can do about it, I'd
>>>> appreciate that.
>>>>
>>>>
>>>> I'm running latest revision of hbase-0.92 and hadoop-0.20-append. My cluster
>>>> has 15 regionservers. Running on RHEL 5.5, 64-bit.
>>>>
>>>>
>>>> Some highlights of the errors are below. I've put the full thing in
>>>> pastebin here: http://pastebin.com/rapYiNp3
>>>>
>>>>
>>>> 11/06/29 17:35:12 INFO ipc.HBaseRPC: Using org.apache.hadoop.hbase.ipc.WritableRpcEngine for org.apache.hadoop.hbase.ipc.HRegionInterface
>>>> 11/06/29 17:48:12 WARN client.HConnectionManager$HConnectionImplementation: Error executing for row 00223199610B220970111:2:0:7524::
>>>> java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server mysite.com/myip:62091 for region info_test,00223199610B220970111:2:0:7524::,1309363489443.c3192674341c8d10d84966e8e663a644., row '00223199610B220970111:2:0:7524::', but failed after 10 attempts.
>>>>
>>>> Exceptions:
>>>> java.net.SocketTimeoutException: Call to mysite.com/myip:62091 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/myip:14738 remote=mysite.com/myip:62091]
>>>>
>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1215)
>>>>   at org.apache.hadoop.hbase.ipc.ExecRPCInvoker.invoke(ExecRPCInvoker.java:79)
>>>>   at $Proxy1.getList(Unknown Source)
>>>>   at org.apache.hadoop.hbase.client.coprocessor.GetListClient$1.call(GetListClient.java:108)
>>>>   at org.apache.hadoop.hbase.client.coprocessor.GetListClient$1.call(GetListClient.java:105)
>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$4.call(HConnectionManager.java:1325)
>>>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>   at java.lang.Thread.run(Thread.java:662)
>>>>
>>>
>>
>
