hbase-user mailing list archives

From Mayuresh <mayuresh.kshirsa...@gmail.com>
Subject Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException
Date Mon, 26 Sep 2011 10:22:21 GMT
I increased the lease period in hbase-site.xml to around 50 minutes:

  <property>
    <name>hbase.regionserver.lease.period</name>
    <!--value>60000</value-->
    <value>3000000</value>
    <description>HRegion server lease period in milliseconds. Default is
    60 seconds. Clients must report in within this period else they are
    considered dead.</description>
  </property>
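
As pointed out further down the thread, the lease period is a server-side
setting; the client-side RPC timeout is separate and would need raising on its
own. A minimal sketch of that, assuming hbase.rpc.timeout is the relevant
property in this HBase version:

  <property>
    <name>hbase.rpc.timeout</name>
    <!-- assumed client-side default is 60000 (60 seconds) -->
    <value>3000000</value>
    <description>Client-side RPC timeout in milliseconds. The property name
    and default are assumptions, not verified against this HBase
    version.</description>
  </property>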

However, I still fail with the same error:

2011-09-26 15:50:28,857 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{"processingtimems":118696,"call":"execCoprocessor([B@5b0e6f59,
getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
{\"timeRange\":[0,9223372036854775807],\"batch\":-1,\"startRow\":\"\",\"stopRow\":\"\",\"totalColumns\":0,\"cacheBlocks\":true,\"families\":{\"data\":[]},\"maxVersions\":1,\"caching\":-1}))","client":"137.72.240.180:54843","starttimems":1317032310155,"queuetimems":9,"class":"HRegionServer","responsesize":0,"method":"execCoprocessor"}
2011-09-26 15:50:28,872 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server Responder, call execCoprocessor([B@5b0e6f59,
getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
{"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"","stopRow":"","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1}))
from 137.72.240.180:54843: output error
2011-09-26 15:50:28,873 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 4 on 60020 caught:
java.nio.channels.ClosedChannelException
	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
	at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)

What do the responseTooSlow warning and the "output error" mean? Any clues
to finding what's causing this?
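
For reference, here is a trimmed, self-contained version of the client code
quoted further down the thread. The table, family and qualifier names are
placeholders, and the hbase.rpc.timeout setting is an assumption about the
client-side timeout knob, not something verified in this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.coprocessor.ColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class AvgViaCoprocessor {
    public static void main(String[] args) throws Throwable {
        // Placeholder names; substitute the real table/family/qualifier.
        byte[] table = Bytes.toBytes("Test");
        byte[] family = Bytes.toBytes("data");
        byte[] qualifier = Bytes.toBytes("q");

        Configuration conf = HBaseConfiguration.create();
        // Assumption: hbase.rpc.timeout is the client-side RPC timeout for
        // this version; its 60 s default may be what cuts long-running calls.
        conf.setLong("hbase.rpc.timeout", 50 * 60 * 1000);

        Scan scan = new Scan();
        scan.addColumn(family, qualifier);

        // avg() fans the request out to each region (assumes the
        // AggregateImplementation coprocessor is loaded on the region
        // servers, as in the original setup) and combines the partial
        // results on the client.
        AggregationClient aClient = new AggregationClient(conf);
        ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
        double avg = aClient.avg(table, ci, scan);
        System.out.println("Avg is: " + avg);
    }
}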

Thanks.


On Mon, Sep 26, 2011 at 2:26 PM, Mayuresh <mayuresh.kshirsagar@gmail.com> wrote:
> Hi Ted,
>
> Yes I am aware that this isn't a good setup. I am working on
> understanding the coprocessors before I can use a bigger setup.
>
> I am talking about the presentation at
> https://hbase.s3.amazonaws.com/hbase/HBase-CP-HUG10.pdf by Andrew
> Purtell, slide 5:
>
> <quote>
> For long running jobs the client must periodically poll status to keep it
> alive; jobs without interest will be cancelled
> </quote>
>
>
> On Mon, Sep 26, 2011 at 2:08 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> Looks like my response to user@ got bounced.
>>
>> Himanshu is continuing the work on HBASE-3607
>>
>>>> Currently all the regions are on a single region server.
>> I don't think the above is a good setup.
>>
>>>> It was mentioned to poll the job status periodically to avoid timeouts.
>> Can you tell us more about the presentation ? Normally you can include the
>> author in the cc list of this email chain.
>>
>> Cheers
>>
>> On Mon, Sep 26, 2011 at 12:44 AM, Mayuresh <mayuresh.kshirsagar@gmail.com>
>> wrote:
>>>
>>> Yes, the computation takes longer than 60 secs. Will setting the lease
>>> timeout in hbase-site.xml help?
>>>
>>> In one of the coprocessor presentations it was mentioned to poll the
>>> job status periodically to avoid timeouts. Any idea how to do that?
>>>
>>> On Sep 23, 2011 9:59 PM, "Stack" <stack@duboce.net> wrote:
>>> > Is it doing a long GC? Is the aggregation taking longer than the
>>> > client 60 second timeout? (You are setting the lease below, but I don't
>>> > think that will have an effect since it's server-side.)
>>> >
>>> > St.Ack
>>> >
>>> >
>>> > On Fri, Sep 23, 2011 at 2:52 AM, Mayuresh
>>> > <mayuresh.kshirsagar@gmail.com> wrote:
>>> >> Hi,
>>> >>
>>> >> I am using the AggregationClient to perform an aggregate calculation on a
>>> >> 10 M row hbase table which is spread across around 27 regions.
>>> >> Currently all the regions are on a single region server.
>>> >>
>>> >> Here is briefly what the code is doing:
>>> >>         Leases leases = new Leases(40 * 60 * 1000, 5 * 1000);
>>> >>         leases.start();
>>> >>         long startTime = System.currentTimeMillis();
>>> >>         System.out.println("Start time: " + startTime);
>>> >>         Configuration conf = new Configuration(true);
>>> >>         AggregationClient aClient = new AggregationClient(conf);
>>> >>
>>> >>         Scan scan = new Scan();
>>> >>         scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
>>> >>         final ColumnInterpreter<Long, Long> ci = new
>>> >> LongColumnInterpreter();
>>> >>
>>> >>         double avg = aClient.avg(TEST_TABLE, ci,
>>> >>                 scan);
>>> >>         leases.close();
>>> >>         long endTime = System.currentTimeMillis();
>>> >>         System.out.println("End time: " + endTime);
>>> >>         System.out.println("Avg is: " + avg);
>>> >>         System.out.println("Time taken is: " + (endTime - startTime));
>>> >>
>>> >> However it is failing after some time with the following log messages
>>> >> in the region server log:
>>> >> 2011-09-23 14:21:25,649 DEBUG
>>> >> org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
>>> >> region
>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>> >> 2011-09-23 14:21:25,650 DEBUG
>>> >> org.apache.hadoop.hbase.io.hfile.HFileReaderV2: On close of file
>>> >> 15761056997240226 evicted 344 block(s)
>>> >> 2011-09-23 14:21:25,650 DEBUG
>>> >> org.apache.hadoop.hbase.regionserver.Store: closed data
>>> >> 2011-09-23 14:21:25,650 INFO
>>> >> org.apache.hadoop.hbase.regionserver.HRegion: Closed
>>> >>
>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>> >> 2011-09-23 14:21:25,650 DEBUG
>>> >> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
>>> >> Closed region
>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>> >> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>>> >> Server Responder, call execCoprocessor([B@6b6ab732,
>>> >>
>>> >> getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@2b216ab6,
>>> >>
>>> >> {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"serv1-1315571539741","stopRow":"serv5-1315575834916","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1}))
>>> >> from 137.72.240.177:47563: output error
>>> >> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>>> >> Server handler 9 on 60020 caught:
>>> >> java.nio.channels.ClosedChannelException
>>> >>         at
>>> >> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>>> >>         at
>>> >> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>> >>         at
>>> >> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
>>> >>         at
>>> >> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
>>> >>         at
>>> >> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
>>> >>         at
>>> >> org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
>>> >>         at
>>> >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)
>>> >>
>>> >> And the following in the zookeeper log:
>>> >> 2011-09-23 14:19:50,978 INFO
>>> >> org.apache.zookeeper.server.NIOServerCnxn: Established session
>>> >> 0x1329577cad70004 with negotiated timeout 180000 for client
>>> >> /137.72.240.177:45431
>>> >> 2011-09-23 14:21:11,028 WARN
>>> >> org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException:
>>> >> Unable to read additional data from client sessionid
>>> >> 0x1329577cad70004, likely client has closed socket
>>> >> 2011-09-23 14:21:11,029 INFO
>>> >> org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>>> >> for client /137.72.240.177:45431 which had sessionid 0x1329577cad70004
>>> >> 2011-09-23 14:24:42,000 INFO
>>> >> org.apache.zookeeper.server.ZooKeeperServer: Expiring session
>>> >> 0x1329577cad70004, timeout of 180000ms exceeded
>>> >> 2011-09-23 14:24:42,001 INFO
>>> >> org.apache.zookeeper.server.PrepRequestProcessor: Processed session
>>> >> termination for sessionid: 0x1329577cad70004
>>> >>
>>> >> Any ideas on how to resolve this?
>>> >>
>>> >> Thanks in advance.
>>> >>
>>> >> Regards,
>>> >> Mayuresh
>>> >>
>>
>>
>
>
>
> --
> -Mayuresh
>



-- 
-Mayuresh
