hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException
Date Mon, 26 Sep 2011 16:25:43 GMT
responseTooSlow is a new feature in TRUNK.  Slow responses get logged
with detail.  I suppose your long-running CP qualifies.  We should
make it so there can be exceptions.  The ClosedChannelException
usually means the client has given up waiting; it timed out.  Try
setting "ipc.socket.timeout" to something large.

St.Ack

On Mon, Sep 26, 2011 at 3:22 AM, Mayuresh <mayuresh.kshirsagar@gmail.com> wrote:
> I increased the lease period in hbase-site.xml to around 50 minutes:
>
>  <property>
>    <name>hbase.regionserver.lease.period</name>
>    <!--value>60000</value-->
>    <value>3000000</value>
>    <description>HRegion server lease period in milliseconds. Default is
>    60 seconds. Clients must report in within this period else they are
>    considered dead.</description>
>  </property>
>
> However I still fail with the same error:
>
> 2011-09-26 15:50:28,857 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooSlow):
> {"processingtimems":118696,"call":"execCoprocessor([B@5b0e6f59,
> getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
> {\"timeRange\":[0,9223372036854775807],\"batch\":-1,\"startRow\":\"\",\"stopRow\":\"\",\"totalColumns\":0,\"cacheBlocks\":true,\"families\":{\"data\":[]},\"maxVersions\":1,\"caching\":-1}))","client":"137.72.240.180:54843","starttimems":1317032310155,"queuetimems":9,"class":"HRegionServer","responsesize":0,"method":"execCoprocessor"}
> 2011-09-26 15:50:28,872 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server Responder, call execCoprocessor([B@5b0e6f59,
> getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
> {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"","stopRow":"","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1}))
> from 137.72.240.180:54843: output error
> 2011-09-26 15:50:28,873 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 4 on 60020 caught:
> java.nio.channels.ClosedChannelException
>        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)
>
> What do this responseTooSlow and the "output error" mean? Any clues to
> find what's causing this?
>
> Thanks.
>
>
> On Mon, Sep 26, 2011 at 2:26 PM, Mayuresh <mayuresh.kshirsagar@gmail.com> wrote:
>> Hi Ted,
>>
>> Yes I am aware that this isn't a good setup. I am working on
>> understanding the coprocessors before I can use a bigger setup.
>>
>> I am talking about the prez at
>> https://hbase.s3.amazonaws.com/hbase/HBase-CP-HUG10.pdf by Andrew
>> Purtell, slide 5:
>>
>> <quote>
>> For long running jobs the client must periodically poll status to keep it
>> alive; jobs without interest will be cancelled
>> </quote>
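>>
>> The slide does not name an API for this, so below is only my guess at
>> what "poll status" could look like from the client: a daemon timer
>> that issues a cheap Get on the same table every 30 seconds while the
>> aggregation call runs on the main thread (table and row names are
>> made up):
>>
>>     import java.util.Timer;
>>     import java.util.TimerTask;
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.hbase.HBaseConfiguration;
>>     import org.apache.hadoop.hbase.client.Get;
>>     import org.apache.hadoop.hbase.client.HTable;
>>     import org.apache.hadoop.hbase.util.Bytes;
>>
>>     public class KeepAlivePoller {
>>       public static void main(String[] args) throws Exception {
>>         final Configuration conf = HBaseConfiguration.create();
>>         // Daemon timer: dies with the JVM once the real work is done.
>>         Timer keepAlive = new Timer(true);
>>         keepAlive.schedule(new TimerTask() {
>>           public void run() {
>>             try {
>>               HTable table = new HTable(conf, "Test");
>>               // Any cheap request will do; a single-row Get here.
>>               table.get(new Get(Bytes.toBytes("some-existing-row")));
>>               table.close();
>>             } catch (Exception e) {
>>               // Best effort only; a failed poll is not fatal.
>>             }
>>           }
>>         }, 0, 30 * 1000); // fire immediately, then every 30 seconds
>>         // ... run the long aggregation on this thread as usual ...
>>       }
>>     }
>>
>> No idea whether this is what the slide intends, though.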
>>
>>
>> On Mon, Sep 26, 2011 at 2:08 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> Looks like my response to user@ got bounced.
>>>
>>> Himanshu is continuing the work on HBASE-3607
>>>
>>>>> Currently all the regions are on a single region server.
>>> I don't think the above is a good setup.
>>>
>>>>> It was mentioned to poll the job status periodically to avoid timeouts.
>>> Can you tell us more about the presentation? Normally you can include the
>>> author in the cc list of this email chain.
>>>
>>> Cheers
>>>
>>> On Mon, Sep 26, 2011 at 12:44 AM, Mayuresh <mayuresh.kshirsagar@gmail.com>
>>> wrote:
>>>>
>>>> Yes, the computation takes longer than 60 secs. Will setting the
>>>> lease timeout in hbase-site.xml help?
>>>>
>>>> In one of the coprocessor presentations, it was mentioned that the
>>>> job status should be polled periodically to avoid timeouts. Any idea
>>>> how to do that?
>>>>
>>>> On Sep 23, 2011 9:59 PM, "Stack" <stack@duboce.net> wrote:
>>>> > Is it doing a long GC? Is the aggregation taking longer than the
>>>> > client's 60-second timeout? (You are setting the lease below, but I
>>>> > don't think that will have an effect since it's server-side.)
>>>> >
>>>> > St.Ack
>>>> >
>>>> >
>>>> > On Fri, Sep 23, 2011 at 2:52 AM, Mayuresh
>>>> > <mayuresh.kshirsagar@gmail.com> wrote:
>>>> >> Hi,
>>>> >>
>>>> >> I am using the AggregationClient to perform an aggregate calculation
>>>> >> on a 10 M row HBase table which is spread across around 27 regions.
>>>> >> Currently all the regions are on a single region server.
>>>> >>
>>>> >> Here is briefly what the code is doing:
>>>> >>         Leases leases = new Leases(40 * 60 * 1000, 5 * 1000);
>>>> >>         leases.start();
>>>> >>         long startTime = System.currentTimeMillis();
>>>> >>         System.out.println("Start time: " + startTime);
>>>> >>         Configuration conf = new Configuration(true);
>>>> >>         AggregationClient aClient = new AggregationClient(conf);
>>>> >>
>>>> >>         Scan scan = new Scan();
>>>> >>         scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
>>>> >>         final ColumnInterpreter<Long, Long> ci = new
>>>> >> LongColumnInterpreter();
>>>> >>
>>>> >>         double avg = aClient.avg(TEST_TABLE, ci,
>>>> >>                 scan);
>>>> >>         leases.close();
>>>> >>         long endTime = System.currentTimeMillis();
>>>> >>         System.out.println("End time: " + endTime);
>>>> >>         System.out.println("Avg is: " + avg);
>>>> >>         System.out.println("Time taken is: " + (endTime -
startTime));
>>>> >>
>>>> >> However it is failing after some time with the following log messages
>>>> >> in the region server log:
>>>> >> 2011-09-23 14:21:25,649 DEBUG
>>>> >> org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
>>>> >> region
>>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>>> >> 2011-09-23 14:21:25,650 DEBUG
>>>> >> org.apache.hadoop.hbase.io.hfile.HFileReaderV2: On close of file
>>>> >> 15761056997240226 evicted 344 block(s)
>>>> >> 2011-09-23 14:21:25,650 DEBUG
>>>> >> org.apache.hadoop.hbase.regionserver.Store: closed data
>>>> >> 2011-09-23 14:21:25,650 INFO
>>>> >> org.apache.hadoop.hbase.regionserver.HRegion: Closed
>>>> >>
>>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>>> >> 2011-09-23 14:21:25,650 DEBUG
>>>> >> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
>>>> >> Closed region
>>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>>> >> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>>>> >> Server Responder, call execCoprocessor([B@6b6ab732,
>>>> >>
>>>> >> getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@2b216ab6,
>>>> >>
>>>> >> {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"serv1-1315571539741","stopRow":"serv5-1315575834916","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1}))
>>>> >> from 137.72.240.177:47563: output error
>>>> >> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>>>> >> Server handler 9 on 60020 caught:
>>>> >> java.nio.channels.ClosedChannelException
>>>> >>         at
>>>> >> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>>>> >>         at
>>>> >> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>>> >>         at
>>>> >> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
>>>> >>         at
>>>> >> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
>>>> >>         at
>>>> >> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
>>>> >>         at
>>>> >> org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
>>>> >>         at
>>>> >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)
>>>> >>
>>>> >> And the following in the zookeeper log:
>>>> >> 2011-09-23 14:19:50,978 INFO
>>>> >> org.apache.zookeeper.server.NIOServerCnxn: Established session
>>>> >> 0x1329577cad70004 with negotiated timeout 180000 for client
>>>> >> /137.72.240.177:45431
>>>> >> 2011-09-23 14:21:11,028 WARN
>>>> >> org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException:
>>>> >> Unable to read additional data from client sessionid
>>>> >> 0x1329577cad70004, likely client has closed socket
>>>> >> 2011-09-23 14:21:11,029 INFO
>>>> >> org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>>>> >> for client /137.72.240.177:45431 which had sessionid 0x1329577cad70004
>>>> >> 2011-09-23 14:24:42,000 INFO
>>>> >> org.apache.zookeeper.server.ZooKeeperServer: Expiring session
>>>> >> 0x1329577cad70004, timeout of 180000ms exceeded
>>>> >> 2011-09-23 14:24:42,001 INFO
>>>> >> org.apache.zookeeper.server.PrepRequestProcessor: Processed session
>>>> >> termination for sessionid: 0x1329577cad70004
>>>> >>
>>>> >> Any ideas on how to resolve this?
>>>> >>
>>>> >> Thanks in advance.
>>>> >>
>>>> >> Regards,
>>>> >> Mayuresh
>>>> >>
>>>
>>>
>>
>>
>>
>> --
>> -Mayuresh
>>
>
>
>
> --
> -Mayuresh
>
