hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Scanner Timeout Exception/Unknown Scanner Exception
Date Wed, 08 Apr 2009 12:22:28 GMT
Rakhi,

You won't see the datanodes being affected by these exceptions, only
the clients. Basically, a datanode will keep a socket open for a
certain amount of time and then close it. This is fine when you only
use Hadoop, because the client never leaves sockets open for very
long. In HBase, however, we do keep them open for a very long time.
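
A minimal sketch of that workaround, assuming the property goes into the
datanodes' hadoop-site.xml (as discussed further down in this thread) and
that the datanodes are restarted afterwards so the new value is picked up:

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <!-- 0 disables the write timeout, so long-lived HBase sockets
         are not closed by the datanode -->
    <value>0</value>
  </property>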

J-D

On Wed, Apr 8, 2009 at 8:18 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:
> Hi J-D,
>            Yes, I did restart HBase after increasing the region-server lease
> timeout. Initially I did set dfs.datanode.socket.write.timeout to 0, but it
> gave some problems on my local setup. I will try setting
> dfs.datanode.socket.write.timeout to zero and test it again; if I face any
> issues, I will let you know.
>
> PS: I haven't seen the dfs.datanode.socket.write.timeout property in
> hadoop-default. Also, even after the exception, all my datanodes and
> tasktrackers are live; none of them are dead.
>
>
> Thanks,
> Raakhi
>
On Wed, Apr 8, 2009 at 5:38 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
>> Rakhi,
>>
>> Just to be sure, when you changed the RS lease timeout, did you restart
>> HBase?
>>
>> The datanode logs seem to imply that some channels are left open for
>> too long. Please set dfs.datanode.socket.write.timeout to 0 in
>> hadoop-site.
>>
>> J-D
>>
>> On Wed, Apr 8, 2009 at 7:57 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:
>> > Hi,
>> >     I came across the Scanner Timeout Exception again :(
>> > This time I had a look at the tasktracker and datanode logs of the
>> > machine where the task failed.
>> >
>> > The logs are as follows:
>> >
>> > TaskTracker:
>> >
>> > 2009-04-08 07:18:07,532 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200904080539_0001/attempt_200904080539_0001_m_000001_0/output/file.out in any of the configured local directories
>> > 2009-04-08 07:18:08,337 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200904080539_0001_m_000001_0 0.0% Starting Analysis...
>> > 2009-04-08 07:18:12,565 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200904080539_0001/attempt_200904080539_0001_m_000001_0/output/file.out in any of the configured local directories
>> > 2009-04-08 07:18:14,399 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200904080539_0001_m_000001_0 0.0% Starting Analysis...
>> > 2009-04-08 07:18:17,409 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200904080539_0001_m_000001_0 0.0% Starting Analysis...
>> > 2009-04-08 07:18:17,583 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200904080539_0001/attempt_200904080539_0001_m_000001_0/output/file.out in any of the configured local directories
>> > 2009-04-08 07:18:19,763 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200904080539_0001_m_-1878302273 exited. Number of tasks it ran: 0
>> > 2009-04-08 07:18:22,587 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200904080539_0001/attempt_200904080539_0001_m_000001_0/output/file.out in any of the configured local directories
>> > 2009-04-08 07:18:22,779 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200904080539_0001_m_000001_0 done; removing files.
>> > 2009-04-08 07:18:22,780 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 3
>> >
>> >
>> > At the Data Node:
>> >
>> >
>> > 2009-04-08 07:19:01,153 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.251.74.84:50010, dest: /10.251.74.84:59583, bytes: 1320960, op: HDFS_READ, cliID: DFSClient_258286192, srvID: DS-2059868082-10.251.74.84-50010-1239116275760, blockid: blk_-4896946973674546604_2508
>> > 2009-04-08 07:19:01,154 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.251.74.84:50010, storageID=DS-2059868082-10.251.74.84-50010-1239116275760, infoPort=50075, ipcPort=50020):Got exception while serving blk_-4896946973674546604_2508 to /10.251.74.84:
>> > java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.251.74.84:50010 remote=/10.251.74.84:59583]
>> >       at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
>> >       at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>> >       at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>> >       at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
>> >       at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
>> >       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
>> >       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
>> >       at java.lang.Thread.run(Thread.java:619)
>> >
>> >
>> > So ultimately it boils down to some problem with HDFS, but I am still
>> > not able to figure out what the issue could be.
>> >
>> > Thanks,
>> > Raakhi
>> >
>> >
>> > On Wed, Apr 8, 2009 at 3:26 PM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:
>> >
>> >> Hi,
>> >>    I am pasting the region server logs:
>> >>
>> >> 2009-04-08 00:06:26,378 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5427021309867584920 lease expired
>> >> 2009-04-08 00:16:23,641 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -3894991203345155244 lease expired
>> >> 2009-04-08 00:29:08,402 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 651295424715622118 lease expired
>> >> 2009-04-08 00:39:05,430 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -2734117247134548430 lease expired
>> >> 2009-04-08 00:46:35,515 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2810685965461882801 lease expired
>> >> 2009-04-08 00:56:38,289 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 9085655909080042643 lease expired
>> >> 2009-04-08 01:06:36,035 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5701864683466148562 lease expired
>> >> 2009-04-08 03:13:02,545 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -2157771707879192919 lease expired
>> >> 2009-04-08 03:29:24,603 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -2157771707879192919
>> >> 2009-04-08 03:29:24,606 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020, call next(-2157771707879192919, 30) from 10.250.6.4:37602: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -2157771707879192919
>> >> org.apache.hadoop.hbase.UnknownScannerException: Name: -2157771707879192919
>> >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1568)
>> >>        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>> >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>        at java.lang.reflect.Method.invoke(Method.java:597)
>> >>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>> >>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>> >> 2009-04-08 03:29:24,655 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.h
>> >>
>> >> What I believe is happening is that, at the region server, the scanner
>> >> lease expires for scanner id SCANNER_ID [which happens at 3:13], and then
>> >> my map-reduce program calls next() with this SCANNER_ID, and hence we get
>> >> this scanner timeout exception/unknown scanner exception [this happens at
>> >> 3:24].
>> >>
>> >> How do I avoid such a situation?
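
A minimal sketch of the lease setting being discussed here, assuming the
0.19-era property name hbase.regionserver.lease.period (worth verifying
against the hbase-default.xml shipped with your version). The value is in
milliseconds and has to comfortably exceed the longest gap between two
next() calls in the map task; it goes into hbase-site.xml, and the region
servers have to be restarted for it to take effect, which is why the
restart question comes up above:

  <property>
    <name>hbase.regionserver.lease.period</name>
    <!-- scanner lease in milliseconds; 3600000 ms = 1 hour -->
    <value>3600000</value>
  </property>
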
>> >>
>> >> Thanks,
>> >> Raakhi
>> >>
>> >>
>> >>
>> >> On Wed, Apr 8, 2009 at 2:03 PM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:
>> >>
>> >>> Hi,
>> >>>       I am using hbase-0.19 on a 20-node EC2 cluster.
>> >>>      I have a map-reduce program which performs some analysis on each
>> >>> row. When I process about 17k rows on the EC2 cluster, after completing
>> >>> 65%, my job fails. Going through the logs in the UI, we found out that
>> >>> the job failed because of a Scanner Timeout Exception.
>> >>>
>> >>> My map function reads data from one table, 'table1', and performs the
>> >>> analysis; if the analysis is completed, I mark the status of the row as
>> >>> 'analyzed' (table1 has a column family called status), and I write the
>> >>> result of the analyzed data into table2. (All this happens in my map
>> >>> function; I have no reduce for this.)
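
Purely as an illustration of that per-row pattern (not the poster's actual
code), here is a small sketch against the 0.19-era client API as I recall it
(BatchUpdate plus HTable.commit); the table names come from the thread, while
the column names and the class itself are made up:

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;

public class AnalysisWriter {
  private final HTable table1;
  private final HTable table2;

  public AnalysisWriter(HBaseConfiguration conf) throws IOException {
    table1 = new HTable(conf, "table1");
    table2 = new HTable(conf, "table2");
  }

  // Called once per row from the map() method, after the analysis is done.
  public void recordResult(byte[] row, byte[] result) throws IOException {
    // Mark the source row as analyzed in table1's 'status' column family.
    BatchUpdate status = new BatchUpdate(row);
    status.put("status:state", Bytes.toBytes("analyzed"));
    table1.commit(status);

    // Write the analysis output for the same row key into table2.
    BatchUpdate output = new BatchUpdate(row);
    output.put("analysis:result", result);
    table2.commit(output);
  }
}
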
>> >>>
>> >>> I did go through the archives, where someone mentioned increasing the
>> >>> region lease period, so I increased the lease period to 360000 ms (1
>> >>> hour). Despite that, I came across the Scanner Timeout Exception.
>> >>>
>> >>> Your help will be greatly appreciated, as this scanner timeout exception
>> >>> is a blocker to my application.
>> >>>
>> >>> Thanks,
>> >>> Raakhi
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>>
>
