hbase-user mailing list archives

From Rakhi Khatwani <rakhi.khatw...@gmail.com>
Subject Re: Scanner Timeout Exception/Unknown Scanner Exception
Date Wed, 08 Apr 2009 11:57:08 GMT
Hi,
     I came across the Scanner Timeout Exception again :(
This time I had a look at the TaskTracker and DataNode logs of the machine
where the task failed.

The logs are as follows:

TaskTracker:

2009-04-08 07:18:07,532 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200904080539_0001/attempt_200904080539_0001_m_000001_0/output/file.out in any of the configured local directories
2009-04-08 07:18:08,337 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200904080539_0001_m_000001_0 0.0% Starting Analysis...
2009-04-08 07:18:12,565 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200904080539_0001/attempt_200904080539_0001_m_000001_0/output/file.out in any of the configured local directories
2009-04-08 07:18:14,399 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200904080539_0001_m_000001_0 0.0% Starting Analysis...
2009-04-08 07:18:17,409 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200904080539_0001_m_000001_0 0.0% Starting Analysis...
2009-04-08 07:18:17,583 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200904080539_0001/attempt_200904080539_0001_m_000001_0/output/file.out in any of the configured local directories
2009-04-08 07:18:19,763 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200904080539_0001_m_-1878302273 exited. Number of tasks it ran: 0
2009-04-08 07:18:22,587 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200904080539_0001/attempt_200904080539_0001_m_000001_0/output/file.out in any of the configured local directories
2009-04-08 07:18:22,779 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200904080539_0001_m_000001_0 done; removing files.
2009-04-08 07:18:22,780 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 3


At the DataNode:


2009-04-08 07:19:01,153 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.251.74.84:50010, dest: /10.251.74.84:59583, bytes: 1320960, op: HDFS_READ, cliID: DFSClient_258286192, srvID: DS-2059868082-10.251.74.84-50010-1239116275760, blockid: blk_-4896946973674546604_2508
2009-04-08 07:19:01,154 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.251.74.84:50010, storageID=DS-2059868082-10.251.74.84-50010-1239116275760, infoPort=50075, ipcPort=50020): Got exception while serving blk_-4896946973674546604_2508 to /10.251.74.84:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.251.74.84:50010 remote=/10.251.74.84:59583]
       at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
       at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
       at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
       at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
       at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
       at java.lang.Thread.run(Thread.java:619)


So ultimately it boils down to some problem with HDFS, but I am still not
able to figure out what the issue could be.
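
One thing that stands out in the DataNode log: "480000 millis timeout while
waiting for channel to be ready for write" is exactly the DataNode's default
socket write timeout of 8 minutes (the dfs.datanode.socket.write.timeout
property), i.e. the datanode gave up waiting for my task to consume the block
it was serving. A minimal sketch of what I plan to try, assuming the Hadoop
0.19-era property name; in a real deployment this belongs in hdfs-site.xml on
every datanode rather than in client code:

import org.apache.hadoop.conf.Configuration;

public class DatanodeWriteTimeoutSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The timeout that produced the 480000 ms SocketTimeoutException
        // above. 0 disables it entirely; a larger value simply gives slow
        // readers (e.g. a mapper stuck in long analysis) more headroom.
        conf.setInt("dfs.datanode.socket.write.timeout", 0);
        System.out.println("dfs.datanode.socket.write.timeout = "
                + conf.get("dfs.datanode.socket.write.timeout"));
    }
}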

Thanks,
Raakhi


On Wed, Apr 8, 2009 at 3:26 PM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:

> Hi,
>    I am pasting the region server logs:
>
> 2009-04-08 00:06:26,378 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5427021309867584920 lease expired
> 2009-04-08 00:16:23,641 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -3894991203345155244 lease expired
> 2009-04-08 00:29:08,402 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 651295424715622118 lease expired
> 2009-04-08 00:39:05,430 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -2734117247134548430 lease expired
> 2009-04-08 00:46:35,515 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2810685965461882801 lease expired
> 2009-04-08 00:56:38,289 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 9085655909080042643 lease expired
> 2009-04-08 01:06:36,035 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5701864683466148562 lease expired
> 2009-04-08 03:13:02,545 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -2157771707879192919 lease expired
> 2009-04-08 03:29:24,603 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -2157771707879192919
> 2009-04-08 03:29:24,606 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020, call next(-2157771707879192919, 30) from 10.250.6.4:37602: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -2157771707879192919
> org.apache.hadoop.hbase.UnknownScannerException: Name: -2157771707879192919
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1568)
>        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> 2009-04-08 03:29:24,655 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.h
>
> What I believe is happening: at the region server, the scanner lease expires
> for scanner id SCANNER_ID [which happens at 03:13], and then my map-reduce
> program calls next() with this SCANNER_ID, and hence we get this scanner
> timeout exception/unknown scanner exception [this happens at 03:29].
>
> How do I avoid such a situation?
>
> Thanks,
> Raakhi
>
>
>
> On Wed, Apr 8, 2009 at 2:03 PM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:
>
>> Hi,
>>       I am using hbase-0.19 on a 20-node EC2 cluster.
>>      I have a map-reduce program which performs some analysis on each row.
>> When I process about 17k rows on the EC2 cluster, the job fails after
>> completing about 65%.
>> After going through the logs in the UI, we found out that the job failed
>> because of a Scanner Timeout Exception.
>>
>> My map function reads data from one table, 'table1', and performs analysis.
>> If the analysis completes, I mark the status of the row as 'analyzed'
>> (table1 has a column family called status), and I write the result of the
>> analyzed data into table2. (All this happens in my map function; I have no
>> reduce for this.)
>>
>> I did go through the archives, where someone mentioned increasing the
>> region server lease period, so I increased the lease period to 360000 ms
>> (which is 6 minutes, not 1 hour as I first thought). Despite that, I came
>> across the Scanner Timeout Exception.
>>
>> Your help will be greatly appreciated, as this Scanner Timeout Exception is
>> a blocker for my application.
>>
>> Thanks,
>> Raakhi
>>
>>
>>
>>
>
>
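
PS, following up on my lease-expiry theory quoted above: since the lease clock
runs between two next() calls, any single map() invocation that analyzes a row
for longer than the lease period will kill the scanner, no matter how large
the period is. Also, hbase.regionserver.lease.period is in milliseconds, so
one hour would be 3600000, not the 360000 (6 minutes) I had set. Beyond fixing
the units, here is a rough sketch of making the scan restartable against the
0.19 client API; 'table1' and the status family are from my setup,
analyzeRow() is a hypothetical stand-in for my analysis code, and I am
assuming the server's UnknownScannerException reaches the client un-wrapped:

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.UnknownScannerException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class RestartableScanSketch {

    // Hypothetical stand-in for the per-row analysis done in my map().
    static void analyzeRow(RowResult row) {
        // ... long-running analysis ...
    }

    // Smallest row key strictly greater than the given one, so the
    // reopened scanner does not reprocess the last completed row.
    static byte[] rowAfter(byte[] row) {
        byte[] next = new byte[row.length + 1];
        System.arraycopy(row, 0, next, 0, row.length);
        next[row.length] = 0;
        return next;
    }

    public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "table1");
        byte[][] columns = { Bytes.toBytes("status:") };

        byte[] startRow = new byte[0]; // scan from the beginning
        Scanner scanner = table.getScanner(columns, startRow);
        byte[] lastRow = null;
        boolean done = false;
        try {
            while (!done) {
                try {
                    RowResult row;
                    while ((row = scanner.next()) != null) {
                        lastRow = row.getRow();
                        analyzeRow(row); // if this outlives the lease,
                                         // the server expires the scanner
                    }
                    done = true;
                } catch (UnknownScannerException e) {
                    // Lease expired server-side: reopen just past the last
                    // row we finished and carry on instead of failing.
                    byte[] resumeAt =
                        (lastRow == null) ? startRow : rowAfter(lastRow);
                    scanner = table.getScanner(columns, resumeAt);
                }
            }
        } finally {
            scanner.close();
        }
    }
}

In the real job this logic would live in the record reader / mapper rather
than in main(), but it shows the idea: remember the last row key that was
fully processed, and when the lease is lost, reopen the scanner just past it.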
