hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7455) Increase timeouts in TestReplication and TestSplitLogWorker
Date Fri, 04 Jan 2013 19:40:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544167#comment-13544167 ]

Jean-Daniel Cryans commented on HBASE-7455:

I'm currently investigating what the other problems with TestReplication are. Right
now I'm hitting this weird case that kills an RS:

2013-01-04 10:04:45,500 WARN  [IPC Server handler 8 on 57099] namenode.FSDirectory(422): DIR*
FSDirectory.unprotectedRenameTo: failed to rename /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d
to /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
because destination's parent does not exist
2013-01-04 10:04:45,503 WARN  [RegionServer:1;,57114,1357322589018.cacheFlusher]
regionserver.Store(847): Unable to rename hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d
to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d
2013-01-04 10:04:45,504 WARN  [DataStreamer for file /user/jdcryans/hbase/.logs/,57113,1357322588994/]
hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/jdcryans/hbase/.logs/,57113,1357322588994/
File does not exist. [Lease.  Holder: DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates:
> Increase timeouts in TestReplication and TestSplitLogWorker
> -----------------------------------------------------------
>                 Key: HBASE-7455
>                 URL: https://issues.apache.org/jira/browse/HBASE-7455
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.4
>         Attachments: 7455-0.94.txt, 7455-0.96.txt
> When I measure the times in TestReplication.queueFailover, it takes about 15s on my (reasonably fast) laptop.
> The timeout in queueFailover is currently 1500*2*15 = 45000ms.
> For setup before each test (which truncates the table and waits for the changes to replicate) it is 1500*15 = 22500ms.
> Interestingly, I see queueFailover failures where the wait time is measured as 64260ms, and some at 72316ms.
> Since these numbers are not even close to 45000ms, the machine or JVM must have been stuck for 15 or almost 30s (otherwise we'd get a timeout and the total time spent should be close to the timeout).
> So I would suggest that we increase the timeouts further.
> We could set SLEEP_TIME to 2000 and retries to 20. That would lead to 2000*2*20 = 80000ms.
> Any objections?
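
The timeout arithmetic in the description above can be checked with a small sketch. The class and method names here (ReplicationTimeouts, waitBudgetMillis, overshootMillis) are hypothetical and are not taken from the HBase test code; the only inputs are the numbers quoted in the issue, and the "factor" parameter simply stands for the 2 (or implicit 1) that appears in the quoted formulas.

```java
// Hypothetical helper for illustration only; not part of the HBase test suite.
final class ReplicationTimeouts {
    private ReplicationTimeouts() {}

    /** Wait budget as written in the issue: sleep time * factor * retries. */
    static long waitBudgetMillis(long sleepMillis, int factor, int retries) {
        return sleepMillis * factor * retries;
    }

    /** How far a measured wait ran past the budget (0 if it stayed within it). */
    static long overshootMillis(long measuredMillis, long budgetMillis) {
        return Math.max(0L, measuredMillis - budgetMillis);
    }

    public static void main(String[] args) {
        long current = waitBudgetMillis(1500, 2, 15);   // queueFailover today
        long setup = waitBudgetMillis(1500, 1, 15);     // per-test setup
        long proposed = waitBudgetMillis(2000, 2, 20);  // suggested values

        System.out.println("current budget:  " + current + "ms");
        System.out.println("setup budget:    " + setup + "ms");
        System.out.println("proposed budget: " + proposed + "ms");

        // Measured waits reported in the issue description.
        long[] measured = {64260L, 72316L};
        for (long m : measured) {
            System.out.println(m + "ms overshoots the current budget by "
                + overshootMillis(m, current) + "ms and "
                + (m < proposed ? "fits" : "does not fit")
                + " in the proposed budget");
        }
    }
}
```

Under these assumptions, both reported waits exceed the current 45000ms budget but fall below the proposed 80000ms.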

