atlas-dev mailing list archives

From "Sarath Subramanian (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ATLAS-1720) Add titan storage.lock.wait-time for Berkley DB to fix intermittent IT failures
Date Wed, 05 Apr 2017 07:12:41 GMT

     [ https://issues.apache.org/jira/browse/ATLAS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sarath Subramanian updated ATLAS-1720:
--------------------------------------
    Summary: Add titan storage.lock.wait-time for Berkley DB to fix intermittent IT failures  (was: Increase titan storage.lock.wait-time for Berkley DB to fix intermittent IT failures)

> Add titan storage.lock.wait-time for Berkley DB to fix intermittent IT failures 
> --------------------------------------------------------------------------------
>
>                 Key: ATLAS-1720
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1720
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>    Affects Versions: trunk, 0.9-incubating
>            Reporter: Sarath Subramanian
>            Assignee: Sarath Subramanian
>
> Some of the integration tests (ITs) in Atlas fail intermittently with the exception "Could not execute operation due to backend exception".
> Investigation shows that this is caused by a Berkeley DB LockTimeoutException (https://github.com/thinkaurelius/titan/issues/1113).
> The default lock timeout for Berkeley DB is 500 ms. If a thread (some IT) is waiting on a Titan storage resource that is locked by another thread, and that thread does not release the lock within 500 ms, the waiting thread fails with the above exception (see the error log below).
> The fix is to increase storage.lock.wait-time for Berkeley DB to 10000 ms. This is consistent with the lock wait timeout specified for HBase.
> {code}
> Caused by: com.sleepycat.je.LockTimeoutException: (JE 5.0.73) Lock expired. Locker 1516581475 7535_NotificationHookConsumer thread-0_Txn: waited for lock on database=edgestore LockAddr:284896285 LSN=0x0/0x21d55f type=WRITE grant=WAIT_PROMOTION timeoutMillis=500 startTime=1491261268442 endTime=1491261268942
> Owners: [<LockInfo locker="1445928922 7537_qtp184901207-1038 - e015a355-d6c5-4424-b7a7-833a289aea9d_Txn" type="READ"/>, <LockInfo locker="1516581475 7535_NotificationHookConsumer thread-0_Txn" type="READ"/>]
> Waiters: []
> Transaction 1445928922 7537_qtp184901207-1038 - e015a355-d6c5-4424-b7a7-833a289aea9d_Txn waits for  LockAddr:471572402 Owners:<LockInfo locker="1516581475 7535_NotificationHookConsumer thread-0_Txn" type="WRITE"/> Waiters:[<LockInfo locker="1445928922 7537_qtp184901207-1038 - e015a355-d6c5-4424-b7a7-833a289aea9d_Txn" type="READ"/>]
> Transaction 1516581475 7535_NotificationHookConsumer thread-0_Txn owns LockAddr:471572402 <LockInfo locker="1516581475 7535_NotificationHookConsumer thread-0_Txn" type="WRITE"/>
> Transaction 1516581475 7535_NotificationHookConsumer thread-0_Txn waits for LockAddr:284896285
> {code}
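> A minimal sketch of the corresponding configuration change, assuming Atlas forwards Titan options through the atlas.graph. prefix in atlas-application.properties. The file name and the prefix are assumptions; this message names only storage.lock.wait-time and the 10000 ms value.
> {code}
> # Assumed location: atlas-application.properties (not named in this message)
> # Lock wait time in milliseconds for the Berkeley DB backend, raised from the 500 ms default
> atlas.graph.storage.lock.wait-time=10000
> {code}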



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
