accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-2466) Bulk randomwalk fails with bad key
Date Thu, 13 Mar 2014 18:33:43 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933841#comment-13933841 ]

Eric Newton edited comment on ACCUMULO-2466 at 3/13/14 6:33 PM:
----------------------------------------------------------------

The file system is probably being closed by the really annoying shutdown hook.

You may need to trace through the METADATA table updates by decoding write-ahead logs to find
out what happened.  It can take days.



was (Author: ecn):
The file system is probably being closed by the really annoying shutdown hook.

You may need to trace through the METADATA table updates to find out what happened.  It can
take days.


> Bulk randomwalk fails with bad key
> ----------------------------------
>
>                 Key: ACCUMULO-2466
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2466
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master, test
>    Affects Versions: 1.4.4
>            Reporter: Bill Havanki
>              Labels: import, randomwalk, test
>
> Running bulk randomwalk against 1.4.5-SNAPSHOT, got this in verification:
> {noformat}
> Caused by: java.lang.Exception: Bad key at r00000 cf:000 [] 1394658887772 false 1
>         at org.apache.accumulo.server.test.randomwalk.bulk.Verify.visit(Verify.java:65)
> {noformat}
> Possible reasons:
> * ACCUMULO-2110, not backported to 1.4 or 1.5
> * master agitation
> I see in the logs three internal errors from imports that failed due to the masters being restarted. The failure timing is around 5 seconds after the masters restart. Example:
> {noformat}
> 12 14:10:17,580 [bulk.BulkMinusOne] ERROR: org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation
> org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation
>         at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:290)
>         at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:258)
>         at org.apache.accumulo.core.client.admin.TableOperationsImpl.importDirectory(TableOperationsImpl.java:947)
>         at org.apache.accumulo.server.test.randomwalk.bulk.BulkPlusOne.bulkLoadLots(BulkPlusOne.java:99)
>         at org.apache.accumulo.server.test.randomwalk.bulk.BulkMinusOne.runLater(BulkMinusOne.java:29)
> ...
> Caused by: org.apache.thrift.TApplicationException: Internal error processing waitForTableOperation
> {noformat}
> Two BulkMinusOne and one BulkPlusOne failed, which may be why the offending row was at value 1.
> The {{TableOperationsImpl.waitForTableOperation}} method does not catch {{TApplicationException}}, so the imports fail.
> I see lots of previous work on this sort of error in ACCUMULO-334 and ACCUMULO-2110. If anyone has troubleshooting tips I'd be happy to hear them.
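
The observation above, that an uncaught exception during a master restart makes the whole import fail, can be illustrated with a generic retry wrapper. This is only a minimal sketch under stated assumptions, not Accumulo's code and not necessarily the fix adopted for this issue. The names RetrySketch, TransientRpcException and callWithRetry are hypothetical. TransientRpcException is an unchecked stand-in for org.apache.thrift.TApplicationException (which is a checked exception in Thrift), chosen so the example compiles with only the JDK.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in Accumulo or Thrift.
final class RetrySketch {

    /** Unchecked stand-in for a transient RPC failure such as TApplicationException. */
    static final class TransientRpcException extends RuntimeException {
        TransientRpcException(String message) {
            super(message);
        }
    }

    /**
     * Runs op, retrying when it throws TransientRpcException.
     * Gives up and rethrows the last failure after maxAttempts tries.
     */
    static <T> T callWithRetry(Supplier<T> op, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientRpcException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (TransientRpcException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        // Pause before the next attempt, e.g. while a master comes back up.
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt status and stop retrying.
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw last;
    }
}
```

Whether retrying is actually safe depends on the wrapped operation being idempotent; the sketch assumes it is and does not check.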



--
This message was sent by Atlassian JIRA
(v6.2#6252)
