accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4398) Possible for client to see TableNotFoundException adding splits on a newly created table
Date Thu, 04 Aug 2016 17:17:20 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408157#comment-15408157
] 

Christopher Tubbs commented on ACCUMULO-4398:
---------------------------------------------

In some cases, though I'm not sure all, when a server receives a table operation and the
table doesn't exist, it will clear its ZooCache and check again. However, it looks like this
isn't good enough, as it may still get old data. It looks like what we should be doing is
a sync() after we clear the cache: https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#N1043A
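
To make the "clear, then sync, then re-read" idea concrete, here is a minimal toy model in plain Java. It is only a sketch under stated assumptions: Leader, Follower and Cache are hypothetical stand-ins, not ZooKeeper or Accumulo classes, and the model assumes the stale data comes from the connected server lagging behind the leader. In the real ZooKeeper client, sync(path, callback, ctx) is asynchronous, so the re-read would have to wait for the callback; the toy version is synchronous for brevity.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: none of these classes are ZooKeeper or Accumulo APIs. */
class SyncAfterClearSketch {

  /** Stands in for the ZooKeeper leader, which holds the authoritative child list. */
  static class Leader {
    final Set<String> tables = new TreeSet<>();
  }

  /** Stands in for the server the client is connected to; it may lag the leader. */
  static class Follower {
    private final Leader leader;
    private Set<String> view = new TreeSet<>();

    Follower(Leader leader) {
      this.leader = leader;
    }

    /** Models sync(): catch up with the leader before serving further reads. */
    void sync() {
      view = new TreeSet<>(leader.tables);
    }

    /** Models getChildren(): served from the local, possibly stale, view. */
    Set<String> getChildren() {
      return new TreeSet<>(view);
    }
  }

  /** Hypothetical client-side cache, loosely modeled on the ZooCache behavior described above. */
  static class Cache {
    private final Follower server;
    private Set<String> children; // null means "nothing cached"

    Cache(Follower server) {
      this.server = server;
    }

    void clear() {
      children = null;
    }

    boolean tableExists(String name, boolean syncFirst) {
      if (children == null) {
        if (syncFirst) {
          server.sync(); // the proposed extra step
        }
        children = server.getChildren();
      }
      return children.contains(name);
    }
  }

  /** Creates T2 on the leader, clears the cache, then asks whether T2 exists. */
  static boolean existsAfterClear(boolean syncFirst) {
    Leader leader = new Leader();
    leader.tables.add("T1");
    Follower follower = new Follower(leader);
    follower.sync(); // the follower has seen T1
    Cache cache = new Cache(follower);
    cache.tableExists("T1", false); // populate the cache

    leader.tables.add("T2"); // new table; the follower has not caught up yet
    cache.clear();
    return cache.tableExists("T2", syncFirst);
  }

  public static void main(String[] args) {
    System.out.println("clear only:       " + existsAfterClear(false));
    System.out.println("clear, then sync: " + existsAfterClear(true));
  }
}
```

In this model, clearing the cache alone still reports T2 as missing, because the re-read is answered from the lagging view; syncing before the re-read reports it as present.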

> Possible for client to see TableNotFoundException adding splits on a newly created table
> ----------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-4398
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4398
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, zookeeper
>    Affects Versions: 1.7.1
>            Reporter: Josh Elser
>
> Just came across a really odd scenario. I believe that it's a race condition in the client that stems from our beloved {{ZooCache}}.
> This was observed via a test failure in {{LogicalTimeIT}}:
> {noformat}
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 29.249 sec <<< FAILURE! - in org.apache.accumulo.test.functional.LogicalTimeIT
> run(org.apache.accumulo.test.functional.LogicalTimeIT)  Time elapsed: 29.037 sec  <<< ERROR!
> org.apache.accumulo.core.client.TableNotFoundException: Table LogicalTimeIT_run06 does not exist
> 	at org.apache.accumulo.core.client.impl.Tables._getTableId(Tables.java:117)
> 	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:102)
> 	at org.apache.accumulo.core.client.impl.TableOperationsImpl.addSplits(TableOperationsImpl.java:374)
> 	at org.apache.accumulo.test.functional.LogicalTimeIT.runMergeTest(LogicalTimeIT.java:81)
> 	at org.apache.accumulo.test.functional.LogicalTimeIT.run(LogicalTimeIT.java:56)
> {noformat}
> Ultimately:
> {code}
>     conn.tableOperations().create(table, new NewTableConfiguration().setTimeType(TimeType.LOGICAL));
>     TreeSet<Text> splitSet = new TreeSet<Text>();
>     for (String split : splits) {
>       splitSet.add(new Text(split));
>     }
>     conn.tableOperations().addSplits(table, splitSet);
> {code}
> The important piece to remember is that a ZooKeeper client, when a watcher is set, will
> eventually get all updates from that watcher in the order in which they occurred. LogicalTimeIT
> is repeatedly running the same test over tables of varying characteristics. I think these
> are the important points.
> Consider the following:
> # Client creates a table T1
> # ZooCache is cleared after FATE op finishes
> # Watcher is set on ZTABLES in ZK
> # Client interacts with T1
> # Client creates T2
> # ZooCache is cleared after FATE op finishes
> # Watcher fires on ZTABLES node in ZK (CHILDREN_CHANGED) and repopulates the child list on the ZTABLES node
> # Client makes call to split T2
> # Code will check if the table exists, but the childrenCache will have been repopulated in ZooCache, which will cause the client to think the table doesn't exist
> # Eventually, the watcher would fire and ZTABLES would be updated and everything is fine.
> I believe this is a plausible scenario, however perhaps unlikely.
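
The numbered scenario above can be replayed as a small, single-threaded Java sketch. Everything here is hypothetical: StaleChildrenReplay, serverView and childrenCache are illustrative names, not Accumulo's ZooCache implementation, and the sketch assumes that the child list read in step 7 does not yet include T2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy replay of the numbered scenario; all names are hypothetical. */
class StaleChildrenReplay {
  // What the connected ZooKeeper server currently reports as children of ZTABLES.
  private final Set<String> serverView = new TreeSet<>();
  // Client-side copy of that child list; null means the cache was cleared.
  private Set<String> childrenCache;

  void clearCache() {
    childrenCache = null;
  }

  /** Models the watcher callback: re-read the child list and cache it. */
  void watcherFired() {
    childrenCache = new TreeSet<>(serverView);
  }

  /** Models the existence check: answered from the cache whenever it is populated. */
  boolean tableExists(String table) {
    if (childrenCache == null) {
      childrenCache = new TreeSet<>(serverView);
    }
    return childrenCache.contains(table);
  }

  /** Replays the steps in order and records what each existence check saw. */
  static List<Boolean> replay() {
    StaleChildrenReplay client = new StaleChildrenReplay();
    List<Boolean> seen = new ArrayList<>();

    client.serverView.add("T1");          // 1. client creates T1
    client.clearCache();                  // 2. cache cleared after the FATE op
    seen.add(client.tableExists("T1"));   // 3-4. watcher set, client uses T1

    // 5. T2 is created, but (assumption) this client's view does not show it yet.
    client.clearCache();                  // 6. cache cleared after the FATE op
    client.watcherFired();                // 7. watcher repopulates the child list without T2
    seen.add(client.tableExists("T2"));   // 8-9. check is answered from the stale list

    client.serverView.add("T2");          // the view catches up
    client.watcherFired();                // 10. a later watcher event refreshes the cache
    seen.add(client.tableExists("T2"));
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(replay()); // prints [true, false, true]
  }
}
```

The middle check is the one that would surface as TableNotFoundException; the final check shows the cache converging once the later watcher event arrives.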



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
