accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3774) Deadlock after recovering root tablet
Date Tue, 12 May 2015 18:14:00 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540381#comment-14540381 ]

Eric Newton commented on ACCUMULO-3774:
---------------------------------------

The problem (as I'm working through it):

# at some point a tserver figures out that a WALog is no longer in use:
* it is closed
* no tablet has a reference to it
* the log is marked as unused to signal to the GC that the file can be removed
# one of the likely times a log goes unused is during a shutdown
# at that point (and at others) the metadata table is not available

Some ideas:
* mark the log as unused *eventually*, asynchronously
* keep the log references in zookeeper only
* use HDFS to track the logs, avoid the metadata table

So, can zookeeper handle this?
* a tablet server can theoretically ingest as fast as the WAL can write, which is the speed
of HDFS
* let's be an optimist and use a write rate of 128M/s
* this allows for a log roll every 8 seconds on our magic hardware
* let's assume a big cluster size: 10K nodes
* each rollover updates zookeeper 3x: create, unused, deleted
* 10K*3/8 = 3750 updates / sec
* assuming 100 WALogs per server, and 500 bytes for each entry: 0.5G
* based on published numbers (I found [this|http://wiki.apache.org/hadoop/ZooKeeper/Performance]), it looks like this would be easily doable, even on a quorum large enough to handle this magic cluster
* I'm pretty sure the NN (or several of them) can keep up, since they are handling WAL creation/deletion anyhow
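
As a sanity check on the arithmetic above, here is a minimal Python sketch of the same back-of-envelope estimate. The constant and function names are illustrative only. The log size of roughly 1G is not stated above; it is simply what a roll every 8 seconds at 128M/s implies.

```python
# Back-of-envelope estimate of ZooKeeper load for WAL tracking.
# All inputs are the optimistic figures from the comment above.
WRITE_RATE_MB_PER_S = 128      # assumed WAL write rate per tablet server
ROLL_INTERVAL_S = 8            # seconds between log rolls
CLUSTER_NODES = 10_000         # "big cluster" size
UPDATES_PER_ROLL = 3           # create, unused, deleted
WALOGS_PER_SERVER = 100        # assumed live WALogs per server
BYTES_PER_ENTRY = 500          # assumed size of one ZooKeeper entry


def implied_wal_size_mb() -> int:
    """Log size implied by the write rate and roll interval."""
    return WRITE_RATE_MB_PER_S * ROLL_INTERVAL_S


def zk_updates_per_sec() -> float:
    """Cluster-wide ZooKeeper updates per second caused by log rolls."""
    return CLUSTER_NODES * UPDATES_PER_ROLL / ROLL_INTERVAL_S


def zk_data_bytes() -> int:
    """Total bytes held in ZooKeeper for all WALog entries."""
    return CLUSTER_NODES * WALOGS_PER_SERVER * BYTES_PER_ENTRY


if __name__ == "__main__":
    print(f"implied WAL size: {implied_wal_size_mb()} MB")
    print(f"ZooKeeper updates/sec: {zk_updates_per_sec():.0f}")
    print(f"ZooKeeper data: {zk_data_bytes() / 1e9:.1f} GB")
```

Changing any one input (for example a slower write rate or a smaller cluster) only lowers the update rate, so these figures act as an upper bound under the stated assumptions.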

What are the trade-offs of each possible solution?
Async:
* async updates would require a recovery, probably for every "clean" shutdown
Zookeeper:
* we'll have to sync to avoid propagation delays (like knowing the full log set before recovery
starts)
* probably some other problems; zookeeper doesn't always act like I think it should
NN:
* using the NN to track files was not a particularly scalable approach, but it would be easy,
and the multi-volume support allows us to abuse the NN a little more
* given the current logging area {{/accumulo/walogs/host+port/UUID}} we would need some way
for a restarted server to begin logging before all recovery has completed

At this point, I'm leaning towards marking logs as unused by renaming them in HDFS, and storing
them under the host+port+session. This eliminates the entry for the log from the metadata
table altogether.
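
To make the rename idea concrete, here is a minimal Python sketch, not the actual patch. It uses the local filesystem as a stand-in for HDFS, and the {{unused}} subdirectory name, the helper names, and the example directory layout are all hypothetical. The point it illustrates is that a single rename changes a log's state without any write to the metadata table.

```python
import tempfile
from pathlib import Path

# Hypothetical name for the directory that holds logs no tablet references.
UNUSED_DIR = "unused"


def mark_unused(wal: Path) -> Path:
    """Mark a closed WAL as unused by renaming it into a sibling directory.

    The rename is the only state change; nothing is written to a table.
    """
    target_dir = wal.parent / UNUSED_DIR
    target_dir.mkdir(exist_ok=True)
    target = target_dir / wal.name
    wal.rename(target)
    return target


def list_unused(server_dir: Path) -> list[Path]:
    """What a GC pass could do: list the logs that are safe to remove."""
    unused = server_dir / UNUSED_DIR
    return sorted(unused.iterdir()) if unused.is_dir() else []


if __name__ == "__main__":
    # Assumed layout: <root>/walogs/<host+port+session>/<UUID>
    root = Path(tempfile.mkdtemp())
    server_dir = root / "walogs" / "host1+9997+session42"
    server_dir.mkdir(parents=True)
    wal = server_dir / "example-uuid"
    wal.write_bytes(b"log data")

    moved = mark_unused(wal)
    print("unused logs:", [p.name for p in list_unused(server_dir)])
```

On HDFS the corresponding step would be a rename within a single namespace, which is why the state change can happen even while the metadata table is unavailable.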


> Deadlock after recovering root tablet
> -------------------------------------
>
>                 Key: ACCUMULO-3774
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3774
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: Hadoop 2.7.0, ZK 3.4.6, Accumulo 83d1b8388ad807d678c9a3a922e5025faa9a5933, 20 node m3.large EC2 cluster
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>            Priority: Blocker
>              Labels: 1.7.0_QA
>             Fix For: 1.8.0
>
>         Attachments: ACCUMULO-3774-01.patch
>
>
> I started CI running against 1.7.0-SNAP. After CI ran for a while I started agitation. Then everything froze up. The root tablet node was killed, the root tablet had a lot of walogs (will open a separate issue for this), and the root tablet was reloaded on another machine. However, it hung while loading with the following issue. The minor compaction after recovery was trying to write to the root tablet. This happened before the root tablet location was set.
> {noformat}
> "Minor compacting +r<<" daemon prio=10 tid=0x00000000046cd800 nid=0x3508 in Object.wait() [0x00007fb0ac3b1000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.waitRTE(TabletServerBatchWriter.java:459)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:352)
>         - locked <0x000000078d154840> (a org.apache.accumulo.core.client.impl.TabletServerBatchWriter)
>         at org.apache.accumulo.core.client.impl.BatchWriterImpl.close(BatchWriterImpl.java:54)
>         at org.apache.accumulo.server.util.MetadataTableUtil.markLogUnused(MetadataTableUtil.java:1131)
>         at org.apache.accumulo.tserver.TabletServer.markUnusedWALs(TabletServer.java:3032)
>         at org.apache.accumulo.tserver.TabletServer.minorCompactionFinished(TabletServer.java:2917)
>         at org.apache.accumulo.tserver.tablet.DatafileManager.bringMinorCompactionOnline(DatafileManager.java:440)
>         at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:956)
>         at org.apache.accumulo.tserver.tablet.MinorCompactionTask.run(MinorCompactionTask.java:84)
>         at org.apache.accumulo.tserver.tablet.Tablet.minorCompactNow(Tablet.java:1080)
>         at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2124)
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$3.run(TabletServer.java:1510)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
