accumulo-dev mailing list archives

From ke...@deenlo.com
Subject Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.
Date Sat, 29 Mar 2014 00:26:59 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71354>

    The comment could mention that fate has not been started.
    
    Could add a sanity check to ensure fate was not started.
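
A minimal sketch of that sanity check. It assumes the master starts Fate from one place that can record the fact. `FateStartGuard`, `markFateStarted()` and `checkFateNotStarted()` are hypothetical names for illustration, not Accumulo APIs.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not an Accumulo API: records whether Fate has been
// started so the outstanding-transaction check can assert it runs first.
class FateStartGuard {
  private final AtomicBoolean fateStarted = new AtomicBoolean(false);

  // Would be called from the (assumed single) place where the master starts Fate.
  void markFateStarted() {
    fateStarted.set(true);
  }

  boolean isFateStarted() {
    return fateStarted.get();
  }

  // Would be called at the top of the check for old Fate transactions.
  // A running Fate could accept new transactions while the check is in
  // progress, so fail fast instead of reporting a misleading result.
  void checkFateNotStarted() {
    if (fateStarted.get()) {
      throw new IllegalStateException(
          "Fate was started before checking for outstanding Fate transactions");
    }
  }
}
```

An AtomicBoolean keeps the flag visible across the master's threads without extra locking, since the check only needs the latest value.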



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71353>

    I think this check can cause problems. Master.run() starts StatusThread, and StatusThread.run()
    will indirectly call upgradeMetadata(). After Master.run() starts StatusThread, it seems
    like it will start Fate and the client service. So it's possible that a 1.5 client could submit
    a fate op before upgradeMetadata() is called.
    
    Also, this check is probably not needed: upgradeZookeeper() should be called before upgradeMetadata().
    Could add a sanity check for this.
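
The ordering sanity check could be as small as the sketch below, assuming the master can keep a flag that is set once upgradeZookeeper() finishes. `UpgradeSequence`, `completedSteps()` and the stub method bodies are hypothetical; only the names upgradeZookeeper() and upgradeMetadata() come from the review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the suggested ordering check; the bodies stand in
// for the real upgrade steps in Master.java.
class UpgradeSequence {
  private boolean zookeeperUpgraded = false;
  private final List<String> completedSteps = new ArrayList<>();

  synchronized void upgradeZookeeper() {
    // ... real ZooKeeper upgrade work would go here ...
    completedSteps.add("zookeeper");
    zookeeperUpgraded = true;
  }

  synchronized void upgradeMetadata() {
    // Sanity check: the metadata upgrade assumes ZooKeeper was upgraded first.
    if (!zookeeperUpgraded) {
      throw new IllegalStateException(
          "upgradeZookeeper() must run before upgradeMetadata()");
    }
    // ... real metadata upgrade work would go here ...
    completedSteps.add("metadata");
  }

  // Returns a copy so callers cannot mutate the recorded history.
  synchronized List<String> completedSteps() {
    return new ArrayList<>(completedSteps);
  }
}
```

Both methods are synchronized so the flag and the step history are read and written under the same lock, even if the two steps run on different threads (e.g. StatusThread).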
    



server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java
<https://reviews.apache.org/r/19804/#comment71357>

    Seems like there is a possibility of deadlock here.
    
     1. Master gets past upgradeZookeeper()
     2. Client submits FATE op
     3. Tablet server aborts copying walogs up
     4. Master cannot run upgradeMetadata() because log recovery is needed; it is stuck.
    
    This is assuming that what I said in the previous comment about Fate starting after upgradeZookeeper()
    is right. Need to confirm this.
    
    Some possible options:
    
     * prevent fate from starting until upgrade is complete
     * only abort if there are FATE txs and upgradeZookeeper() has not run. Would need to
       look for something that upgradeZookeeper() changes.
     * Don't delete walogs after copy if upgrade is not complete. However, we would need to
       delete them later.
    
    I'll think about this some more later.
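
The first option could be sketched as a one-shot gate: the upgrade path opens it once every upgrade step has succeeded, and whatever starts Fate waits on it first. If client Fate submissions were also routed through the gate (an assumption), step 2 of the scenario above could not happen between upgradeZookeeper() and upgradeMetadata(). `UpgradeGate` and its methods are hypothetical names, not Accumulo APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical one-shot gate: Fate (and, by assumption, client Fate
// submissions) must wait until every upgrade step has finished.
class UpgradeGate {
  private final CountDownLatch upgradeDone = new CountDownLatch(1);

  // Called once, after upgradeZookeeper() and upgradeMetadata() both succeed.
  void upgradeComplete() {
    upgradeDone.countDown();
  }

  boolean isUpgradeComplete() {
    return upgradeDone.getCount() == 0;
  }

  // Called by the code that starts Fate. Returns true once the upgrade is
  // done, or false if the timeout elapses (or the thread is interrupted),
  // so the caller can log and retry instead of starting Fate early.
  boolean awaitUpgrade(long timeout, TimeUnit unit) {
    try {
      return upgradeDone.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }
}
```

A CountDownLatch fits because the upgrade completes exactly once and never needs to be reset; once it reaches zero, later waiters return immediately.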


- kturner


On March 28, 2014, 9:22 p.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 9:22 p.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure
> the master and tabletservers don't take upgrade steps if they see fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * triggered compactions
> * shutdown cluster
> * verified waiting transactions
> * verified waiting local WALs
> * verified /accumulo/version showed 4
> * Started upgrade to 1.5.2-SNAP
> * verified errors in the monitor, master logs, and tabletserver logs stating that no upgrade
>   was done and pointing back to the docs
> * verified waiting transactions
> * verified waiting local WALs
> * verified /accumulo/version showed 4
> * Cleared Fate operations
> * Started upgrade to 1.5.2-SNAP
> * verified no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> 
> Running verify job on existing data now. Should take ~6 hours.
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>

