accumulo-dev mailing list archives

From "Sean Busbey" <s...@manvsbeard.com>
Subject Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.
Date Wed, 02 Apr 2014 06:06:46 GMT


> On March 29, 2014, 12:26 a.m., kturner wrote:
> > server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java, line 3328
> > <https://reviews.apache.org/r/19804/diff/1/?file=539927#file539927line3328>
> >
> >     Seems like there is a possibility of deadlock here.
> >     
> >      1. Master gets past upgradeZookeeper()
> >      2. Client submits FATE op
> >      3. Tablet server aborts copying walogs up to HDFS
> >      4. Master cannot upgradeMetadata because log recovery is needed, so it is stuck.
> >     
> >     This assumes that what I said in my previous comment about Fate starting after upgradeZookeeper() is right.  I need to confirm this.
> >     
> >     Some possible options:
> >     
> >      * prevent fate from starting until upgrade is complete
> >      * only abort if there are FATE txs and upgradeZookeeper() has not run.  This would need to look for something that upgradeZookeeper() changes.
> >      * don't delete walogs after copy if upgrade is not complete.  However, they would then need to be deleted later.
> >     
> >     I'll think about this some more later.

New patch ensures Fate does not start until after upgrade is complete.
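
As a rough illustration of the ordering described above (not the code in the patch), here is a minimal sketch. The class and method names (UpgradeGate, upgrade, startFate) are hypothetical and do not come from the Accumulo source. It assumes the upgrade refuses to run while old Fate transactions are outstanding, and that Fate is only allowed to start once the upgrade has finished.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch only: none of these names exist in Accumulo.
 * Models "abort upgrade if Fate transactions are waiting" and
 * "do not start Fate until the upgrade is complete".
 */
final class UpgradeGate {
    private final AtomicBoolean upgradeComplete = new AtomicBoolean(false);
    private final AtomicBoolean fateStarted = new AtomicBoolean(false);

    /** Runs the upgrade only when no Fate transactions are outstanding. */
    void upgrade(List<Long> outstandingFateTxIds) {
        if (!outstandingFateTxIds.isEmpty()) {
            // Take no upgrade steps at all; the operator has to clear Fate first.
            throw new IllegalStateException("Aborting upgrade: "
                + outstandingFateTxIds.size()
                + " outstanding Fate transaction(s); see the upgrade instructions");
        }
        // The real upgrade steps would run here.
        upgradeComplete.set(true);
    }

    /** Starts Fate only after the upgrade has completed; returns whether it is running. */
    boolean startFate() {
        if (!upgradeComplete.get()) {
            return false; // Too early: a new Fate op could otherwise block the upgrade.
        }
        fateStarted.set(true);
        return true;
    }

    boolean isFateStarted() {
        return fateStarted.get();
    }
}
```

With this ordering, a client cannot submit a Fate op between the ZooKeeper upgrade and the metadata upgrade, which is the window the deadlock scenario depends on.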


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------


On April 2, 2014, 6:06 a.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated April 2, 2014, 6:06 a.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
>   server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * loaded test data in a variety of table configs
> * alternated table creation and deletion
> * loaded an additional table to cause !METADATA churn
> * shut down the cluster uncleanly
> * verified waiting Fate transactions (table deletion at success status)
> * verified waiting local WALs
> * verified waiting local WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 4
> * started upgrade to 1.5.2-SNAP
> * verified errors reporting that the upgrade did not run and pointing back to the docs in: monitor, master logs, tabletserver logs
> * verified same waiting Fate transactions
> * verified same waiting local WALs
> * verified /accumulo/version showed 4
> * cleared Fate operations
> * started upgrade to 1.5.2-SNAP
> * waited a terrifyingly long time, checking on progress via local logs
> * verified no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> * waited for all tablets to be hosted
> * verified test data
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>

