accumulo-user mailing list archives

From Andrew Hulbert <ahulb...@ccri.com>
Subject Re: Recovery file versus directory
Date Fri, 18 Mar 2016 13:43:10 GMT
I'll tar them up and see what I can find! Thanks.

On 03/17/2016 08:18 PM, Michael Wall wrote:
> Andrew,
>
> Sounds a lot like https://issues.apache.org/jira/browse/ACCUMULO-4157. 
> I'll look to see if what you describe could also happen with this 
> bug.  If you still have the gc logs, can you look for a message like 
> "Removing WAL for offline server" with the uuid?
>
> Mike
>
> On Tue, Mar 8, 2016 at 11:28 AM, Andrew Hulbert <ahulbert@ccri.com> wrote:
>
>     Hi folks,
>
>     We experienced a problem this morning with a recovery on 1.6.1
>     that went something like this:
>
>     FileNotFoundException: File does not exist:
>     hdfs:///accumulo/recovery/<uuid>/failed/data
>
>     at Tablet.java:1410
>     at Tablet.java:1233
>     etc.
>     at TabletServer:2923
>
>     Interestingly enough, at hdfs:///accumulo/recovery/<uuid>/failed
>     was a 0 byte file, not a directory...and it was preventing tablets
>     from getting assigned (I am not sure what caused the original
>     failure, but I believe what happened is a tserver node was going
>     down...the master indicated it was trying to shut down a
>     tserver that was in such bad shape that someone just rekicked the node).
>
>     I looked through the fixes for 1.6.2 through 1.6.5 but didn't see
>     anything related on the release notes pages, though I haven't gone
>     through all the tickets yet. I haven't been able to get anyone to
>     upgrade to 1.6.5 yet, and perhaps it's already fixed.
>
>     Just wondering if that's something that has been seen before?
>
>     To fix it, I just deleted the failed file and it proceeded.
>
>     Thanks!
>
>     Andrew
>
>
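
For anyone who hits the same symptom, the check described in the quoted message (a 0-byte plain file sitting at recovery/<uuid>/failed where a directory was expected) might be scripted roughly as follows. This is only a sketch: it assumes you have saved the output of a recursive HDFS listing of the recovery directory and that each line has the usual eight columns (permissions, replication, owner, group, size, date, time, path). The function name and the sample listing (owner, timestamps, and the `uuid-a`/`uuid-b` names) are hypothetical and not taken from the thread.

```python
# Sketch: scan a saved recursive HDFS listing for zero-byte plain files
# named "failed". The eight-column layout is an assumption about the
# listing format; adjust the parsing if your output differs.

def find_zero_byte_failed(ls_lines):
    """Return paths of 0-byte regular files whose last component is 'failed'."""
    suspects = []
    for line in ls_lines:
        fields = line.split(None, 7)
        if len(fields) != 8:
            # Skips headers such as "Found 3 items" and blank lines.
            continue
        perms, _repl, _owner, _group, size, _date, _time, path = fields
        is_directory = perms.startswith("d")
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if not is_directory and name == "failed" and size == "0":
            suspects.append(path)
    return suspects


# Hypothetical listing, invented for illustration only.
SAMPLE = [
    "Found 3 items",
    "-rw-r--r--   3 accumulo supergroup          0 2016-03-08 10:15 /accumulo/recovery/uuid-a/failed",
    "drwxr-xr-x   - accumulo supergroup          0 2016-03-08 10:20 /accumulo/recovery/uuid-b/failed",
    "-rw-r--r--   3 accumulo supergroup       1024 2016-03-08 10:21 /accumulo/recovery/uuid-b/failed/data",
]

if __name__ == "__main__":
    for path in find_zero_byte_failed(SAMPLE):
        print(path)
```

The script only reports candidates; whether a reported file is safe to delete is a separate decision, which is why tarring up the logs and checking them first, as discussed above, is still worthwhile.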

