accumulo-user mailing list archives

From Michael Wall <mjw...@gmail.com>
Subject Re: Recovery file versus directory
Date Fri, 18 Mar 2016 00:18:27 GMT
Andrew,

This sounds a lot like https://issues.apache.org/jira/browse/ACCUMULO-4157.
I'll check whether what you describe could also happen with this bug. If
you still have the GC logs, can you look for a message like "Removing WAL
for offline server" with the uuid?

Mike

On Tue, Mar 8, 2016 at 11:28 AM, Andrew Hulbert <ahulbert@ccri.com> wrote:

> Hi folks,
>
> We hit a problem this morning with a recovery on Accumulo 1.6.1. The
> error went something like this:
>
> FileNotFoundException: File does not exist:
> hdfs:///accumulo/recovery/<uuid>/failed/data
>
> at Tablet.java:1410
> at Tablet.java:1233
> etc.
> at TabletServer:2923
>
> Interestingly, hdfs:///accumulo/recovery/<uuid>/failed was a 0-byte
> file, not a directory, and it was preventing tablets from being
> assigned. I am not sure what caused the original failure, but I believe
> a tserver node was going down: the master indicated it was trying to
> shut down a tserver that was in such bad shape that someone just
> restarted the node.
>
> I looked through the fixes for 1.6.2, 1.6.3, 1.6.4, and 1.6.5 but didn't
> see anything related on the release notes pages, though I haven't gone
> through all the tickets yet. I haven't been able to get anyone to upgrade
> to 1.6.5 yet, and perhaps it's already fixed.
>
> Just wondering: has anyone seen this before?
>
> To fix it, I just deleted the `failed` file and the recovery proceeded.
>
> Thanks!
>
> Andrew
>
