accumulo-user mailing list archives

From Andrew Hulbert <ahulb...@ccri.com>
Subject Re: java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.
Date Tue, 18 Oct 2016 14:31:28 GMT
Note that the error is more like this:

Expected protocol id ffffff82 but got 35 (0!;38\\;82,<servername>:9997, 
<somelonghex>)
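This message likely comes from Thrift's compact protocol, which Accumulo's RPC layer uses. In that protocol the first byte of a message is the protocol id `0x82`. Java stores that id as a signed byte (-126), so `Integer.toHexString` sign-extends it to `ffffff82`. The "got" value (35, 0, 6e, ...) is then the first byte the server actually read, printed the same way. The sketch below reproduces only that formatting. The helper names are hypothetical, and treating this as the source of the message is an assumption.

```python
# Assumption: the error text comes from Thrift's compact protocol, whose
# protocol id byte is 0x82, formatted with Java's Integer.toHexString.
PROTOCOL_ID = 0x82


def as_signed_byte(b: int) -> int:
    """Interpret an unsigned byte value (0..255) as a Java signed byte (-128..127)."""
    return b - 256 if b >= 128 else b


def java_to_hex_string(value: int) -> str:
    """Mimic Java's Integer.toHexString: hex of the 32-bit two's-complement value."""
    return format(value & 0xFFFFFFFF, "x")


def protocol_id_message(received_byte: int) -> str:
    """Build a message shaped like the one in the tserver log (hypothetical helper)."""
    expected = java_to_hex_string(as_signed_byte(PROTOCOL_ID))
    got = java_to_hex_string(as_signed_byte(received_byte))
    return "Expected protocol id %s but got %s" % (expected, got)


if __name__ == "__main__":
    for b in (0x35, 0x00, 0x6E):
        print(protocol_id_message(b))
```

Because 0x35 and 0x6e are the ASCII characters `5` and `n`, the peer may not have been sending a compact-protocol message at all. This reading is an inference, not something confirmed in this thread.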



On 10/18/2016 10:28 AM, Andrew Hulbert wrote:
>
> Mike,
>
> So backing up and then later deleting the recovery directories a few 
> times did the trick. It seemed that removing the initial bad one 
> caused the others to go through for the most part...
>
> I believe all the WAL files were there. I'll look for WALs deleted by 
> the GC in the GC logs and see if there's any evidence of that. It is 
> version 1.6.4, by the way. Unfortunately I can't send the logs to you 
> here, but I did save them off and I'll talk to Jeff about what we can do.
>
> We are currently getting a new error that I'm going to look into...
>
> Expected protocol id ffffffff82 but got 0
>
> Expected protocol id ffffffff82 but got 6e
>
> etc.
>
> Looking into that now! Thanks for the help so far, as usual!
>
> Andrew
>
> On 10/18/2016 09:46 AM, Michael Wall wrote:
>> Andrew,
>>
>> That is what I was going to suggest you try.  Where is that "Unable 
>> to find recovery files for extent" log?  Any way we can see some 
>> actual logs?
>>
>> Are all the WALs there?  Do you find any of the WALs deleted by GC in 
>> the GC logs?  Do you find any duplicate WALs in the HDFS trash?
>>
>> On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert <ahulbert@ccri.com 
>> <mailto:ahulbert@ccri.com>> wrote:
>>
>>     Mike,
>>
>>     For one of the WALs I backed up the recovery directory and that
>>     initiated a new recovery attempt as indicated in the tserver
>>     debug log...
>>
>>     Then the exception was thrown:
>>
>>     Unable to find recovery files for extent xxxxxx logentry xxxxx
>>     hdfs://path/to/wal/yyyy
>>
>>     Any ideas? I figure we can zero out the WAL and it will go on
>>     with life but it would be nice to try and get the data!
>>
>>     Thanks!
>>
>>
>>     On 10/18/2016 08:55 AM, Jeff Kubina wrote:
>>>
>>>     On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall <mjwall@gmail.com
>>>     <mailto:mjwall@gmail.com>> wrote:
>>>
>>>         Take a look at the master logs for where the WAL was sorted
>>>         to the /accumulo/recovery/... directory.  Then look to see
>>>         if those WALs are still around and contain content.
>>>
>>>
>>>     Checked one of them, yes it is around with content.
>>>
>>>         Where is this EOF exception, on a tserver?
>>>
>>>
>>>     Yes, the tserver.
>>>
>>>         Is the master log complaining about anything?
>>>
>>>
>>>     It repeats a message similar to the tserver's, and also reports
>>>     that tablet assignment to the tserver failed.
>>>
>>>     tservers are not balancing because of all this.
>>>
>>>
>>
>>
>
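The subject line says a file under a recovery directory (`.../part-r-00000/index`) is "not a SequenceFile". Hadoop SequenceFiles begin with the three magic bytes `SEQ` followed by a one-byte version. An `EOFException` with that message therefore suggests the reader hit end-of-file before it finished reading the header, for example on an empty or truncated file. The sketch below checks that header on a file copied locally, for example with `hdfs dfs -get`. It is a rough diagnostic under those assumptions, not an Accumulo or Hadoop tool. The function names are hypothetical.

```python
import sys

SEQ_MAGIC = b"SEQ"  # Hadoop SequenceFile files start with these three bytes
HEADER_LEN = 4      # magic bytes plus one version byte


def classify_header(data: bytes) -> str:
    """Classify the leading bytes of a file copied out of a recovery directory."""
    if len(data) < HEADER_LEN:
        # Too short to hold the header: consistent with an EOFException on open.
        return "truncated"
    if data[:3] != SEQ_MAGIC:
        return "not-sequencefile"
    return "sequencefile v%d" % data[3]


def check_file(path: str) -> str:
    """Read only the header bytes of a local file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(HEADER_LEN))


if __name__ == "__main__":
    # Usage: python check_seq.py index data ...
    for p in sys.argv[1:]:
        print(p, check_file(p))
```

A "truncated" result for an `index` file would fit with the approach that worked in this thread: backing up and then removing the bad recovery directory so the WAL is sorted again.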

