accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: java.io.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.
Date Tue, 18 Oct 2016 14:30:45 GMT
Do you have the full exception for the "Expected protocol id.." error?

That looks like it might be incorrect usage of Thrift on our part.

Andrew Hulbert wrote:
> Mike,
>
> So backing up and then later deleting the recovery directories a few
> times did the trick. It seemed that removing the initial bad one caused
> the others to go through for the most part...
>
> I believe all the WAL files were there. I'll look for the WAL deleted in
> the GC logs and see if there's any evidence of that. It is version 1.6.4
> by the way. Unfortunately can't send the logs to you here but I did save
> them off and I'll talk to Jeff about what we can do.
>
> We are currently getting a new error that I'm going to look into...
>
> Expected protocol id ffffffff82 but got 0
>
> Expected protocol id ffffffff82 but got 6e
>
> etc.
>
> Looking into that now! Thanks for the help so far, as usual!
>
> Andrew
>
> On 10/18/2016 09:46 AM, Michael Wall wrote:
>> Andrew,
>>
>> That is what I was going to suggest you try.  Where is that "Unable to
>> find recovery files for extent" log?  Any way we can see some actual logs?
>>
>> Are all the WALs there?  Do you find any of the WAL deleted by GC in
>> the gc logs?  Do you find any duplicate WALs in the HDFS trash?
>>
>> On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert <ahulbert@ccri.com> wrote:
>>
>>     Mike,
>>
>>     For one of the WALs I backed up the recovery directory and that
>>     initiated a new recovery attempt as indicated in the tserver debug
>>     log...
>>
>>     Then the exception was thrown:
>>
>>     Unable to find recovery files for extent xxxxxx logentry xxxxx
>>     hdfs://path/to/wal/yyyy
>>
>>     Any ideas? I figure we can zero out the WAL and it will go on with
>>     life but it would be nice to try and get the data!
>>
>>     Thanks!
>>
>>
>>     On 10/18/2016 08:55 AM, Jeff Kubina wrote:
>>>
>>>     On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall <mjwall@gmail.com> wrote:
>>>
>>>         Take a look at the master logs for where the WAL was sorted
>>>         to the /accumulo/recovery/... directory.  Then look to see if
>>>         those WALs are still around and contain content.
>>>
>>>
>>>     Checked one of them, yes it is around with content.
>>>
>>>         Where is this EOF exception, on a tserver?
>>>
>>>
>>>     Yes, the tserver.
>>>
>>>         Is the master log complaining about anything?
>>>
>>>
>>>     Repeating a message similar to the tserver's, but also saying that
>>>     the tablet assignment failed for the tserver.
>>>
>>>     tservers are not balancing because of all this.
>>>
>>>
>>
>>
>
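For anyone hitting the same EOFException from the subject line: Hadoop SequenceFiles (the `index` file under `part-r-00000` is read as one) begin with the magic bytes `SEQ` followed by a version byte. An EOFException while reading that header suggests the file is shorter than four bytes, possibly empty. The sketch below follows the same idea as checking whether files "are still around and contain content". It is plain Java without the Hadoop client libraries and assumes the file has first been copied locally (for example with `hdfs dfs -get`). `SeqHeaderCheck` and its `Status` values are hypothetical names, not part of Accumulo or Hadoop.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: classifies a file's first four bytes against the SequenceFile header. */
public class SeqHeaderCheck {

    public enum Status { OK, TRUNCATED, BAD_MAGIC }

    /** Reads the 4-byte header ('S', 'E', 'Q', version byte) and reports what was found. */
    public static Status check(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] header = new byte[4];
        try {
            // readFully throws EOFException if the stream ends before 4 bytes arrive,
            // e.g. for a zero-length index file.
            in.readFully(header);
        } catch (EOFException e) {
            return Status.TRUNCATED;
        }
        // Only the magic bytes are checked; the version byte is not validated here.
        if (header[0] != 'S' || header[1] != 'E' || header[2] != 'Q') {
            return Status.BAD_MAGIC;
        }
        return Status.OK;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is assumed to be a local copy of a file pulled out of HDFS.
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                System.out.println(path + ": " + check(in));
            }
        }
    }
}
```

Running `java SeqHeaderCheck index` prints one status per file. An OK result only means the header looks right; it says nothing about whether the rest of the file is intact.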
