accumulo-user mailing list archives

From Andrew Hulbert <ahulb...@ccri.com>
Subject Re: java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.
Date Tue, 18 Oct 2016 14:32:22 GMT
I'll try to dig up the full error from the tserver


On 10/18/2016 10:30 AM, Josh Elser wrote:
> Do you have the full exception for the "Expected protocol id.." error?
>
> That looks like it might be incorrect usage of Thrift on our part..
>
> Andrew Hulbert wrote:
>> Mike,
>>
>> So backing up and then later deleting the recovery directories a few
>> times did the trick. It seemed that removing the initial bad one caused
>> the others to go through for the most part...
>>
>> I believe all the WAL files were there. I'll look for the WAL deleted in
>> the GC logs and see if there's any evidence of that. It is version 1.6.4
>> by the way. Unfortunately can't send the logs to you here but I did save
>> them off and I'll talk to Jeff about what we can do.
>>
>> We are currently getting a new error that I'm going to look into...
>>
>> Expected protocol id ffffffff82 but got 0
>>
>> Expected protocol id ffffffff82 but got 6e
>>
>> etc.
>>
>> Looking into that now! Thanks for the help so far, as usual!
>>
>> Andrew
>>
>> On 10/18/2016 09:46 AM, Michael Wall wrote:
>>> Andrew,
>>>
>>> That is what I was going to suggest you try.  Where is that "Unable to
>>> find recovery files for extent" log?  Any way we can see some actual
>>> logs?
>>>
>>> Are all the WALs there?  Do you find any of the WALs deleted by GC in
>>> the gc logs?  Do you find any duplicate WALs in the HDFS trash?
>>>
>>> On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert <ahulbert@ccri.com>
>>> wrote:
>>>
>>>     Mike,
>>>
>>>     For one of the WALs I backed up the recovery directory and that
>>>     initiated a new recovery attempt as indicated in the tserver debug
>>>     log...
>>>
>>>     Then the exception was thrown:
>>>
>>>     Unable to find recovery files for extent xxxxxx logentry xxxxx
>>>     hdfs://path/to/wal/yyyy
>>>
>>>     Any ideas? I figure we can zero out the WAL and it will go on with
>>>     life but it would be nice to try and get the data!
>>>
>>>     Thanks!
>>>
>>>
>>>     On 10/18/2016 08:55 AM, Jeff Kubina wrote:
>>>>
>>>>     On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall <mjwall@gmail.com>
>>>>     wrote:
>>>>
>>>>         Take a look at the master logs for where the WAL was sorted
>>>>         to the /accumulo/recovery/... directory.  Then look to see if
>>>>         those WALs are still around and contain content.
>>>>
>>>>
>>>>     Checked one of them, yes it is around with content.
>>>>
>>>>         Where is this EOF exception, on a tserver?
>>>>
>>>>
>>>>     Yes, the tserver.
>>>>
>>>>         Is the master log complaining about anything?
>>>>
>>>>
>>>>     Repeating a message similar to the tserver but also that the
>>>>     tablet assignment failed for the tserver.
>>>>
>>>>     tservers are not balancing because of all this.
>>>>
>>>>
>>>
>>>
>>

