accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.
Date Tue, 18 Oct 2016 14:34:24 GMT
Or, if it's more convenient, this is the issue I was thinking of: 
https://issues.apache.org/jira/browse/ACCUMULO-4065

Andrew Hulbert wrote:
> I'll try to dig up the full error from the tserver
>
>
> On 10/18/2016 10:30 AM, Josh Elser wrote:
>> Do you have the full exception for the "Expected protocol id.." error?
>>
>> That looks like it might be incorrect usage of Thrift on our part..
>>
>> Andrew Hulbert wrote:
>>> Mike,
>>>
>>> So backing up and then later deleting the recovery directories a few
>>> times did the trick. It seemed that removing the initial bad one caused
>>> the others to go through for the most part...
>>>
>>> I believe all the WAL files were there. I'll look for the WAL deleted in
>>> the GC logs and see if there's any evidence of that. It is version 1.6.4
>>> by the way. Unfortunately can't send the logs to you here but I did save
>>> them off and I'll talk to Jeff about what we can do.
>>>
>>> We are currently getting a new error that I'm going to look into...
>>>
>>> Expected protocol id ffffffff82 but got 0
>>>
>>> Expected protocol id ffffffff82 but got 6e
>>>
>>> etc.
>>>
>>> Looking into that now! Thanks for the help so far, as usual!
>>>
>>> Andrew
>>>
>>> On 10/18/2016 09:46 AM, Michael Wall wrote:
>>>> Andrew,
>>>>
>>>> That is what I was going to suggest you try. Where is that "Unable to
>>>> find recovery files for extent" log? Any way we can see some actual
>>>> logs?
>>>>
>>>> Are all the WALs there? Do you find any of the WAL deleted by GC in
>>>> the gc logs? Do you find any duplicate WALs in the HDFS trash?
>>>>
>>>> On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert <ahulbert@ccri.com> wrote:
>>>>
>>>> Mike,
>>>>
>>>> For one of the WALs I backed up the recovery directory and that
>>>> initiated a new recovery attempt as indicated in the tserver debug
>>>> log...
>>>>
>>>> Then the exception was thrown:
>>>>
>>>> Unable to find recovery files for extent xxxxxx logentry xxxxx
>>>> hdfs://path/to/wal/yyyy
>>>>
>>>> Any ideas? I figure we can zero out the WAL and it will go on with
>>>> life but it would be nice to try and get the data!
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On 10/18/2016 08:55 AM, Jeff Kubina wrote:
>>>>>
>>>>> On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall <mjwall@gmail.com> wrote:
>>>>>
>>>>> Take a look at the master logs for where the WAL was sorted
>>>>> to the /accumulo/recovery/... directory. Then look to see if
>>>>> those WALs are still around and contain content.
>>>>>
>>>>>
>>>>> Checked one of them, yes it is around with content.
>>>>>
>>>>> Where is this EOF exception, on a tserver?
>>>>>
>>>>>
>>>>> Yes, the tserver.
>>>>>
>>>>> Is the master log complaining about anything?
>>>>>
>>>>>
>>>>> Repeating a message similar to the tserver but also that the
>>>>> tablet assignment failed for the tserver.
>>>>>
>>>>> tservers are not balancing because of all this.
>>>>>
>>>>>
>>>>
>>>>
>>>
>
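
For anyone who hits the "not a SequenceFile" EOFException from the subject line: a Hadoop SequenceFile begins with the three magic bytes "SEQ" followed by a version byte, so an empty or truncated index file under the recovery directory would plausibly fail at exactly that point. What follows is only a minimal sketch, not something taken from the thread. It assumes the suspect file has first been copied out of HDFS to local disk (for example with hdfs dfs -get), and the function name is hypothetical.

```python
import sys

# The first three bytes of a Hadoop SequenceFile header; a version byte follows.
SEQ_MAGIC = b"SEQ"


def looks_like_sequence_file(path):
    """Return True if the local file at `path` starts with 'SEQ' plus a version byte.

    This checks only the 4-byte header. It does not validate the rest of the file.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    return len(header) == 4 and header[:3] == SEQ_MAGIC


if __name__ == "__main__":
    # Usage: python check_seq.py <local copies of index/data files>
    for p in sys.argv[1:]:
        if looks_like_sequence_file(p):
            print(p, "has a SequenceFile header")
        else:
            print(p, "does NOT have a SequenceFile header (empty or truncated?)")
```

A file that fails this check is a candidate for the "bad" recovery directory described above. A file that passes may still be damaged further in, so this only narrows the search.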
