accumulo-user mailing list archives

From Andrew Hulbert <ahulb...@ccri.com>
Subject Re: java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.
Date Tue, 18 Oct 2016 14:40:10 GMT
Yes, it looks similar.

Especially these parts:

2015-11-19 22:43:05,998 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 19
java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 19
	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702)
	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 19
	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
	... 6 more
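
For anyone reading the trace above: the "ffffff82" in the message is most likely just the one-byte compact-protocol id 0x82 printed after sign extension to an int, and "19" is whatever byte the client actually read first. Below is a minimal sketch of that arithmetic. It assumes TCompactProtocol uses 0x82 as its id and formats both values with Integer.toHexString; the class and method names are made up for illustration and are not part of Thrift or Accumulo.

```java
// Hypothetical demo class; not part of Thrift or Accumulo.
class ProtocolIdDemo {
    // Assumption: 0x82 is the protocol id byte used by Thrift's compact protocol.
    static final byte COMPACT_PROTOCOL_ID = (byte) 0x82;

    // Widening a negative byte to int sign-extends it, so 0x82 prints as ffffff82.
    static String hex(byte b) {
        return Integer.toHexString(b);
    }

    // True only when the first byte of a reply matches the expected protocol id.
    static boolean looksLikeCompactMessage(byte firstByte) {
        return firstByte == COMPACT_PROTOCOL_ID;
    }

    public static void main(String[] args) {
        System.out.println("expected: " + hex(COMPACT_PROTOCOL_ID));
        System.out.println("got:      " + hex((byte) 0x19));
        System.out.println("matches:  " + looksLikeCompactMessage((byte) 0x19));
    }
}
```

If that reading is right, the differing "got" values (19, 0, 6e) would mean the client is reading bytes that are not the start of a compact-protocol message, which fits the suggestion elsewhere in this thread that the Thrift client is being used incorrectly.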




On 10/18/2016 10:34 AM, Josh Elser wrote:
> Or, if it's more convenient, this is the issue I was thinking of: 
> https://issues.apache.org/jira/browse/ACCUMULO-4065
>
> Andrew Hulbert wrote:
>> I'll try to dig up the full error from the tserver
>>
>>
>> On 10/18/2016 10:30 AM, Josh Elser wrote:
>>> Do you have the full exception for the "Expected protocol id.." error?
>>>
>>> That looks like it might be incorrect usage of Thrift on our part..
>>>
>>> Andrew Hulbert wrote:
>>>> Mike,
>>>>
>>>> So backing up and then later deleting the recovery directories a few
>>>> times did the trick. It seemed that removing the initial bad one caused
>>>> the others to go through for the most part...
>>>>
>>>> I believe all the WAL files were there. I'll look for the WAL deleted in
>>>> the GC logs and see if there's any evidence of that. It is version 1.6.4
>>>> by the way. Unfortunately can't send the logs to you here but I did save
>>>> them off and I'll talk to Jeff about what we can do.
>>>>
>>>> We are currently getting a new error that I'm going to look into...
>>>>
>>>> Expected protocol id ffffffff82 but got 0
>>>>
>>>> Expected protocol id ffffffff82 but got 6e
>>>>
>>>> etc.
>>>>
>>>> Looking into that now! Thanks for the help so far, as usual!
>>>>
>>>> Andrew
>>>>
>>>> On 10/18/2016 09:46 AM, Michael Wall wrote:
>>>>> Andrew,
>>>>>
>>>>> That is what I was going to suggest you try. Where is that "Unable to
>>>>> find recovery files for extent" log? Any way we can see some actual
>>>>> logs?
>>>>>
>>>>> Are all the WALs there? Do you find any of the WALs deleted by GC in
>>>>> the gc logs? Do you find any duplicate WALs in the HDFS trash?
>>>>>
>>>>> On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert <ahulbert@ccri.com> wrote:
>>>>>
>>>>> Mike,
>>>>>
>>>>> For one of the WALs I backed up the recovery directory and that
>>>>> initiated a new recovery attempt as indicated in the tserver debug
>>>>> log...
>>>>>
>>>>> Then the exception was thrown:
>>>>>
>>>>> Unable to find recovery files for extent xxxxxx logentry xxxxx
>>>>> hdfs://path/to/wal/yyyy
>>>>>
>>>>> Any ideas? I figure we can zero out the WAL and it will go on with
>>>>> life but it would be nice to try and get the data!
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On 10/18/2016 08:55 AM, Jeff Kubina wrote:
>>>>>>
>>>>>> On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall <mjwall@gmail.com> wrote:
>>>>>>
>>>>>> Take a look at the master logs for where the WAL was sorted
>>>>>> to the /accumulo/recovery/... directory. Then look to see if
>>>>>> those WALs are still around and contain content.
>>>>>>
>>>>>>
>>>>>> Checked one of them, yes it is around with content.
>>>>>>
>>>>>> Where is this EOF exception, on a tserver?
>>>>>>
>>>>>>
>>>>>> Yes, the tserver.
>>>>>>
>>>>>> Is the master log complaining about anything?
>>>>>>
>>>>>>
>>>>>> Repeating a message similar to the tserver but also that the
>>>>>> tablet assignment failed for the tserver.
>>>>>>
>>>>>> tservers are not balancing because of all this.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>

