accumulo-user mailing list archives

From "Adam J. Shook" <>
Subject Re: Question on missing RFiles
Date Wed, 16 May 2018 15:25:02 GMT
I tried building a timeline but the logs are just not there.  We weren't
sending the debug logs to Splunk due to the verbosity, but we may be
tweaking the log4j settings a bit to make sure we get the log data stored
in the event this happens again.  This very well could be attributed to the
recovery failure; hard to say.  I'll be upgrading to 1.9.1 soon.
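
For reference, the log4j tweak we're considering is roughly this (a log4j 1.x
properties sketch; the appender name, file path, and sizes are assumptions,
not our actual config):

    # Keep Accumulo DEBUG output in a local rolling file, since Splunk
    # only ingests INFO and above.
    log4j.logger.org.apache.accumulo=DEBUG, debuglog
    log4j.appender.debuglog=org.apache.log4j.RollingFileAppender
    log4j.appender.debuglog.File=/var/log/accumulo/tserver_debug.log
    log4j.appender.debuglog.MaxFileSize=512MB
    log4j.appender.debuglog.MaxBackupIndex=10
    log4j.appender.debuglog.layout=org.apache.log4j.PatternLayout
    log4j.appender.debuglog.layout.ConversionPattern=%d{ISO8601} [%c] %-5p: %m%n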

On Mon, May 14, 2018 at 8:53 AM, Michael Wall <> wrote:

> Can you pick some of the files that are missing and search through your
> logs to put together a timeline?  See if you can find that file for a
> specific tablet.  Then grab all the logs for when a file was created as a
> result of a compaction, and when a file was included in a compaction for
> that table.  Follow compactions for that tablet until you started getting
> errors.  Then see what logs you have for WAL replay during that time for
> that tablet and the metadata table, and try to correlate.
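>
> Concretely, something like this is what I have in mind (a rough sketch; the
> rfile name and log locations are made-up placeholders):
>
>     # Trace one suspect rfile through all master/tserver logs, oldest
>     # entries first, to see when it was created, compacted away, and
>     # first reported missing.
>     grep -h 'A00070c1.rf' /var/log/accumulo/*.log* | sort
>
>     # Then pull WAL recovery/replay messages from the same window.
>     grep -hi 'recover' /var/log/accumulo/tserver_*.log* | sort
>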
> It's a shame you don't have the GC logs.  If you saw that a file was GC'd
> and then showed up in the metadata table again, that would help explain
> what happened.  Like Christopher mentioned, this could be related to a
> recovery failure.
> Mike
> On Sat, May 12, 2018 at 5:26 PM Adam J. Shook <>
> wrote:
>> WALs are turned on.  Durability is set to flush for all tables except for
>> root and metadata, which are sync.  The current rfile names on HDFS and
>> in the metadata table are greater than the files that are missing.
>> I searched through all of our current and historical logs in Splunk (which
>> are only INFO level or higher).  Issues from the logs:
>> * Problem reports saying the files are not found
>> * IllegalStateException saying the rfile is closed when it tried to load
>> the Bloom filter (likely the flappy DataNode)
>> * IOException when reading the file saying Stream is closed (likely the
>> flappy DataNode)
>> Nothing in the GC logs -- all the above errors are in the tablet server
>> logs.  The logs may have rolled over, though, and our debug logs don't make
>> it into Splunk.
>> --Adam
>> On Fri, May 11, 2018 at 6:16 PM, Christopher <> wrote:
>>> Oh, it occurs to me that this may be related to the WAL bugs that Keith
>>> fixed for 1.9.1... which could affect the metadata table recovery after a
>>> failure.
>>> On Fri, May 11, 2018 at 6:11 PM Michael Wall <> wrote:
>>>> Adam,
>>>> Do you have GC logs?  Can you see if those missing RFiles were removed
>>>> by the GC process?  That could indicate you somehow got old metadata info
>>>> replayed.  Also, the rfile names increment, so compare the current rfile
>>>> names in the srv.dir directory vs what is in the metadata table.  Are the
>>>> existing files after the files in the metadata?  Finally, pick a few of
>>>> the missing files and grep all your master and tserver logs to see if you
>>>> can learn anything.  This sounds ungood.
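>>>> For the comparison, roughly something like this (the table id, shell
>>>> user, and volume path are placeholders, not your values):
>>>>
>>>>     # Files the metadata table currently references for table id 2b.
>>>>     accumulo shell -u root -e 'scan -t accumulo.metadata -b "2b;" -e "2b<" -c file -np'
>>>>
>>>>     # Files actually present under that table's directory in HDFS.
>>>>     hdfs dfs -ls -R /accumulo/tables/2b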
>>>> Mike
>>>> On Fri, May 11, 2018 at 6:06 PM Christopher <>
>>>> wrote:
>>>>> This is strange. I've only ever seen this when HDFS has reported
>>>>> problems, such as missing blocks, or another obvious failure. What are
>>>>> your durability settings (were WALs turned on)?
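>>>>> A quick way to check from the shell, if that helps (user and table name
>>>>> are placeholders):
>>>>>
>>>>>     # Effective durability setting for one table.
>>>>>     accumulo shell -u root -e 'config -t mytable -f table.durability'
>>>>>
>>>>>     # System-wide default.
>>>>>     accumulo shell -u root -e 'config -f table.durability'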
>>>>> On Fri, May 11, 2018 at 12:45 PM Adam J. Shook <>
>>>>> wrote:
>>>>>> Hello all,
>>>>>> On one of our clusters, there are a good number of missing RFiles
>>>>>> from HDFS; however, HDFS has not reported any missing blocks.  We
>>>>>> were experiencing issues with HDFS; some flapping DataNode processes
>>>>>> needed more heap.
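>>>>>> For what it's worth, we checked block health with a standard fsck pass,
>>>>>> roughly like this (the path reflects our assumed volume layout):
>>>>>>
>>>>>>     # Report any missing or corrupt blocks under the Accumulo root.
>>>>>>     hdfs fsck /accumulo -files -blocks | grep -iE 'missing|corrupt'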
>>>>>> I don't anticipate I can do much besides create a bunch of empty
>>>>>> RFiles (open to suggestions).  My question is: is it possible that
>>>>>> Accumulo could have written the metadata for these RFiles but failed
>>>>>> to write the data to HDFS?  In which case, it would have been re-tried
>>>>>> later and the data persisted to a different RFile?  Or is it an 'RFile
>>>>>> is in Accumulo if and only if it is in HDFS' situation?
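>>>>>> If it comes to stubbing them out, I believe the stock utility would be
>>>>>> invoked roughly like this (the path is a placeholder for one of the
>>>>>> missing files):
>>>>>>
>>>>>>     # Write a valid, empty rfile at the missing path.
>>>>>>     accumulo org.apache.accumulo.core.file.rfile.CreateEmpty \
>>>>>>         /accumulo/tables/2b/default_tablet/F00070c1.rf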
>>>>>> Accumulo 1.8.1 on HDFS 2.6.0.
>>>>>> Thank you,
>>>>>> --Adam
