accumulo-user mailing list archives

From "Terry P." <texpi...@gmail.com>
Subject Re: Bloom filter thread failure errors
Date Thu, 05 Dec 2013 00:29:29 GMT
Hi Eric,
Thanks for your reply. I'm just now getting back to this, as I had more of
these errors over the past two days, this time with no tserver failures or
master halts. With the previous errors we were still experiencing network
issues that were indeed taking tabletservers down, but now that a bad line
card in a switch has been fixed (it had been rebooting itself without
failing over), those issues are all gone (finally, knock on wood).

Now that I see them in isolation, the bloom-loader thread failures in the
main tserver log appear to happen out of the blue, with no other errors
surrounding them.

However, I just checked the debug log and see that they occur right at the
time of a major compaction. For example, from one tserver's debug log:

2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock
0.00 secs, wait 0.00 secs
2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d
(NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] -->
[/t-0000aa9/C0000zn4.rf_tmp
2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread
"bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is
closed

The rest of the stack looks like what I posted earlier. The very next debug
log message after the bloom loader exception shows that the compaction
completed successfully in 0.112 seconds.
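
As a sanity check on the timing, the gap between two log lines can be
computed from their leading timestamps. This is only a minimal sketch: the
names LogGap and gapMillis are made up for illustration, the sample lines in
main are abbreviated copies of the excerpt above, and it assumes every line
starts with a 23-character "yyyy-MM-dd HH:mm:ss,SSS" timestamp.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Assumed layout of the timestamp that starts each tserver log line.
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds between the leading timestamps of two log lines. */
    static long gapMillis(String earlierLine, String laterLine) {
        // The timestamp occupies the first 23 characters of each line.
        LocalDateTime earlier = LocalDateTime.parse(earlierLine.substring(0, 23), TS);
        LocalDateTime later = LocalDateTime.parse(laterLine.substring(0, 23), TS);
        return Duration.between(earlier, later).toMillis();
    }

    public static void main(String[] args) {
        // Abbreviated copies of the "Starting MajC" and bloom-loader lines.
        String majcStart = "2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC";
        String bloomError = "2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread";
        System.out.println(gapMillis(majcStart, bloomError) + " ms");
    }
}
```

For the two lines in main, the difference is 780 - 739 = 41 ms within the
same second.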

So it looks like the bloom loader is trying to read an rfile 41 ms after a
compaction started. The file in the error (C0000zmf.rf) is one of that
compaction's input files, so it was likely just compacted and closed during
that gap between the calls. If that's the case, can this error be safely
ignored?
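
The sequence I am guessing at can be illustrated with a toy example. This is
not Accumulo code: FakeReader, readBloomData and loadAfterClose are
hypothetical names, and the sketch only assumes that a background task reads
from a file handle that its owner has already closed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class ClosedFileRace {
    /** Hypothetical stand-in for an rfile reader; not Accumulo's class. */
    static class FakeReader {
        private final String path;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        FakeReader(String path) { this.path = path; }

        void close() { closed.set(true); }

        String readBloomData() {
            // Mirrors the shape of the reported message: "File <path> is closed".
            if (closed.get()) {
                throw new IllegalStateException("File " + path + " is closed");
            }
            return "bloom-data";
        }
    }

    /** Closes the reader first, then runs the deferred load on another thread. */
    static String loadAfterClose(String path) {
        FakeReader reader = new FakeReader(path);
        reader.close(); // the owner is done with the file before the loader runs
        return CompletableFuture.supplyAsync(() -> {
            try {
                return reader.readBloomData();
            } catch (IllegalStateException e) {
                return "loader died: " + e.getMessage();
            }
        }).join();
    }

    public static void main(String[] args) {
        System.out.println(loadAfterClose("/t-0000aa9/C0000zmf.rf"));
    }
}
```

In this toy version the only thing that fails is the deferred load itself;
nothing else depends on its result. Whether the same holds inside the
tserver is exactly what I am asking.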

Thanks,
Terry



On Mon, Nov 18, 2013 at 8:56 PM, Eric Newton <eric.newton@gmail.com> wrote:

> This is an educated guess...
>
> When a process dies "gracefully" there's a shutdown hook that closes the
> FileSystem.  That can result in messages like this.  It's likely there's an
> error before this about a zookeeper session being lost, or a halt issued by
> the master.  See if this tserver died shortly after this message. If so,
> ignore the message.
>
> -Eric
>
>
>
> On Fri, Nov 15, 2013 at 4:31 PM, Terry P. <texpilot@gmail.com> wrote:
>
>> Greetings folks,
>> In my Accumulo 1.4.2 cluster I am seeing ERRORS about bloom loader
>> threads dying due to an rfile being closed.  I can't copy/paste the error
>> as it's on an air-gapped system, but it starts with:
>>
>> ERROR Thread "bloom-loader-2147" died File
>> /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed
>>   java.lang.IllegalStateException: File
>> /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed
>>     at
>> org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.getBCFile(CacheableBlockFile.java:244)
>>     at
>> org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.access$000(CacheableBlockFile.java:142)
>> (10 more java files ... ends with java.lang.Thread.run(UnknownSource) )
>>
>> No real rhyme or reason as to when they occur; we are predominantly
>> ingest heavy with light reads by rowkey with ~10 entries per rowkey.  I
>> don't really know if client programs are getting errors when these occur or
>> not.
>>
>> I didn't find any JIRAs related to these.  Should I be concerned about
>> these?
>>
>
>
