accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Determining the cause of a tablet server failure
Date Wed, 27 Feb 2013 17:15:53 GMT
Also, check the "gc" lines from the debug logs:

$ grep -a 'gc' logs/tserver*.debug.log

They should appear about once per second.  Gaps between them indicate
pauses, typically caused by swapping or low memory.
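
If scanning the grep output by eye is tedious, the gap check could be
sketched roughly as follows. This is only an illustration: gc_gaps is a
hypothetical helper name, and it assumes each log line starts with a
log4j-style "YYYY-MM-DD HH:MM:SS,mmm" timestamp, which may not match your
log layout.

```shell
# Hypothetical helper: report gaps longer than MAX_GAP seconds between
# consecutive "gc" lines in a single debug log.
# ASSUMPTION: every matching line begins with "YYYY-MM-DD HH:MM:SS,mmm".
# usage: gc_gaps MAX_GAP_SECONDS LOGFILE
gc_gaps() {
  max_gap="$1"
  logfile="$2"
  grep -a 'gc' "$logfile" | awk -v max="$max_gap" '
    {
      # Field 2 is the time of day; split it into HH, MM, SS and millis.
      split($2, t, /[:,]/)
      now = t[1] * 3600 + t[2] * 60 + t[3] + t[4] / 1000
      if (NR > 1) {
        gap = now - prev
        if (gap < 0) gap += 86400   # the log rolled past midnight
        if (gap > max) printf "%.1fs gap before: %s %s\n", gap, $1, $2
      }
      prev = now
    }'
}
```

For example, "gc_gaps 5 logs/tserver_myhost.debug.log" (the file name is made
up) would print every place where more than five seconds passed between two
gc lines.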

-Eric


On Wed, Feb 27, 2013 at 12:12 PM, John Vines <vines@apache.org> wrote:

> Check the .out and .err files. Out of Memory exceptions aren't caught by
> log4j and instead go to those files.
>
>
> On Wed, Feb 27, 2013 at 12:10 PM, Mike Hugo <mike@piragua.com> wrote:
>
>> After running an ingest process via map reduce for about an hour or so,
>> one of our tservers fails.  It happens pretty consistently, and we're able
>> to replicate it without too much difficulty.
>>
>> I'm looking in the $ACCUMULO_HOME/logs directory for clues as to why the
>> tserver fails, but I'm not seeing much that points to a cause of the
>> tserver going offline.   One minute it's there, the next it's offline.
>>  There are some warnings about the swappiness as well as a large row that
>> cannot be split, but other than that, not much else to go on.
>>
>> Is there anything that could help me figure out *why* the tserver died?
>>  I'm guessing it's something in our client code or a config that's not
>> correct on the server, but it'd be really nice to have a hint before we
>> start randomly changing things to see what will fix it.
>>
>> Thanks,
>>
>> Mike
>>
>
>
