accumulo-user mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: Mapreduce output format killing tablet servers
Date Wed, 25 Jun 2014 18:31:24 GMT
Yeah, oversubscribed memory is my guess. It's the most common problem I
see, especially when the failure happens under load, as with MR jobs.
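
As a rough illustration of what "oversubscribed" means here, the sketch below adds up the memory committed to each role on one node and compares it with physical RAM. The role names, sizes, and the 1 GB OS reserve are hypothetical placeholders, not values from this cluster; substitute the real per-role settings.

```python
def memory_budget(physical_mb, role_mb, os_reserve_mb=1024):
    """Return (committed_mb, headroom_mb) for a single node.

    role_mb maps a role name to the maximum memory it may use, in MB.
    os_reserve_mb is an assumed allowance for the OS and page cache.
    """
    committed = sum(role_mb.values()) + os_reserve_mb
    return committed, physical_mb - committed


def is_oversubscribed(physical_mb, role_mb, os_reserve_mb=1024):
    """True when the roles plus the OS reserve exceed physical RAM."""
    _, headroom = memory_budget(physical_mb, role_mb, os_reserve_mb)
    return headroom < 0


# Hypothetical 8 GB node; every number here is an assumption.
example_roles = {
    "tserver_heap": 2048,
    "tserver_native_map": 1024,  # native in-memory map lives outside the JVM heap
    "datanode": 1024,
    "tasktracker": 1024,
    "map_slots": 4 * 512,        # 4 concurrent map tasks at 512 MB each
    "reduce_slots": 2 * 512,     # 2 concurrent reduce tasks at 512 MB each
}

if __name__ == "__main__":
    committed, headroom = memory_budget(8192, example_roles)
    print(f"committed={committed} MB, headroom={headroom} MB")
    print("oversubscribed" if headroom < 0 else "fits")
```

Idle, such a node can look healthy, because the map and reduce slots only claim their memory once a job is running. That is why the failure tends to show up during MR load.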


On Wed, Jun 25, 2014 at 1:29 PM, John Vines <vines@apache.org> wrote:

> Why are your tservers dying? You say it only shows startup with no errors,
> but what about from before you restart? Keep in mind that the out and err
> files get clobbered on restart, so you need to check these before you
> restart them.
>
> I have a hunch that you're either experiencing OOM errors, which is an
> indication of poor Accumulo configuration, or you're losing ZK locks, which
> is an indicator of various things from poor network to poor system
> configuration.
>
>
> On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
>> What version of Accumulo?
>>
>> What version of Hadoop?
>>
>> What do your server memory and per-role allocation look like?
>>
>> Can you paste the tserver debug log?
>>
>>
>>
>> On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust <jrust@clearedgeit.com>
>> wrote:
>>
>>> I am trying to create an inverted text index for a table using the Accumulo
>>> input/output formats in a Java MapReduce program. When the job reaches the
>>> reduce phase, creates the table, and tries to write to it, the tablet
>>> servers begin to die.
>>>
>>> Now when I do a start-all.sh the tablet servers start for about a minute
>>> and then die again. Any idea as to why the mapreduce job is killing the
>>> tablet servers and/or how to bring the tablet servers back up without
>>> failing?
>>>
>>> This is on a 12 node cluster with low quality hardware.
>>>
>>> The java code I am running is here http://pastebin.com/ti7Qz19m
>>>
>>> The log files on each tablet server only display the startup
>>> information, no errors. The log files on the master server show these
>>> errors http://pastebin.com/LymiTfB7
>>>
>>>
>>>
>>>
>>> --
>>> Jacob Rust
>>> Software Intern
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>


-- 
Sean
