accumulo-user mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: Mapreduce output format killing tablet servers
Date Wed, 25 Jun 2014 19:30:24 GMT
You can also calculate how much memory you need (or your cluster
management software can do it for you).

Things to factor in:

OS needs (>= 2GB)
DataNode
TaskTracker (or NodeManager depending on MRv1 vs YARN)
task memory (child slots * per-child max under MRv1)
TServer Java Heap
TServer native map

Plus any other processes you regularly run on those nodes.
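
As a rough sketch of that arithmetic (Python, sizes in MB): only the 4GB
node size and the 2GB OS floor come from this thread. Every other number,
and the names budget_mb and headroom_mb, are made-up placeholders to swap
for your own settings.

```python
# All sizes in MB. Only NODE_RAM_MB (4 GB nodes) and the 2 GB OS floor come
# from this thread; every other value is a hypothetical placeholder.
NODE_RAM_MB = 4096

budget_mb = {
    "os": 2048,                         # OS needs (>= 2GB)
    "datanode": 512,                    # placeholder
    "tasktracker_or_nodemanager": 512,  # placeholder
    "task_memory": 2 * 512,             # child slots * per-child max (MRv1), placeholder
    "tserver_java_heap": 512,           # placeholder
    "tserver_native_map": 256,          # placeholder
    "other_processes": 0,               # anything else you regularly run
}


def headroom_mb(node_ram_mb, budget):
    """Return RAM left after all budgeted consumers; negative means oversubscribed."""
    return node_ram_mb - sum(budget.values())


if __name__ == "__main__":
    total = sum(budget_mb.values())
    left = headroom_mb(NODE_RAM_MB, budget_mb)
    print(f"budgeted {total} MB of {NODE_RAM_MB} MB, headroom {left} MB")
    if left < 0:
        print("oversubscribed: reduce task slots or heap sizes")
```

If the headroom comes out negative, the node is promising more memory than
it has, and something will eventually get killed.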


On Wed, Jun 25, 2014 at 2:07 PM, John Vines <vines@apache.org> wrote:

> It's also possible that you're oversubscribing your memory on the overall
> system between the tservers and the MR slots. Check your syslogs and see if
> there's anything about killing java processes.
>
>
> On Wed, Jun 25, 2014 at 3:05 PM, Jacob Rust <jrust@clearedgeit.com> wrote:
>
>> I will play around with the memory settings some more; it sounds like
>> that is definitely it. Thanks everyone!
>>
>>
>> On Wed, Jun 25, 2014 at 2:55 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>> The lack of exception in the debug log makes it seem even more likely
>>> that you just got an OOME.
>>>
>>> It's a crap-shoot as to whether or not you'll actually get the Exception
>>> printed in the log, but you should always get it in the .out/.err files as
>>> previously mentioned.
>>>
>>>
>>> On 6/25/14, 2:44 PM, Jacob Rust wrote:
>>>
>>>> Ah, here is the right log: http://pastebin.com/DLEzLGqN
>>>>
>>>> I will double check which example. Thanks.
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 2:38 PM, John Vines <vines@apache.org> wrote:
>>>>
>>>>     And you're certain you're using the standalone example and not the
>>>>     native-standalone? Those expect the native libraries to be present
>>>>     and, if they are not, will eventually cause an OOM.
>>>>
>>>>
>>>>     On Wed, Jun 25, 2014 at 2:33 PM, Jacob Rust <jrust@clearedgeit.com> wrote:
>>>>
>>>>         Accumulo version   1.5.1.2.1.2.1-471
>>>>         Hadoop version 2.4.0.2.1.2.1-471
>>>>
>>>>         tserver debug log http://pastebin.com/BHdTkxeK
>>>>
>>>>         I see what you mean about the memory. I am using the memory settings
>>>>         from the example files
>>>>         https://github.com/apache/accumulo/tree/master/conf/examples/512MB/standalone.
>>>>         I also ran into this problem using the 1GB example memory
>>>>         settings. Each node has 4GB RAM.
>>>>
>>>>         Thanks
>>>>
>>>>
>>>>         On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey <busbey@cloudera.com> wrote:
>>>>
>>>>             What version of Accumulo?
>>>>
>>>>             What version of Hadoop?
>>>>
>>>>             What does your server memory and per-role allocation look like?
>>>>
>>>>             Can you paste the tserver debug log?
>>>>
>>>>
>>>>
>>>>             On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust
>>>>             <jrust@clearedgeit.com> wrote:
>>>>
>>>>                 I am trying to create an inverted text index for a table
>>>>                 using the Accumulo input/output formats in a Java
>>>>                 MapReduce program. When the job reaches the reduce
>>>>                 phase and creates the table / tries to write to it, the
>>>>                 tablet servers begin to die.
>>>>
>>>>                 Now when I do a start-all.sh the tablet servers start
>>>>                 for about a minute and then die again. Any idea as to
>>>>                 why the mapreduce job is killing the tablet servers
>>>>                 and/or how to bring the tablet servers back up without
>>>>                 failing?
>>>>
>>>>                 This is on a 12 node cluster with low quality hardware.
>>>>                 The java code I am running is here
>>>>                 http://pastebin.com/ti7Qz19m
>>>>
>>>>                 The log files on each tablet server only display the
>>>>                 startup information, no errors. The log files on the
>>>>                 master server show these errors
>>>>                 http://pastebin.com/LymiTfB7
>>>>
>>>>
>>>>
>>>>
>>>>                 --
>>>>                 Jacob Rust
>>>>                 Software Intern
>>>>
>>>>
>>>>
>>>>
>>>>             --
>>>>             Sean
>>>>
>>>>
>>>>
>>>>
>>>>         --
>>>>         Jacob Rust
>>>>         Software Intern
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jacob Rust
>>>> Software Intern
>>>>
>>>
>>
>>
>> --
>> Jacob Rust
>> Software Intern
>>
>
>


-- 
Sean
