accumulo-user mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: Mapreduce output format killing tablet servers
Date Wed, 25 Jun 2014 19:42:47 GMT
if you only have 4G available, I'm not sure what kind of Hadoop cluster you
expect to be able to run, let alone Accumulo. ;)

-Sean

On Wed, Jun 25, 2014 at 2:34 PM, Josh Elser <josh.elser@gmail.com> wrote:

> If you only have 4G available, >=2G is probably a little excessive for the
> OS :)
>
>
> On 6/25/14, 3:30 PM, Sean Busbey wrote:
>
>> you can also calculate how much memory you need to have (or your cluster
>> management software can do it for you).
>>
>> Things to factor:
>>
>> OS needs (>= 2GB)
>> DataNode
>> TaskTracker (or NodeManager depending on MRv1 vs YARN)
>> task memory (child slots * per-child max under MRv1)
>> TServer Java Heap
>> TServer native map
>>
>> Plus any other processes you regularly run on those nodes.
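
The factors listed above can be added up per node. A minimal sketch in Python follows; the function name and every size in it are hypothetical placeholders, apart from the 4GB node size and the >= 2GB OS allowance mentioned in this thread. Substitute the heap and slot settings actually configured on your nodes.

```python
# Per-node memory budget following the factors listed above.
# All sizes are in MB and are illustrative assumptions, except the
# 4 GB node size and the 2 GB OS allowance mentioned in this thread.

def node_memory_budget(total_mb, os_mb, datanode_mb, tasktracker_mb,
                       child_slots, per_child_max_mb,
                       tserver_heap_mb, tserver_native_map_mb, other_mb=0):
    """Return (committed_mb, headroom_mb); negative headroom means oversubscribed."""
    task_mb = child_slots * per_child_max_mb  # MRv1: child slots * per-child max
    committed = (os_mb + datanode_mb + tasktracker_mb + task_mb
                 + tserver_heap_mb + tserver_native_map_mb + other_mb)
    return committed, total_mb - committed


committed, headroom = node_memory_budget(
    total_mb=4096, os_mb=2048, datanode_mb=512, tasktracker_mb=512,
    child_slots=2, per_child_max_mb=512,
    tserver_heap_mb=512, tserver_native_map_mb=256)
print(f"committed={committed} MB, headroom={headroom} MB")
```

With these placeholder values the node commits 4864 MB against 4096 MB of RAM, so it is oversubscribed by 768 MB.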
>>
>>
>> On Wed, Jun 25, 2014 at 2:07 PM, John Vines <vines@apache.org> wrote:
>>
>>     It's also possible that you're oversubscribing your memory on the
>>     overall system between the tservers and the MR slots. Check your
>>     syslogs and see if there's anything about killing java processes.
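
One way to do the syslog check suggested above is to search for the Linux kernel's OOM-killer messages. The sketch below is in Python; the helper name `find_oom_kills` and the sample log lines are made up for illustration, and the exact message wording varies between kernel versions, so treat the pattern as an assumption to adjust.

```python
import re

# Matches Linux kernel OOM-killer lines such as
# "Out of memory: Kill process 1234 (java) score 900 or sacrifice child".
# Newer kernels print "Killed process" instead; both forms are accepted.
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(lines, process_name="java"):
    """Return a (pid, name) tuple for each OOM kill of the named process."""
    hits = []
    for line in lines:
        m = OOM_RE.search(line)
        if m and m.group(2) == process_name:
            hits.append((int(m.group(1)), m.group(2)))
    return hits


# Made-up sample lines, for illustration only.
sample = [
    "Jun 25 14:01:02 node3 kernel: Out of memory: Kill process 4242 (java) score 901 or sacrifice child",
    "Jun 25 14:01:05 node3 sshd[99]: Accepted publickey for user",
]
print(find_oom_kills(sample))
```

To scan a real log, pass an open file object instead of `sample`. The log location depends on the distribution (commonly /var/log/messages or /var/log/syslog).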
>>
>>
>>     On Wed, Jun 25, 2014 at 3:05 PM, Jacob Rust <jrust@clearedgeit.com>
>>     wrote:
>>
>>         I will play around with the memory settings some more; it sounds
>>         like that is definitely it. Thanks everyone!
>>
>>
>>         On Wed, Jun 25, 2014 at 2:55 PM, Josh Elser
>>         <josh.elser@gmail.com> wrote:
>>
>>             The lack of exception in the debug log makes it seem even
>>             more likely that you just got an OOME.
>>
>>             It's a crap-shoot as to whether or not you'll actually get
>>             the Exception printed in the log, but you should always get
>>             it in the .out/.err files as previously mentioned.
>>
>>
>>             On 6/25/14, 2:44 PM, Jacob Rust wrote:
>>
>>                 Ah, here is the right log: http://pastebin.com/DLEzLGqN
>>
>>                 I will double check which example. Thanks.
>>
>>
>>                 On Wed, Jun 25, 2014 at 2:38 PM, John Vines
>>                 <vines@apache.org> wrote:
>>
>>                      And you're certain you're using the standalone
>>                      example and not the native-standalone? Those expect
>>                      the native libraries to be extant and, if not, will
>>                      eventually cause an OOM.
>>
>>
>>                      On Wed, Jun 25, 2014 at 2:33 PM, Jacob Rust
>>                      <jrust@clearedgeit.com> wrote:
>>
>>                          Accumulo version 1.5.1.2.1.2.1-471
>>                          Hadoop version 2.4.0.2.1.2.1-471
>>
>>                          tserver debug log http://pastebin.com/BHdTkxeK
>>
>>                          I what you mean about the memory. I am using the
>>                          memory settings from the example files
>>                          https://github.com/apache/accumulo/tree/master/conf/examples/512MB/standalone.
>>
>>                          I also ran into this problem using the 1GB example
>>                          memory settings. Each node has 4GB RAM.
>>
>>                          Thanks
>>
>>
>>                          On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey
>>                          <busbey@cloudera.com> wrote:
>>
>>                              What version of Accumulo?
>>
>>                              What version of Hadoop?
>>
>>                              What does your server memory and per-role
>>                              allocation look like?
>>
>>                              Can you paste the tserver debug log?
>>
>>
>>
>>                              On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust
>>                              <jrust@clearedgeit.com> wrote:
>>
>>                                  I am trying to create an inverted text index
>>                                  for a table using the Accumulo input/output
>>                                  formats in a Java MapReduce program. When the
>>                                  job reaches the reduce phase and creates the
>>                                  table / tries to write to it, the tablet
>>                                  servers begin to die.
>>
>>                                  Now when I do a start-all.sh, the tablet servers
>>                                  start for about a minute and then die again. Any
>>                                  idea as to why the mapreduce job is killing the
>>                                  tablet servers and/or how to bring the tablet
>>                                  servers back up without failing?
>>
>>                                  This is on a 12 node cluster with low quality
>>                                  hardware. The java code I am running is here:
>>                                  http://pastebin.com/ti7Qz19m
>>
>>                                  The log files on each tablet server only display
>>                                  the startup information, no errors. The log files
>>                                  on the master server show these errors:
>>                                  http://pastebin.com/LymiTfB7
>>
>>
>>
>>
>>                                  --
>>                                  Jacob Rust
>>                                  Software Intern
>>
>>
>>
>>
>>                              --
>>                              Sean
>>
>>
>>
>>
>>                          --
>>                          Jacob Rust
>>                          Software Intern
>>
>>
>>
>>
>>
>>                 --
>>                 Jacob Rust
>>                 Software Intern
>>
>>
>>
>>
>>         --
>>         Jacob Rust
>>         Software Intern
>>
>>
>>
>>
>>
>> --
>> Sean
>>
>


-- 
Sean
