accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Mapreduce output format killing tablet servers
Date Wed, 25 Jun 2014 19:34:40 GMT
If you only have 4G available, >=2G is probably a little excessive for 
the OS :)
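
To make Sean's checklist (quoted below) concrete, here is a minimal Java sketch that sums per-role memory allocations against a node's physical RAM. Every per-role figure in it is a hypothetical placeholder, not a value from Accumulo's example configs or from this cluster; only the 4GB node size comes from the thread. A JVM's resident size is usually somewhat larger than its -Xmx heap, so real headroom should be bigger than the sketch suggests.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical per-node memory budget check. Every per-role figure below is an
 * illustrative assumption, not a value taken from Accumulo's example configs.
 */
public class MemoryBudget {

    /** Sums all per-role allocations, in megabytes. */
    static long totalMb(Map<String, Long> allocationsMb) {
        long total = 0;
        for (long mb : allocationsMb.values()) {
            total += mb;
        }
        return total;
    }

    /** Physical RAM minus planned allocations; a negative result means oversubscribed. */
    static long headroomMb(long physicalMb, Map<String, Long> allocationsMb) {
        return physicalMb - totalMb(allocationsMb);
    }

    /** Placeholder budget following Sean's list of roles (all numbers assumed). */
    static Map<String, Long> exampleBudget() {
        Map<String, Long> a = new LinkedHashMap<>();
        a.put("OS", 1024L);                                 // assumption
        a.put("DataNode heap", 512L);                       // assumption
        a.put("NodeManager/TaskTracker heap", 512L);        // assumption
        a.put("Task memory (2 slots x 512MB)", 2L * 512L);  // assumption
        a.put("TServer Java heap", 512L);                   // assumption
        a.put("TServer native map", 256L);                  // assumption
        return a;
    }

    public static void main(String[] args) {
        long physicalMb = 4096; // each node in this thread has 4GB RAM
        Map<String, Long> budget = exampleBudget();
        budget.forEach((role, mb) -> System.out.printf("%-32s %6d MB%n", role, mb));
        long headroom = headroomMb(physicalMb, budget);
        System.out.printf("Total %d MB of %d MB; headroom %d MB%n",
                totalMb(budget), physicalMb, headroom);
        if (headroom < 0) {
            System.out.println("Oversubscribed: the kernel may kill a JVM under memory pressure.");
        }
    }
}
```

With these placeholder numbers the roles total 3840 MB of 4096 MB. Raising the OS reservation to 2048 MB, per the ">= 2GB" guideline, brings the total to 4864 MB, or 768 MB over budget, which is the tension with 4G nodes.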

On 6/25/14, 3:30 PM, Sean Busbey wrote:
> you can also calculate how much memory you need to have (or your cluster
> management software can do it for you).
>
> Things to factor:
>
> OS needs (>= 2GB)
> DataNode
> TaskTracker (or NodeManager depending on MRv1 vs YARN)
> task memory (child slots * per-child max under MRv1)
> TServer Java Heap
> TServer native map
>
> Plus any other processes you regularly run on those nodes.
>
>
> On Wed, Jun 25, 2014 at 2:07 PM, John Vines <vines@apache.org> wrote:
>
>     It's also possible that you're oversubscribing your memory on the
>     overall system between the tservers and the MR slots. Check your
>     syslogs and see if there's anything about killing java processes.
>
>
>     On Wed, Jun 25, 2014 at 3:05 PM, Jacob Rust <jrust@clearedgeit.com> wrote:
>
>         I will play around with the memory settings some more, it sounds
>         like that is definitely it. Thanks everyone!
>
>
>         On Wed, Jun 25, 2014 at 2:55 PM, Josh Elser
>         <josh.elser@gmail.com> wrote:
>
>             The lack of exception in the debug log makes it seem even
>             more likely that you just got an OOME.
>
>             It's a crap-shoot as to whether or not you'll actually get
>             the Exception printed in the log, but you should always get
>             it in the .out/.err files as previously mentioned.
>
>
>             On 6/25/14, 2:44 PM, Jacob Rust wrote:
>
>                 Ah, here is the right log: http://pastebin.com/DLEzLGqN
>
>                 I will double check which example. Thanks.
>
>
>                 On Wed, Jun 25, 2014 at 2:38 PM, John Vines
>                 <vines@apache.org> wrote:
>
>                      And you're certain you're using the standalone
>                      example and not the native-standalone? Those expect
>                      the native libraries to be present and, if they are
>                      not, will eventually cause an OOM.
>
>
>                      On Wed, Jun 25, 2014 at 2:33 PM, Jacob Rust
>                      <jrust@clearedgeit.com> wrote:
>
>                          Accumulo version   1.5.1.2.1.2.1-471
>                          Hadoop version     2.4.0.2.1.2.1-471
>
>                          tserver debug log http://pastebin.com/BHdTkxeK
>
>                          I what you mean about the memory. I am using
>                          the memory settings from the example files
>                          <https://github.com/apache/accumulo/tree/master/conf/examples/512MB/standalone>.
>                          I also ran into this problem using the 1GB
>                          example memory settings. Each node has 4GB RAM.
>
>                          Thanks
>
>
>                          On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey
>                          <busbey@cloudera.com> wrote:
>
>                              What version of Accumulo?
>
>                              What version of Hadoop?
>
>                              What does your server memory and per-role
>                 allocation look like?
>
>                              Can you paste the tserver debug log?
>
>
>
>                              On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust
>                              <jrust@clearedgeit.com> wrote:
>
>                                  I am trying to create an inverted text index
>                                  for a table using the Accumulo input/output
>                                  formats in a Java MapReduce program. When the
>                                  job reaches the reduce phase and creates the
>                                  table / tries to write to it, the tablet
>                                  servers begin to die.
>
>                                  Now when I do a start-all.sh, the tablet
>                                  servers start for about a minute and then die
>                                  again. Any idea as to why the MapReduce job is
>                                  killing the tablet servers and/or how to bring
>                                  the tablet servers back up without them failing?
>
>                                  This is on a 12-node cluster with low-quality
>                                  hardware. The Java code I am running is here:
>                                  http://pastebin.com/ti7Qz19m
>
>                                  The log files on each tablet server only
>                                  display the startup information, no errors.
>                                  The log files on the master server show these
>                                  errors: http://pastebin.com/LymiTfB7
>
>
>
>
>                                  --
>                                  Jacob Rust
>                                  Software Intern
>
>
>
>
>                              --
>                              Sean
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
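
The two diagnostics suggested in this thread, checking the syslog for the kernel killing java processes and checking the tserver .out/.err files for an OutOfMemoryError, can be combined into one small scan. A minimal Java sketch, assuming you pass the files to check on the command line; the class name and match strings are illustrative assumptions, syslog location varies by distribution (for example /var/log/messages or /var/log/syslog), and the kernel's OOM-killer wording varies by kernel version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that flags the two OOM signatures discussed in this thread.
 * The match strings are assumptions; adjust them for your JVM and kernel.
 */
public class OomScan {

    /** JVM-side OOME, as it would appear in a tserver .out/.err file. */
    static boolean isJvmOome(String line) {
        return line.contains("java.lang.OutOfMemoryError");
    }

    /** Kernel OOM-killer line that mentions a java process (wording varies by kernel). */
    static boolean isKernelKill(String line) {
        return line.contains("Out of memory") && line.contains("java");
    }

    /** Returns every line matching either signature, in file order. */
    static List<String> matches(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (isJvmOome(line) || isKernelKill(line)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            // ISO-8859-1 decodes any byte sequence, so stray binary in syslog won't abort the scan.
            List<String> lines = Files.readAllLines(path, StandardCharsets.ISO_8859_1);
            for (String hit : matches(lines)) {
                System.out.println(path + ": " + hit);
            }
        }
    }
}
```

For example, `java OomScan /var/log/messages /path/to/accumulo/logs/tserver_*.out` (paths are placeholders). A hit from the kernel points at oversubscribed node memory; a hit in the .out/.err files points at an undersized tserver heap or native-map setting.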
