accumulo-user mailing list archives

From Donald Miner <dmi...@clearedgeit.com>
Subject Re: Mapreduce output format killing tablet servers
Date Wed, 25 Jun 2014 20:22:33 GMT
This is what Jacob is running on:
https://twitter.com/donaldpminer/status/398514283547328512

12x 13" 2011 MacBook Pros.

The poor guy is my summer intern and what we keep telling him is that this
is "building character". Kids these days with their 256GB of RAM!

The plan here is to get something working, not necessarily working well.
Just to test things in a more realistic manner than on a local group of VMs
(although not totally realistic since the hardware is crap). Plus I think
it is cute and it keeps my office warm. We've seen local groups of VMs on a
workstation outperform this.

-d


On Wed, Jun 25, 2014 at 3:42 PM, Sean Busbey <busbey@cloudera.com> wrote:

> If you only have 4G available, I'm not sure what kind of Hadoop cluster
> you expect to be able to run, let alone Accumulo. ;)
>
> -Sean
>
>
> On Wed, Jun 25, 2014 at 2:34 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>> If you only have 4G available, >=2G is probably a little excessive for
>> the OS :)
>>
>>
>> On 6/25/14, 3:30 PM, Sean Busbey wrote:
>>
>>> You can also calculate how much memory you need (or your cluster
>>> management software can do it for you).
>>>
>>> Things to factor:
>>>
>>> OS needs (>= 2GB)
>>> DataNode
>>> TaskTracker (or NodeManager depending on MRv1 vs YARN)
>>> task memory (child slots * per-child max under MRv1)
>>> TServer Java Heap
>>> TServer native map
>>>
>>> Plus any other processes you regularly run on those nodes.
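The factors listed above can be added up as a quick per-node budget check. What follows is a minimal sketch in Python, not a sizing recommendation: the function name `node_memory_budget` and every per-process number are hypothetical placeholders. Only the 4 GB node size and the 2 GB OS allowance come from this thread.

```python
def node_memory_budget(total_mb, os_mb, datanode_mb, tasktracker_mb,
                       child_slots, per_child_max_mb,
                       tserver_heap_mb, tserver_native_map_mb, other_mb=0):
    """Return (committed_mb, headroom_mb) for a single worker node."""
    # Task memory under MRv1: child slots * per-child max heap.
    task_mb = child_slots * per_child_max_mb
    committed = (os_mb + datanode_mb + tasktracker_mb + task_mb
                 + tserver_heap_mb + tserver_native_map_mb + other_mb)
    # Negative headroom means the node is oversubscribed.
    return committed, total_mb - committed


# Hypothetical allocation on a 4 GB node; all values except total_mb
# and os_mb are made up for illustration.
committed, headroom = node_memory_budget(
    total_mb=4096, os_mb=2048, datanode_mb=512, tasktracker_mb=512,
    child_slots=2, per_child_max_mb=512,
    tserver_heap_mb=512, tserver_native_map_mb=256)
print(f"committed={committed} MB, headroom={headroom} MB")
```

With these placeholder values the headroom comes out negative, which is the oversubscribed situation discussed in this thread.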
>>>
>>>
>>> On Wed, Jun 25, 2014 at 2:07 PM, John Vines <vines@apache.org> wrote:
>>>
>>>     It's also possible that you're oversubscribing your memory on the
>>>     overall system between the tservers and the MR slots. Check your
>>>     syslogs and see if there's anything about killing java processes.
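One way to do the syslog check is to scan for kernel OOM-killer entries. The sketch below is in Python and assumes the nodes run Linux, where the kernel typically logs a line of the form `Killed process <pid> (<name>)`. The helper name `find_oom_kills` and the sample log lines are hypothetical, and the syslog location varies by distribution.

```python
import re

# Assumed message shape for Linux OOM-killer entries; other systems differ.
KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines, process_name="java"):
    """Return (pid, line) pairs for OOM kills of the given process name."""
    hits = []
    for line in lines:
        match = KILL_RE.search(line)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), line.strip()))
    return hits


# Hypothetical sample lines, made up for illustration only.
sample = [
    "Jun 25 14:01:07 node3 kernel: Out of memory: Kill process 4242 (java) score 512 or sacrifice child",
    "Jun 25 14:01:07 node3 kernel: Killed process 4242 (java) total-vm:1048576kB, anon-rss:524288kB, file-rss:0kB",
    "Jun 25 14:02:00 node3 sshd[991]: Accepted publickey for hadoop",
]
kills = find_oom_kills(sample)
for pid, line in kills:
    print(pid, line)
```

To run it against a real log, pass an open file object instead of `sample`; any hit for `java` around the time a tserver disappeared would support the oversubscription theory.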
>>>
>>>
>>>     On Wed, Jun 25, 2014 at 3:05 PM, Jacob Rust <jrust@clearedgeit.com> wrote:
>>>
>>>         I will play around with the memory settings some more; it sounds
>>>         like that is definitely it. Thanks everyone!
>>>
>>>
>>>         On Wed, Jun 25, 2014 at 2:55 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>>             The lack of exception in the debug log makes it seem even
>>>             more likely that you just got an OOME.
>>>
>>>             It's a crap-shoot as to whether or not you'll actually get
>>>             the Exception printed in the log, but you should always get
>>>             it in the .out/.err files as previously mentioned.
>>>
>>>
>>>             On 6/25/14, 2:44 PM, Jacob Rust wrote:
>>>
>>>                 Ah, here is the right log: http://pastebin.com/DLEzLGqN
>>>
>>>                 I will double check which example. Thanks.
>>>
>>>
>>>                 On Wed, Jun 25, 2014 at 2:38 PM, John Vines <vines@apache.org> wrote:
>>>
>>>                      And you're certain you're using the standalone example
>>>                      and not the native-standalone? Those expect the native
>>>                      libraries to be extant and, if they are not, will
>>>                      eventually cause an OOM.
>>>
>>>
>>>                      On Wed, Jun 25, 2014 at 2:33 PM, Jacob Rust <jrust@clearedgeit.com> wrote:
>>>
>>>                          Accumulo version 1.5.1.2.1.2.1-471
>>>                          Hadoop version 2.4.0.2.1.2.1-471
>>>
>>>                          tserver debug log http://pastebin.com/BHdTkxeK
>>>
>>>                          I what you mean about the memory. I am using the
>>>                          memory settings from the example files:
>>>                          https://github.com/apache/accumulo/tree/master/conf/examples/512MB/standalone
>>>
>>>                          I also ran into this problem using the 1GB example
>>>                          memory settings. Each node has 4GB RAM.
>>>
>>>                          Thanks
>>>
>>>
>>>                          On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey <busbey@cloudera.com> wrote:
>>>
>>>                              What version of Accumulo?
>>>
>>>                              What version of Hadoop?
>>>
>>>                              What do your server memory and per-role allocations
>>>                              look like?
>>>
>>>                              Can you paste the tserver debug log?
>>>
>>>
>>>
>>>                              On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust <jrust@clearedgeit.com> wrote:
>>>
>>>                                  I am trying to create an inverted text index for a
>>>                                  table using the Accumulo input/output formats in a
>>>                                  Java MapReduce program. When the job reaches the
>>>                                  reduce phase and creates the table / tries to write
>>>                                  to it, the tablet servers begin to die.
>>>
>>>                                  Now when I do a start-all.sh, the tablet servers
>>>                                  start for about a minute and then die again. Any
>>>                                  idea as to why the MapReduce job is killing the
>>>                                  tablet servers and/or how to bring the tablet
>>>                                  servers back up without failing?
>>>
>>>                                  This is on a 12-node cluster with low-quality
>>>                                  hardware. The Java code I am running is here:
>>>                                  http://pastebin.com/ti7Qz19m
>>>
>>>                                  The log files on each tablet server only display
>>>                                  the startup information, no errors. The log files
>>>                                  on the master server show these errors:
>>>                                  http://pastebin.com/LymiTfB7
>>>
>>>
>>>
>>>
>>>                                  --
>>>                                  Jacob Rust
>>>                                  Software Intern
>>>
>>>
>>>
>>>
>>>                              --
>>>                              Sean
>>>
>>>
>>>
>>>
>>>                          --
>>>                          Jacob Rust
>>>                          Software Intern
>>>
>>>
>>>
>>>
>>>
>>>                 --
>>>                 Jacob Rust
>>>                 Software Intern
>>>
>>>
>>>
>>>
>>>         --
>>>         Jacob Rust
>>>         Software Intern
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>
>
> --
> Sean
>



-- 

Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com
