hama-user mailing list archives

From Steven van Beelen <smcvbee...@gmail.com>
Subject Re: HAMA jobs failing, with no debug message - 2
Date Wed, 20 Nov 2013 14:21:22 GMT
Thanks for the info, I'll try it out!
Too bad there is no 'Sorted Spilling Message Queue' yet ;-)


On Wed, Nov 20, 2013 at 3:09 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:

> > Can I combine the Spilling Queue with the Sorted Message Queue? (e.g.
>
> Work in progress. HAMA-723
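> In the meantime you can only plug in one queue implementation at a time.
> For the spilling behaviour alone, something roughly like this should work
> (please double-check the fully-qualified class name against your Hama
> version, I'm writing it from memory):
>
>   HamaConfiguration conf = new HamaConfiguration();
>   // only one queue implementation can be active per job
>   conf.set(MessageManager.QUEUE_TYPE_CLASS,
>       "org.apache.hama.bsp.message.queue.SpillingQueue");
>   BSPJob job = new BSPJob(conf, InvertedIndexBSP.class); // your BSP class here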
>
> > My program has only one superstep.
>
> That's why your program consumes so much memory. If you call sync()
> periodically, you might be able to avoid the huge memory consumption.
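> For example, instead of doing all the work before a single sync(), you
> could agree on a fixed number of supersteps and process the input in
> batches. Roughly (class layout, names and batch size are only
> illustrative, not your actual job):
>
>   @Override
>   public void bsp(BSPPeer<Text, IntWritable, Text, Text, Text> peer)
>       throws IOException, SyncException, InterruptedException {
>     final int supersteps = 10;    // must be identical on every peer
>     final int batchSize = 10000;  // documents handled per superstep
>     Text key = new Text();
>     IntWritable value = new IntWritable();
>     for (int s = 0; s < supersteps; s++) {
>       int processed = 0;
>       while (processed < batchSize && peer.readNext(key, value)) {
>         // parse the document and send <word, docId, freq> messages here
>         processed++;
>       }
>       peer.sync();  // every peer reaches this barrier in every iteration
>       Text msg;
>       while ((msg = peer.getCurrentMessage()) != null) {
>         // merge this batch of postings into the in-memory index
>       }
>     }
>     // handle any input that is left, then one final barrier on every peer
>     while (peer.readNext(key, value)) {
>       // send the remaining <word, docId, freq> messages
>     }
>     peer.sync();
>     // drain and merge the final round of messages here
>   }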
>
> On Wed, Nov 20, 2013 at 10:58 PM, Steven van Beelen
> <smcvbeelen@gmail.com> wrote:
> > Can I combine the Spilling Queue with the Sorted Message Queue? (e.g.
> > conf.set(MessageManager.QUEUE_TYPE_CLASS,
> > "org.apache.hama.bsp.message.queue.SortedMessageQueue");)
> > My implementation benefits from the messages being received in sorted
> > order, hence the question.
> >
> > My program has only one superstep. It is an implementation of Inverted
> > Indexing which first reads in a Sequence File consisting of <key, value>
> > pairs, where the key is a Text object and the value an IntWritable.
> > The program first parses the Text objects and stores each separate word
> > and its frequency. After each document, it sends a message to another
> > peer containing the word, document id and the frequency.
> > If all the documents have been worked through, sync() is called.
> > After that, a list is created for every word, consisting of all the
> > <document_id, frequency> pairs found.
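> > Schematically, the second phase does something like this (not my exact
> > code; the tab-separated message format is only for illustration):
> >
> >   Map<String, List<String>> postings = new HashMap<String, List<String>>();
> >   Text msg;
> >   while ((msg = peer.getCurrentMessage()) != null) {
> >     String[] parts = msg.toString().split("\t");  // word, docId, freq
> >     List<String> list = postings.get(parts[0]);
> >     if (list == null) {
> >       list = new ArrayList<String>();
> >       postings.put(parts[0], list);
> >     }
> >     list.add(parts[1] + ":" + parts[2]);          // docId:freq
> >   }
> >   for (Map.Entry<String, List<String>> e : postings.entrySet()) {
> >     peer.write(new Text(e.getKey()), new Text(e.getValue().toString()));
> >   }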
> >
> >
> > On Wed, Nov 20, 2013 at 2:40 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >
> >> Why don't you use the Spilling Queue? Then it'll work without any problem.
> >>
> >> >> > Last note: I'm running an Inverted Indexing algorithm with a data
> >> >> > set of approximately 17 GB.
> >>
> >> How many supersteps are needed? If your job is too
> >> communication-intensive, maybe you should consider another approach.
> >>
> >> On Wed, Nov 20, 2013 at 10:14 PM, Steven van Beelen
> >> <smcvbeelen@gmail.com> wrote:
> >> > Hi Edward,
> >> >
> >> > That was the issue I was thinking of first. So, I increased
> >> > bsp.child.java.opts to 8 GB and that of the GroomServers to 4 GB.
> >> > After that, the 84-task run worked, but with 60 tasks it fails as said
> >> > above.
> >> > Should I give it more memory? I would think that these amounts per
> >> > task/GroomServer should be enough.
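> >> > (In case it matters, the task heap can be set per job with
> >> >     conf.set("bsp.child.java.opts", "-Xmx8192m");
> >> > or in hama-site.xml, while the GroomServer daemon heap is configured
> >> > separately, via HAMA_HEAPSIZE in conf/hama-env.sh if I remember the
> >> > variable name right.)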
> >> >
> >> > Regards, Steven
> >> >
> >> >
> >> >
> >> > On Wed, Nov 20, 2013 at 12:16 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >> >
> >> >> > The only case the program does run is when I use the maximum number
> >> >> > of machines (i.e. 7 machines, with 12 cores and 128 GB RAM). I set
> >> >> > the maximum number of tasks to 12 per node, thus 84. But when I force
> >> >> > the program to run with 60 tasks, the "Job Failed" comes up with no
> >> >> > additional info.
> >> >>
> >> >> Your case looks like a memory problem. Can you check the memory space
> >> >> during job execution? Or try to increase the max heap of the BSP child
> >> >> JVM.
> >> >>
> >> >> > the "Job Failed" comes up with no additional info.
> >> >>
> >> >> Sorry for the inconvenience, I'll check it out and see what's wrong.
> >> >>
> >> >> On Wed, Nov 20, 2013 at 6:22 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
> >> >> > I have a very similar problem to the one Anveshi Charuvaka is
> >> >> > mailing about.
> >> >> >
> >> >> > What I found additionally, when I set task logging to DEBUG mode, is
> >> >> > that the DEBUG logs get interrupted at the same point and replaced
> >> >> > with the "INFO bsp.BSPJobClient: Job failed." message.
> >> >> > My program works in local, distributed and pseudo mode, so that's
> >> >> > probably not the issue.
> >> >> >
> >> >> > The only case the program does run is when I use the maximum number
> >> >> > of machines (i.e. 7 machines, with 12 cores and 128 GB RAM). I set
> >> >> > the maximum number of tasks to 12 per node, thus 84. But when I force
> >> >> > the program to run with 60 tasks, the "Job Failed" comes up with no
> >> >> > additional info.
> >> >> >
> >> >> > Last note: I'm running an Inverted Indexing algorithm with a data
> >> >> > set of approximately 17 GB.
> >> >> > Could someone help me with this?
> >> >> >
> >> >> > Regards, Steven
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Best Regards, Edward J. Yoon
> >> >> @eddieyoon
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
