hadoop-common-user mailing list archives

From Amar Kamat <ama...@yahoo-inc.com>
Subject Re: Availability of Job traces or logs
Date Tue, 06 Dec 2011 05:53:45 GMT
Arun,
> I want to test its behaviour under different sizes of job traces (meaning number of jobs,
> say 5, 10, 25, 50, 100) under different numbers of nodes.
> Till now I was using only the test/data given by Mumak, which has 19 jobs and a 1529-node
> topology. I don't have many nodes with me to run some programs, collect logs and use
> Rumen to generate traces.
For the varying jobs part, you can run sleep jobs with varying numbers of map/reduce tasks
and sleep times. For varying the cluster size, you can run multiple task-trackers on the same
node. You can start with 5 trackers per node. Since you will be running sleep jobs, this should
be OK. Make sure Hadoop security is turned off and the default task controller is used. Design
your topology script so that it groups all the trackers on the same node under one rack.
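
As a rough illustration of the topology script idea, here is a minimal sketch in Python. It assumes the usual contract for Hadoop topology scripts (the script receives host names or IPs as arguments and prints one rack path per argument) and that the script is registered through the topology.script.file.name property. The host names, IPs and rack names in the table are hypothetical placeholders. Because all trackers started on one node report the same host, mapping each host to a single rack keeps them together.

```python
#!/usr/bin/env python
"""Sketch of a topology script: every tracker on a host lands in that host's rack."""
import sys

# Rack returned for hosts that are not listed below.
DEFAULT_RACK = "/default-rack"

# Hypothetical host-to-rack table; replace with your own nodes.
# List both the host name and the IP, since either may be passed in.
HOST_TO_RACK = {
    "node1.example.com": "/rack1",
    "10.0.0.1": "/rack1",
    "node2.example.com": "/rack2",
    "10.0.0.2": "/rack2",
}


def resolve(names):
    """Return one rack path per input name, in the same order."""
    return [HOST_TO_RACK.get(name, DEFAULT_RACK) for name in names]


if __name__ == "__main__":
    # Print the rack paths separated by spaces, one per argument.
    print(" ".join(resolve(sys.argv[1:])))
```

Running the script by hand with a few host names is a quick way to check the mapping before pointing the JobTracker at it.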

> I want to control the split placements, so I need to modify preferred locations for task
> attempts in the trace, but the trace for even 19 jobs is huge. So I was thinking whether
> I can get small, medium and large job traces with corresponding topology traces so that
> modifying them will be easier.
For this, you need to understand how Rumen handles job logs. I have created MAPREDUCE-3508
for adding filtering capabilities to Rumen. You can make use of this feature to modify Rumen
output and play around with splits. You can also make use of this feature to select a few jobs
(say 10, 50, etc.) from the input trace.
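
Until such filtering is available, a small standalone script can do a similar job outside Rumen. The sketch below is not the MAPREDUCE-3508 feature and does not use any Rumen API. It assumes the trace file is a sequence of JSON job objects written one after another, and that each job has a "mapTasks" list whose entries carry "preferredLocations" as a list of {"layers": [rack, host]} objects. Please check these field names against your own trace before relying on them.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def select_jobs(text, limit):
    """Return the first `limit` jobs found in the trace text."""
    jobs = []
    for job in iter_json_objects(text):
        if len(jobs) >= limit:
            break
        jobs.append(job)
    return jobs


def set_preferred_locations(job, layers):
    """Point every map task of `job` at one location, e.g. ["rack1", "node1"].

    The "mapTasks", "preferredLocations" and "layers" names are assumptions
    about the trace layout.
    """
    for task in job.get("mapTasks", []):
        task["preferredLocations"] = [{"layers": list(layers)}]
    return job


def dump_jobs(jobs):
    """Serialize jobs back into concatenated JSON objects."""
    return "\n".join(json.dumps(job, indent=2) for job in jobs)
```

For example, reading the trace file into a string, calling select_jobs(text, 10), applying set_preferred_locations to each job and writing dump_jobs(...) to a new file gives a smaller trace with the split placements you chose. The topology trace would still have to contain the racks and hosts you refer to.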

Amar

On 12/4/11 10:19 AM, "ArunKumar" <arunk786@gmail.com> wrote:

Amar,

I am attempting to write a new scheduler for Hadoop and test it using Mumak.

1> I want to test its behaviour under different sizes of job traces (meaning
number of jobs, say 5, 10, 25, 50, 100) under different numbers of nodes.
Till now I was using only the test/data given by Mumak, which has 19 jobs
and a 1529-node topology.
I don't have many nodes with me to run some programs, collect logs and
use Rumen to generate traces.

2> I want to control the split placements, so I need to modify preferred
locations for task attempts in the trace, but the trace for even 19 jobs is
huge. So I was thinking whether I can get small, medium and large job
traces with corresponding topology traces so that modifying them will be
easier.


Arun


On Sat, Dec 3, 2011 at 1:15 PM, Amar Kamat [via Lucene] <
ml-node+s472066n3556710h89@n3.nabble.com> wrote:

> Arun,
> You can very well run synthetic workloads like large-scale sort, wordcount,
> etc., or more realistic workloads like PigMix (
> https://cwiki.apache.org/confluence/display/PIG/PigMix). On a decent
> enough cluster, these workloads work pretty well. Is there a specific
> reason why you want traces of varied sizes from various organizations?
>
> > How can I make sure that Rumen generates only, say, 25 jobs, 50 jobs or
> > so?
> Do you want to get 25/50 jobs based on some filtering criterion? I
> recently faced a similar situation where I wanted to extract jobs from a
> Rumen trace based on job ids. I will be happy to share these filtering
> tools.
>
> Amar
>
>
> On 12/1/11 8:48 AM, "ArunKumar" <[hidden email]> wrote:
>
> Hi guys !
>
> Apart from generating the job traces from Rumen, can I get logs or job
> traces of varied sizes from some organizations?
>
> How can I make sure that Rumen generates only, say, 25 jobs, 50 jobs or
> so?
>
>
> Thanks,
> Arun
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Availability-of-Job-traces-or-logs-tp3550462p3550462.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


