hadoop-mapreduce-issues mailing list archives

From "Vinay Kumar Thota (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2138) Gridmix tests with different time interval mr traces (1min, 3min and 5min).
Date Thu, 18 Nov 2010 12:12:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933403#action_12933403 ]

Vinay Kumar Thota commented on MAPREDUCE-2138:
----------------------------------------------

bq. Since Load v/s Sleep, Submitter v/s RoundRobin v/s Echo user-resolvers, Stress v/s Replay
v/s Serial are almost independent options, we would ideally need test-cases for all possible
permutations of these. To keep things reasonable though, we should at least have LoadJob run
in each of Stress, Replay and Serial modes. For testing the user-resolvers, we can make do
with SleepJob running with each of Submitter, RoundRobin and Echo user-resolvers. Add in a
couple of extra test-cases for traces with different times (1 v/s 3 v/s 5 minutes) and we're
talking of having at least eight different test-cases for a modestly-reasonable test-suite
for GridMix3.

I am covering all the scenarios you mentioned, but this ticket covers only the three scenarios
described below; the remaining scenarios are covered in separate JIRA tickets.
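
For illustration, the size of the full option matrix mentioned in the quote can be sketched as below. The enum and class names are hypothetical and only mirror the option names used in this thread; they are not GridMix types. The sketch simply enumerates 2 job types x 3 user-resolvers x 3 submission policies.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these enums and the class name are hypothetical and are
// not part of GridMix; they mirror the option names quoted in this thread.
final class GridmixOptionMatrix {
    enum JobType { LOADJOB, SLEEPJOB }
    enum UserResolver { SUBMITTER, ROUND_ROBIN, ECHO }
    enum SubmissionPolicy { STRESS, REPLAY, SERIAL }

    /** Returns every (job type, resolver, policy) combination as a readable label. */
    static List<String> allCombinations() {
        List<String> combos = new ArrayList<>();
        for (JobType j : JobType.values()) {
            for (UserResolver r : UserResolver.values()) {
                for (SubmissionPolicy p : SubmissionPolicy.values()) {
                    combos.add(j + "/" + r + "/" + p);
                }
            }
        }
        return combos;
    }

    public static void main(String[] args) {
        List<String> combos = allCombinations();
        System.out.println(combos.size() + " combinations");
        for (String c : combos) {
            System.out.println(c);
        }
    }
}
```

The full matrix has 18 entries before trace length is even considered. The reduced suite suggested in the quote (three LoadJob policy cases, three SleepJob resolver cases, and a couple of extra trace-length cases) is how the figure of eight is reached.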

bq. I suggest changing GridmixJobStory, etc. to have names with, for example, Test as a prefix
so that they do not clash with legitimate classes in the org.apache.hadoop.mapred.gridmix
name-space that might be developed in the future.

Agreed; I have made the changes accordingly.

bq. In GridmixJobStory, jobstories and zombieJobs seem to be the same map but with different
interfaces to the value. Since ZombieJob implements JobStory, can values with the latter interface
not suffice? (Also, technically buildJobStories() can return a null map, so the callers should
guard against this condition.)

Removed the duplicate method in the Utils class.

bq. There should be some class-description for GridmixJobStory. Also, GridmixJobVerification
looks like a very awkward class that should perhaps be subsumed as methods elsewhere. Ditto
for GridmixJobStory in fact - why does this class need to exist, especially since UtilsForGridmix.getJobStories()
seems to do the same thing?

Done.

bq. Need better JavaDoc comments for UtilsForGridmix.listGridmixJobIDs(). There is also no
input parameter named jobStatus for the method. In this method, you can also keep the value
of client.getAllJobs() around instead of calling it in each iteration.

Done.
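
As a rough sketch of the last point in the quote (fetching the job list once rather than in every iteration), something like the following would do. JobLister and JobIdListing are hypothetical stand-ins so the example compiles without Hadoop; they are not the real JobClient or JobStatus types, and the prefix filter is only an assumed example of per-job work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in so the sketch compiles without Hadoop on the classpath;
// it is not the real JobClient type.
interface JobLister {
    String[] getAllJobs(); // assumed to be a costly remote call
}

final class JobIdListing {
    /** Collects job IDs starting with the given prefix, fetching the job list once. */
    static List<String> listJobIds(JobLister client, String prefix) {
        String[] allJobs = client.getAllJobs(); // fetched once, outside the loop
        List<String> ids = new ArrayList<>();
        if (allJobs == null) {
            return ids; // nothing to list
        }
        for (String id : allJobs) {
            if (id.startsWith(prefix)) {
                ids.add(id);
            }
        }
        return ids;
    }
}
```

The loop body only reads the cached array, so the number of calls to getAllJobs() no longer grows with the number of jobs.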

bq. In UtilsForGridmix.listGridmixOriginalJobIDs(), instead of using the job-name to figure
out the original job's id, you can use the appropriate configuration property (MAPREDUCE-2137).
Also, instead of having two separate methods to get the current and original job identifiers
for GridMix3 jobs, you can either have a map or a list of simple objects (POJOs).

I need the job name because I want to exclude the gridmix input data generator job.
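
A minimal sketch of how the two points could be combined: a simple POJO pairs the current and original job identifiers, and the job name is used only to skip the input data generator job. All names here are hypothetical, including the generator job name, which is a placeholder since the real name is not given in this thread. In a real test the original id would presumably come from the configuration property referenced in MAPREDUCE-2137.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO: pairs a GridMix3 job with the trace job it simulates.
final class GridmixJobPair {
    final String currentJobId;
    final String originalJobId;

    GridmixJobPair(String currentJobId, String originalJobId) {
        this.currentJobId = currentJobId;
        this.originalJobId = originalJobId;
    }
}

final class GridmixJobPairs {
    // Placeholder value: the generator job's real name is not stated in this thread.
    static final String GENERATOR_JOB_NAME = "HYPOTHETICAL_DATA_GENERATOR";

    /**
     * Builds one pair per job. Each input row is {jobName, currentJobId, originalJobId};
     * rows whose job name matches the generator job are skipped.
     */
    static List<GridmixJobPair> fromRows(List<String[]> rows) {
        List<GridmixJobPair> pairs = new ArrayList<>();
        for (String[] row : rows) {
            if (GENERATOR_JOB_NAME.equals(row[0])) {
                continue; // exclude the input data generator job
            }
            pairs.add(new GridmixJobPair(row[1], row[2]));
        }
        return pairs;
    }
}
```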

Please check the new patch, which addresses some of your comments.

> Gridmix tests with different time interval mr traces (1min, 3min and 5min).
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2138
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2138
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: test
>            Reporter: Vinay Kumar Thota
>            Assignee: Vinay Kumar Thota
>         Attachments: MAPREDUCE-2138.patch
>
>
> 1. Generate input data based on cluster size, create the synthetic jobs using the
> 1 min folded MR trace, and submit the jobs with the arguments below.
> GRIDMIX_JOB_TYPE = LoadJob
> GRIDMIX_USER_RESOLVER = SubmitterUserResolver
> GRIDMIX_SUBMISSION_POLICY = STRESS
> Input Size = 400 MB * No. of nodes in cluster.
> TRACE_FILE = 1 min folded trace.
> Verify each job's status and summary (QueueName, UserName, StartTime, FinishTime, maps,
> reducers, counters, etc.) after completion of execution.
> 2. Generate input data based on cluster size, create the synthetic jobs using the
> 3 min folded MR trace, and submit the jobs with the arguments below.
> GRIDMIX_JOB_TYPE = LoadJob
> GRIDMIX_USER_RESOLVER = RoundRobinUserResolver
> GRIDMIX_SUBMISSION_POLICY = Replay
> Input Size = 200 MB * No. of nodes in cluster.
> TRACE_FILE = 3 min folded trace.
> PROXY_USERS = proxy users file path.
> Verify each job's status, submitted user, and summary (QueueName, UserName, StartTime,
> FinishTime, maps, reducers, counters, etc.) after completion of execution.
> 3. Generate input data based on cluster size, create the synthetic jobs using the
> 5 min folded MR trace, and submit the jobs with the arguments below.
> GRIDMIX_JOB_TYPE = SleepJob
> GRIDMIX_USER_RESOLVER = EchoUserResolver
> GRIDMIX_MIN_FILE = 100 MB
> GRIDMIX_SUBMISSION_POLICY = Serial
> Input Size = 300 MB * No. of nodes in cluster.
> TRACE_FILE = 5 min folded trace.
> Verify each job's status, file size, and summary (QueueName, UserName, StartTime,
> FinishTime, maps, reducers, counters, etc.) after completion of execution.
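
The three scenarios in the description can be summarized as data, with the input size computed from the per-node figure and the cluster size. This is only a sketch: GridmixScenario and its fields are hypothetical names rather than GridMix API, and the PROXY_USERS and GRIDMIX_MIN_FILE settings are left out for brevity.

```java
// Hypothetical holder for the settings listed in the ticket description;
// the class and field names are illustrative, not GridMix API.
final class GridmixScenario {
    final String jobType;
    final String userResolver;
    final String submissionPolicy;
    final String traceFile;
    final int inputMbPerNode;

    GridmixScenario(String jobType, String userResolver, String submissionPolicy,
                    String traceFile, int inputMbPerNode) {
        this.jobType = jobType;
        this.userResolver = userResolver;
        this.submissionPolicy = submissionPolicy;
        this.traceFile = traceFile;
        this.inputMbPerNode = inputMbPerNode;
    }

    /** Total input size in MB for a cluster with the given number of nodes. */
    long inputSizeMb(int nodes) {
        return (long) inputMbPerNode * nodes;
    }

    /** The three scenarios from the description, in order. */
    static GridmixScenario[] fromTicket() {
        return new GridmixScenario[] {
            new GridmixScenario("LoadJob", "SubmitterUserResolver", "STRESS",
                                "1 min folded trace", 400),
            new GridmixScenario("LoadJob", "RoundRobinUserResolver", "Replay",
                                "3 min folded trace", 200),
            new GridmixScenario("SleepJob", "EchoUserResolver", "Serial",
                                "5 min folded trace", 300),
        };
    }

    public static void main(String[] args) {
        int nodes = 10; // example cluster size, chosen arbitrarily
        for (GridmixScenario s : fromTicket()) {
            System.out.println(s.jobType + " " + s.submissionPolicy + " "
                    + s.traceFile + ": " + s.inputSizeMb(nodes) + " MB");
        }
    }
}
```

For example, on a 10-node cluster the first scenario would generate 400 MB * 10 = 4000 MB of input.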

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

