hadoop-mapreduce-issues mailing list archives

From "Vinay Kumar Thota (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1672) Create test scenario for "distributed cache file behaviour, when dfs file is not modified"
Date Mon, 19 Apr 2010 05:50:50 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858385#action_12858385 ]

Vinay Kumar Thota commented on MAPREDUCE-1672:
----------------------------------------------

+      Assert.assertNotNull("jobInfo is null" + jInfo,
+          jInfo.getStatus().getRunState());

The above statement is meant to check the jInfo instance, so there is no point in invoking
the getStatus() method inside it: when jInfo is null, getStatus() is still dereferenced and
throws a NullPointerException before the assertion can fail cleanly. So change the statement
like below.

Assert.assertNotNull("jobInfo is null", jInfo);
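
A null-safe ordering would look roughly like this (a sketch only; the second assertion is
optional and merely illustrates dereferencing jInfo after it has been checked):

// Check the instance first: a null jInfo now fails the assertion
// cleanly instead of throwing an NPE inside the assert call.
Assert.assertNotNull("jobInfo is null", jInfo);
// With jInfo known to be non-null, its status can be inspected safely.
Assert.assertNotNull("job status is null", jInfo.getStatus());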



       if (count > 10) {
+          Assert.fail("Since the sleep count has reached beyond a point" +
+            "failing at this point");
+        }


I would suggest making the error message here more concrete, instead of saying 'count has
reached beyond a point'. The message should be something like "Job has not been started
for 10 mins", so the test fails with it. While debugging, the user then has clear
information about why the test failed and how long it waited for the job to start.
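
A wait loop along those lines would carry the message (a sketch only; the one-minute
interval and the names wovenClient and id are assumptions about the surrounding test, not
taken from the patch):

int count = 0;
// Poll the job's run state once a minute, for at most ten minutes.
while (jInfo.getStatus().getRunState() != JobStatus.RUNNING) {
  UtilsForTests.waitFor(60 * 1000);
  count++;
  if (count > 10) {
    // The failure message states what was waited for, and for how long.
    Assert.fail("Job has not been started for 10 mins.");
  }
  // wovenClient and id stand in for the test's JT proxy and job id.
  jInfo = wovenClient.getJobInfo(id);
}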



for (String taskTracker : taskTrackers) {
+          //Formatting tasktracker to get just its FQDN 
+          taskTracker = UtilsForTests.getFQDNofTT(taskTracker);
+          LOG.info("taskTracker is :" + taskTracker);
+
+          //This will be entered from the second job onwards
+          if (countLoop > 1) {
+            if (taskTracker != null) {
+              continueLoop = taskTrackerCollection.contains(taskTracker);
+            }
+            if (!continueLoop) {
+              break;
+            }
+          }
+
+          //Collecting the tasktrackers
+          if (taskTracker != null)  
+            taskTrackerCollection.add(taskTracker);

The above block contains many nested if statements and is pretty hard to untangle, so, in
my opinion, you could simplify the code as below.

for (String taskTracker : taskTrackers) {
  //Formatting tasktracker to get just its FQDN
  taskTracker = UtilsForTests.getFQDNofTT(taskTracker);
  LOG.info("taskTracker is :" + taskTracker);
  if (taskTrackerCollection.size() == 0) {
    taskTrackerCollection.add(taskTracker);
    break;
  } else {
    if (!taskTrackerCollection.contains(taskTracker)) {
      taskTrackerCollection.add(taskTracker);
      break;
    }
  }
}
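
In fact, since contains() on an empty collection simply returns false, the size() check is
redundant and the branches collapse into one condition. Keeping the null guard from the
original patch, a sketch:

for (String taskTracker : taskTrackers) {
  //Formatting tasktracker to get just its FQDN
  taskTracker = UtilsForTests.getFQDNofTT(taskTracker);
  LOG.info("taskTracker is :" + taskTracker);
  // contains() on an empty collection returns false, so the
  // empty-collection case needs no branch of its own.
  if (taskTracker != null
      && !taskTrackerCollection.contains(taskTracker)) {
    taskTrackerCollection.add(taskTracker);
    break;
  }
}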

> Create test scenario for "distributed cache file behaviour, when dfs file is not modified"
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1672
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1672
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>          Components: test
>            Reporter: Iyappan Srinivasan
>            Assignee: Iyappan Srinivasan
>         Attachments: TEST-org.apache.hadoop.mapred.TestDistributedCacheUnModifiedFile.txt,
TEST-org.apache.hadoop.mapred.TestDistributedCacheUnModifiedFile.txt, TestDistributedCacheUnModifiedFile.patch,
TestDistributedCacheUnModifiedFile.patch, TestDistributedCacheUnModifiedFile.patch, TestDistributedCacheUnModifiedFile.patch,
TestDistributedCacheUnModifiedFile.patch, TestDistributedCacheUnModifiedFile.patch
>
>
> This test scenario covers the behaviour of a distributed cache file
> that is not modified between being accessed by at most two jobs. Once
> a job uses a distributed cache file, that file is stored under
> mapred.local.dir. If the next job uses the same file, it is not stored
> again. So, if two jobs choose the same tasktracker for their execution,
> the distributed cache file should not be found twice.
> This testcase should run a job with a distributed cache file, obtain
> the handle of each task's tasktracker, and check for the presence of
> the distributed cache file with proper permissions in the proper
> directory. When the job runs again and any of its tasks hits the same
> tasktracker that ran a task of the previous job, the file should not
> be uploaded again and the task should use the old file.
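
For context, registering a file in the distributed cache with the old mapred API looks
roughly like the sketch below; the path and variable names are placeholders, not taken
from the patch:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// Placeholder dfs path; the real test uploads its own file first.
URI cacheFile = new Path("/tmp/distributed_cache_file.txt").toUri();
// On first use the tasktracker localizes the file under
// mapred.local.dir; a later job using the same unmodified file should
// reuse the localized copy rather than download it again.
DistributedCache.addCacheFile(cacheFile, conf);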

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

