hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-927) Cleanup of task-logs should happen in TaskTracker instead of the Child
Date Mon, 01 Mar 2010 08:26:06 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vinod K V updated MAPREDUCE-927:
--------------------------------

    Status: Open  (was: Patch Available)

Looked at the patch. Have some comments:

TaskLog.java
 - Creation of the {{localFS}} should not be in a static block. In the past also, we did this
and realized it creates a kind of circular initialization of loggers and results in NPE while
creating log objects which can be seen in task-logs. The current way of creation of {{localFS}}
should be retained.
 - We can move {{DEFAULT_USER_LOG_RETAIN_HOURS}} to {{TaskLogsCleanupThread}}.
 - Shall we rename {{getJobUserLogDir() to a simpler {{getJobDir()}}? And {{getBaseDir(String)}}
to {{getAttemptDir(String)}} to be clear? I think it's ok like this as {{TaskLog.getJobDir()}}
clearly means it is job-dir for logs.
 - Also, I think it's high time {{TaskLog.java}} is made @InterfaceAudience.Private.

TaskLogCleanupThread
 - Rename the class to {{TaskLogsMonitor}}, so that we are consistent going forward with MAPREDUCE-1100.
 - Set the audience visibility of the class to private?
 - {{threadSleepTime}} is not configurable. May not be a public documented configuration,
but still we need one.
 - Constructor: Shouldn't the volume on which the disk-service works be {{getUserLogsDir()}}
instead of {{getBaseLogDir()}}? The correctness is not lost with the current patch as we are
always passing absolute paths to the disk-service, but I think we should change it anyways.
Also can't we simply construct a local-filesystem here itself, instead of calling TaskLog.getLocalFileSystem()?
This will mostly avoid your changes regarding my first comment in TaskLog.java above.
 - For the sake of correctness, in {{removeOldUserLogs()}}, the job should be removed only
after the deletion of the log file.
 - Throughout the class, we should use a clock instead of directly using {{System.currentMillis()}}.
This will better the testing.
 - Shouldn't {{clearOldUserLogs()}} be done as part of constructor itself? This is the pattern
that {{MRAsyncDiskService}} uses, for example.

TaskTracker.java
 - The {{logscleanup}} thread is not joined/killed in the {{TaskTracker.close()}}. So, there
will be zombie threads in the system on a re-init and may well interfere with the new thread.
 - Shouldn't we be setting 770-user-mapred permissions on userlogs/$jobid as part of job-localization?
Granted 711 is enough for now, but this slightly deviates from the current security on the
directories - we always have the most secure permissions possible on the files/dirs. If we
do this, TestLocalizationWithLinuxTaskController should also test the permissions of the joblogdir.
 - The following code fragment at +1035 can be moved out of {{localizeJobFiles()}} into a
new method {{initializeJobLogDir()}} and can be called directly from {{localizeJob()}}.
{code}
  taskLogCleanupThread.removeJobFromLogDeletion(jobId);
  localizer.initializeJobUserLogDir(jobId);
{code}
 - One case is still not handled: TT reinits while the Job is still running. After re-init,
no tasks of the running job arrive at this TT. Retain-hours after re-init TT removes the job's
tasks' logs even though the job is still running elsewhere. Doing this will need TT to specially
communicate with JT and from what I understand, we are not doing it here. If that is the case,
can we simply add a test for this too in {{TestUserLogCleanup}}?

TestUserLogCleanup
 - Use {{TaskTracker.initializeJobLogDir()}} mentioned above in {{setupJobLogDirectory()}}.
 - Once we use a clock in TaskLogCleanupThread,
    -- the tests can be modified to use this.
    -- we can test what exactly the modified retainTimeStamp is after re-init.
 - We should create some attempt dirs also in the joblog dir with appropriate permissions
and verify the proper cleanup.
 - Document(javadoc) the two tests?

test-taskcontroller.c
 - This needs to be fixed to reflect the new directory hierarchy of the logs.

mapred_tutorial.xml
 - Should TaskLogs section be changed to explicitly specify the new directory hierarchy?

Cancelling the patch for the sake of these changes.

> Cleanup of task-logs should happen in TaskTracker instead of the Child
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-927
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-927
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security, tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: patch-927-1.txt, patch-927.txt
>
>
> Task logs' cleanup is being done in Child now. This is undesirable atleast for two reasons:
1) failures while cleaning up will affect the user's tasks, and 2) the task's wall time will
get affected due to operations that TT actually should own.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message