hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2078) TraceBuilder unable to generate the traces while giving the job history path by globing.
Date Fri, 17 Sep 2010 08:12:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910504#action_12910504 ]

Amar Kamat commented on MAPREDUCE-2078:

The {{FileSystem.globStatus(Path)}} API enumerates all the paths matched by a globbed
path.

The current {{TraceBuilder}} code does the following:
  for (int i = 2 + switchTop; i < args.length; ++i) {
    Path thisPath = new Path(args[i]);
    FileSystem fs = thisPath.getFileSystem(conf);
    if (fs.getFileStatus(thisPath).isDirectory()) {
      FileStatus[] statuses = fs.listStatus(thisPath);
      for (FileStatus s : statuses) {
        // process the file 

This needs to be changed to first flatten the globbed paths passed as input. So the suggested
fix is:
  for (int i = 2 + switchTop; i < args.length; ++i) { // iterate over the input
    Path thisPath = new Path(args[i]);
    // get the filesystem specific to the input passed
    FileSystem fs = thisPath.getFileSystem(conf);

    // flatten the globbed file path
    FileStatus[] realStatuses = fs.globStatus(thisPath);

    // iterate over all the files under the globbed input path
    for (FileStatus status : realStatuses) {
      // extract the actual (flat) path from the file status
      Path realPath = status.getPath();

      // now do what is done in the trunk 
      if (fs.getFileStatus(realPath).isDirectory()) {
        FileStatus[] statuses = fs.listStatus(realPath);
        for (FileStatus s : statuses) {
          // process the file

I ran {{TraceBuilder}} with this fix and now it works with globbed input paths.
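
The expand-first, then-descend pattern in the fix can be tried out without a cluster. The following is only a minimal sketch on the local filesystem using {{java.nio.file}}, not the Hadoop {{FileSystem}} API and not the actual {{TraceBuilder}} patch; the class name {{GlobExpandSketch}}, the helpers {{expandGlob}} and {{collectInputFiles}}, and the directory layout are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue of the suggested fix.
public class GlobExpandSketch {

  // Flatten a glob (relative to root) into the concrete paths that match it.
  // This plays the role that fs.globStatus(thisPath) plays in the fix.
  static List<Path> expandGlob(Path root, String glob) throws IOException {
    PathMatcher matcher = root.getFileSystem().getPathMatcher("glob:" + glob);
    try (Stream<Path> walk = Files.walk(root)) {
      return walk.filter(p -> !p.equals(root))
                 .filter(p -> matcher.matches(root.relativize(p)))
                 .sorted()
                 .collect(Collectors.toList());
    }
  }

  // Expand the glob first, then list the children of every matched directory.
  static List<Path> collectInputFiles(Path root, String glob) throws IOException {
    List<Path> files = new ArrayList<>();
    for (Path realPath : expandGlob(root, glob)) {
      if (Files.isDirectory(realPath)) {
        try (Stream<Path> children = Files.list(realPath)) {
          children.sorted().forEach(files::add);
        }
      } else {
        files.add(realPath);
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    // Build a small, made-up history layout in a temporary directory.
    Path root = Files.createTempDirectory("history");
    Files.createDirectories(root.resolve("2010/09"));
    Files.createDirectories(root.resolve("2010/10"));
    Files.createFile(root.resolve("2010/09/job_1"));
    Files.createFile(root.resolve("2010/10/job_2"));

    // "*/*" matches the two month directories; their files are then listed.
    for (Path f : collectInputFiles(root, "*/*")) {
      System.out.println(root.relativize(f));
    }
  }
}
```

As in the suggested fix, the glob is never handed to a status lookup as if it were a literal path; only the concrete paths produced by the expansion step are checked for being directories.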

> TraceBuilder unable to generate the traces while giving the job history path by globing.
> ----------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-2078
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2078
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Vinay Kumar Thota
>            Assignee: Amar Kamat
> I was trying to generate traces for MR job histories using TraceBuilder. However,
it is unable to generate the traces when the job history path is given as a glob. It throws
a file-not-found exception even though the job history path exists.
> I provided the job history path as follows:
> hdfs://<<clustername>>/dir1/dir2/dir3/*/*/*/*/*/*/
> Exception:
> java.io.FileNotFoundException: File does not exist:
> hdfs://<<clustername>>/dir1/dir2/dir3/*/*/*/*/*/*
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:525)
>         at org.apache.hadoop.tools.rumen.TraceBuilder$MyOptions.<init>(TraceBuilder.java:88)
>         at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:183)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:121)
> It truncates the trailing slash in the path.

