hadoop-mapreduce-user mailing list archives

From David Rosenstrauch <dar...@darose.net>
Subject Re: Dose one map instance only handle one input path at the same time?
Date Fri, 21 Jan 2011 14:49:38 GMT
Take a look at CombineFileInputFormat.

DR

On 01/21/2011 09:24 AM, lei liu wrote:
> There are two input directories, /user/test1/ and /user/test2/, and I want to
> join their contents. To do the join, I need to identify which directory the
> content handled by a mapper comes from, so I use the code below in the
> mapper:
>
>      private int tag = -1;
>
>      @Override
>      public void configure(JobConf conf) {
>          try {
>              this.conf = conf;
>              // Example: conf.set("paths.to.alias", "0=/user/test1/,1=/user/test2/");
>              String pathsToAliasStr = conf.get("paths.to.alias");
>              String[] pathsToAlias = pathsToAliasStr.split(",");
>
>              // Path of this task's input file, without scheme or authority.
>              String path = new Path(conf.get("map.input.file")).toUri().getPath();
>
>              for (int i = 0; i < pathsToAlias.length; i++) {
>                  String[] pathToAlias = pathsToAlias[i].split("=");
>                  if (path.startsWith(pathToAlias[1])) {
>                      // Record which directory this map instance is handling.
>                      tag = Integer.parseInt(pathToAlias[0].trim());
>                  }
>              }
>          } catch (Throwable e) {
>              e.printStackTrace();
>              throw new RuntimeException(e);
>          }
>      }
>
> So when the map method runs, the content handled by the mapper is
> identified as coming from the same directory.
>
> I want to know whether one mapper instance only handles the content of one
> directory at a time.
>
>
> Thanks
>
> LiuLei
>
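
The tag lookup in the quoted configure() method can be sketched without any Hadoop dependency, which makes it easy to check outside a cluster. This is only a sketch under stated assumptions: PathTagResolver, parseAliases, and resolveTag are hypothetical names, the namenode address is made up, and the input string stands in for the value of map.input.file. It also assumes each map task reads a split taken from a single file, as with the default FileInputFormat. With CombineFileInputFormat a split can span several files, so a tag computed once in configure() may not hold for every record.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hadoop): maps an input file to a directory tag. */
final class PathTagResolver {

    /** Parses a spec such as "0=/user/test1/,1=/user/test2/" into prefix -> tag. */
    static Map<String, Integer> parseAliases(String spec) {
        Map<String, Integer> prefixToTag = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] parts = entry.trim().split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad alias entry: " + entry);
            }
            prefixToTag.put(parts[1].trim(), Integer.parseInt(parts[0].trim()));
        }
        return prefixToTag;
    }

    /** Returns the tag of the first directory prefix matching inputFile, or -1 if none does. */
    static int resolveTag(String spec, String inputFile) {
        // Drop scheme and authority (e.g. hdfs://namenode:8020) and keep only the path.
        String path = URI.create(inputFile).getPath();
        for (Map.Entry<String, Integer> e : parseAliases(spec).entrySet()) {
            if (path.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String spec = "0=/user/test1/,1=/user/test2/";
        System.out.println(resolveTag(spec, "hdfs://namenode:8020/user/test1/part-00000"));
        System.out.println(resolveTag(spec, "/user/test2/part-00003"));
        System.out.println(resolveTag(spec, "/user/other/part-00000"));
    }
}
```

Returning -1 for an unmatched path mirrors the initial value of tag in the quoted code, so the map method can detect and reject input from an unexpected directory instead of silently mis-tagging it.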

