hadoop-general mailing list archives

From lei liu <liulei...@gmail.com>
Subject Re: can one map instance handle many data of input paths at the same time
Date Fri, 21 Jan 2011 05:57:33 GMT
Thanks everyone,

Let me describe my question in more detail. There are two input
directories, /user/test1/ and /user/test2/, and I want to join their
contents. To do the join, I need to identify which directory each record
comes from, so I use the code below in the mapper:

    private int tag = -1;

    @Override
    public void configure(JobConf conf) {
        try {
            this.conf = conf;
            // Example: conf.set("paths.to.alias", "0=/user/test1/,1=/user/test2/");
            String pathsToAliasStr = conf.get("paths.to.alias");
            String[] pathsToAlias = pathsToAliasStr.split(",");

            // Keep only the path component of the file this map task is reading.
            Path fpath = new Path(
                    new Path(conf.get("map.input.file")).toUri().getPath());
            String path = fpath.toUri().toString();

            for (int i = 0; i < pathsToAlias.length; i++) {
                String[] pathToAlias = pathsToAlias[i].split("=");
                if (path.startsWith(pathToAlias[1])) {
                    // Identify which directory the current map instance is handling.
                    tag = Integer.parseInt(pathToAlias[0].trim());
                }
            }
        } catch (Throwable e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

So when the map method runs, all the content handled by that mapper is
tagged as coming from the same directory.
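
As a rough sketch only, the same lookup can be pulled out of configure() so
it can be checked without a cluster. PathTagResolver, parseAliases and
resolveTag are hypothetical names used here for illustration, not Hadoop
classes. The sketch assumes the alias string has the "tag=directory" form
shown above, and it returns the first matching directory, whereas the loop
above keeps the last match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hadoop): maps an input file path to a directory tag. */
final class PathTagResolver {

    /** Parses a spec such as "0=/user/test1/,1=/user/test2/" into directory -> tag. */
    static Map<String, Integer> parseAliases(String spec) {
        Map<String, Integer> dirToTag = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] parts = entry.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed alias entry: " + entry);
            }
            dirToTag.put(parts[1].trim(), Integer.parseInt(parts[0].trim()));
        }
        return dirToTag;
    }

    /** Returns the tag of the first directory that prefixes filePath, or -1 if none matches. */
    static int resolveTag(Map<String, Integer> dirToTag, String filePath) {
        for (Map.Entry<String, Integer> e : dirToTag.entrySet()) {
            if (filePath.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, Integer> aliases = parseAliases("0=/user/test1/,1=/user/test2/");
        // The file name below is made up for the example; prints 1.
        System.out.println(resolveTag(aliases, "/user/test2/part-00000"));
    }
}
```

In configure(), the string passed to resolveTag would be the path derived
from "map.input.file", and the result would be stored in the tag field.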

I want to know whether one mapper instance only handles content from one
directory at a time.


Thanks

LiuLei






2011/1/21 Eric Sammer <esammer@cloudera.com>

> LiuLei:
>
> Yes. What you're looking for is TextInputFormat.addPath() (assuming you're
> talking about text). You can call this multiple times and add multiple input
> paths if they are all of the same data format (i.e. text). If you have
> multiple paths that contain different format data, you'll need to use
> MultipleInputs. See the javadoc for details on usage.
>
> On Thu, Jan 20, 2011 at 1:52 AM, lei liu <liulei412@gmail.com> wrote:
>
> > There are two input paths, example: /user/test1/ and /user/test2/ path. Can
> > one map instance handle many data of input paths at the same time?
> >
> >
> > Thanks,
> >
> > LiuLei
> >
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>
