hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Num map task?
Date Fri, 24 Apr 2009 06:14:34 GMT
Unless the argument (args[0]) to your job is a comma-separated set of paths,
you are only adding a single input path. It may be that you want to pass args
rather than just args[0], as sketched below.
 FileInputFormat.setInputPaths(c, args[0]);
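
For illustration, here is a minimal sketch (hypothetical class and method
names, old org.apache.hadoop.mapred API) of the two overloads. Note that
setInputPaths replaces whatever was set before it, so only the last call
takes effect:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.JobConf;

  public class InputPathExample {
    public static void configure(JobConf conf, String[] args) {
      // Adds only the path(s) named in args[0]; a single path unless it is
      // a comma-separated list.
      FileInputFormat.setInputPaths(conf, args[0]);

      // Adds every command-line argument as its own input path.
      Path[] inputs = new Path[args.length];
      for (int i = 0; i < args.length; i++) {
        inputs[i] = new Path(args[i]);
      }
      FileInputFormat.setInputPaths(conf, inputs);
    }
  }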

On Thu, Apr 23, 2009 at 7:10 PM, nguyenhuynh.mr <nguyenhuynh.mr@gmail.com>wrote:

> Edward J. Yoon wrote:
>
> > As far as I know, FileInputFormat.getSplits() returns the number of
> > splits automatically computed from the number of files and blocks. BTW,
> > what version of Hadoop/HBase are you using?
> >
> > I tried testing that code
> > (http://wiki.apache.org/hadoop/Hbase/MapReduce) on my cluster (Hadoop
> > 0.19.1 and HBase 0.19.0). The number of input paths was 2, and there
> > were 274 map tasks.
> >
> > Below is my changed code for v0.19.0.
> > ---
> >   public JobConf createSubmittableJob(String[] args) {
> >     JobConf c = new JobConf(getConf(), TestImport.class);
> >     c.setJobName(NAME);
> >     FileInputFormat.setInputPaths(c, args[0]);
> >
> >     c.set("input.table", args[1]);
> >     c.setMapperClass(InnerMap.class);
> >     c.setNumReduceTasks(0);
> >     c.setOutputFormat(NullOutputFormat.class);
> >     return c;
> >   }
> >
> >
> >
> > On Thu, Apr 23, 2009 at 6:19 PM, nguyenhuynh.mr
> > <nguyenhuynh.mr@gmail.com> wrote:
> >
> >> Edward J. Yoon wrote:
> >>
> >>
> >>> How do you add input paths?
> >>>
> >>> On Wed, Apr 22, 2009 at 5:09 PM, nguyenhuynh.mr
> >>> <nguyenhuynh.mr@gmail.com> wrote:
> >>>
> >>>
> >>>> Edward J. Yoon wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> In that case, the atomic unit of a split is a file, so you need to
> >>>>> increase the number of files, or use TextInputFormat as below:
> >>>>>
> >>>>> jobConf.setInputFormat(TextInputFormat.class);
> >>>>>
> >>>>> On Wed, Apr 22, 2009 at 4:35 PM, nguyenhuynh.mr
> >>>>> <nguyenhuynh.mr@gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Hi all!
> >>>>>>
> >>>>>>
> >>>>>> I have a MR job used to import content into HBase.
> >>>>>>
> >>>>>> The content is text files in HDFS. I use "map" files to store the
> >>>>>> local paths of the content.
> >>>>>>
> >>>>>> Each content item has a map file. (The map file is a text file in
> >>>>>> HDFS and contains one line of info.)
> >>>>>>
> >>>>>> I created a maps directory to hold the map files, and this maps
> >>>>>> directory is used as the input path for the job.
> >>>>>>
> >>>>>> When I run the job, the number of map tasks is the same as the
> >>>>>> number of map files.
> >>>>>> Ex: I have 5 map files -> 5 map tasks.
> >>>>>>
> >>>>>> Therefore, the map phase is slow :(
> >>>>>>
> >>>>>> Why is the map phase slow when the number of map tasks is large and
> >>>>>> equal to the number of files?
> >>>>>>
> >>>>>> * P/S: the job runs on 3 nodes: 1 master and 2 slaves.
> >>>>>>
> >>>>>> Please help me!
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Best,
> >>>>>> Nguyen.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>> Currently, I use TextInputFormat as the InputFormat for the map phase.
> >>>>
> >>>>
> >>>>
> >>>
> >>> Thanks for your help!
> >>>
> >> I use FileInputFormat to add input paths.
> >> Something like:
> >>    FileInputFormat.setInputPaths(jobConf, new Path("dir"));
> >>
> >> The "dir" is a directory that contains the input files.
> >>
> >> Best,
> >> Nguyen
> >>
> >>
> >>
> >>
> Thanks!
>
> I am using Hadoop version 0.18.2
>
> Cheers,
> Nguyen.
>
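
For reference, a rough consolidated sketch (hypothetical class name, Hadoop
0.18/0.19 old mapred API) of the kind of job discussed above: with a
file-based input format such as TextInputFormat, getSplits() produces at
least one split per file under the input directory, so five small map files
give five map tasks.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.TextInputFormat;
  import org.apache.hadoop.mapred.lib.NullOutputFormat;

  public class ImportJobSketch {
    public static JobConf createJob(Configuration conf, String inputDir, String table) {
      JobConf job = new JobConf(conf, ImportJobSketch.class);
      job.setJobName("import");
      // At least one split is created per file found under inputDir.
      job.setInputFormat(TextInputFormat.class);
      FileInputFormat.setInputPaths(job, new Path(inputDir));
      // Table name read back by the mapper, as in the quoted code.
      job.set("input.table", table);
      // job.setMapperClass(InnerMap.class);  // the mapper from the quoted code
      job.setNumReduceTasks(0);
      job.setOutputFormat(NullOutputFormat.class);
      return job;
    }
  }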



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
