hadoop-mapreduce-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Configured & PathFilter
Date Tue, 13 Apr 2010 05:41:38 GMT
Kris,

Here's a sample PathFilter that can be configured. The only thing you need
to do is add the following line to configure the job:

job.getConfiguration().set("pathfilter.pattern", "your_pattern");

Not sure whether this is what you want.


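import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class MyPathFilter implements PathFilter, Configurable {

    private Configuration conf;

    // No-arg constructor so the filter can be instantiated by reflection.
    public MyPathFilter() {
    }

    // Accept a path only if its file name contains the configured pattern.
    @Override
    public boolean accept(Path path) {
        return path.getName().contains(conf.get("pathfilter.pattern"));
    }

    @Override
    public Configuration getConf() {
        return this.conf;
    }

    // Expected to receive the job configuration once the filter is created.
    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }
}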
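
For the date-range case described in the quoted question below, the same idea might be sketched roughly as follows. This is only an illustration under several assumptions: a plain Map stands in for Hadoop's Configuration and a String stands in for Path.getName(), so it compiles without Hadoop on the classpath; the property names pathfilter.date.min and pathfilter.date.max are hypothetical; and file names are assumed to begin with a yyyyMMdd date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Map;

// Stand-in sketch: Map<String, String> plays the role of Configuration and
// a String plays the role of Path.getName(). Not a Hadoop class.
class DateRangeFilterSketch {

    // Hypothetical property names; Hadoop does not define these.
    static final String MIN_KEY = "pathfilter.date.min";
    static final String MAX_KEY = "pathfilter.date.max";

    // yyyyMMdd, e.g. 20100413
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    private LocalDate min; // null means no lower bound
    private LocalDate max; // null means no upper bound

    // Mirrors Configurable.setConf: parse the bounds once, tolerating a
    // null configuration or missing properties.
    void setConf(Map<String, String> conf) {
        if (conf == null) {
            return;
        }
        String lo = conf.get(MIN_KEY);
        String hi = conf.get(MAX_KEY);
        min = (lo == null) ? null : LocalDate.parse(lo, FMT);
        max = (hi == null) ? null : LocalDate.parse(hi, FMT);
    }

    // Mirrors PathFilter.accept: keep names whose leading date lies within
    // the inclusive range; names without a leading date are rejected.
    boolean accept(String fileName) {
        if (fileName.length() < 8) {
            return false;
        }
        LocalDate date;
        try {
            date = LocalDate.parse(fileName.substring(0, 8), FMT);
        } catch (DateTimeParseException e) {
            return false;
        }
        if (min != null && date.isBefore(min)) {
            return false;
        }
        return max == null || !date.isAfter(max);
    }
}
```

In a real job the class would implement PathFilter and Configurable as in MyPathFilter above, accept(Path) would pass path.getName() to the same logic, and the driver would set both properties on job.getConfiguration() before calling FileInputFormat.setInputPathFilter. Parsing the bounds once in setConf avoids re-reading the configuration for every path.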


On Tue, Apr 13, 2010 at 7:05 AM, Kris Nuttycombe
<kris.nuttycombe@gmail.com> wrote:

> Whoops, so much for that idea. The Configuration instance being passed
> to setConf is null.
>
> I am utterly baffled. Is there seriously nobody out there using
> PathFilter in this way? Everyone's just using dumb PathFilter
> instances that don't have any configurable functionality?
>
> /me boggles.
>
> Kris
>
> On Mon, Apr 12, 2010 at 2:03 PM, Kris Nuttycombe
> <kris.nuttycombe@gmail.com> wrote:
> > I just dove into the source, and it looks like the PathFilter instance
> > is instantiated using ReflectionUtils, and setConf is called so if the
> > resulting PathFilter instance implements Configurable, then
> > configuration will be available.
> >
> > Kris
> >
> > On Mon, Apr 12, 2010 at 1:52 PM, Kris Nuttycombe
> > <kris.nuttycombe@gmail.com> wrote:
> >> static void setInputPathFilter(Job job, Class<? extends PathFilter> filter)
> >>
> >> This indicates that reflection will be used to instantiate the
> >> required PathFilter object, and I need to be able to access the
> >> minimum and maximum date for a given run. I don't want to have to
> >> implement a separate PathFilter class for each set of dates,
> >> obviously.
> >>
> >> Thanks,
> >>
> >> Kris
> >>
> >> On Mon, Apr 12, 2010 at 9:35 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
> >>> Hi Kris,
> >>>
> >>> Do you mean you want to use the PathFilter in a map or reduce task? Or
> >>> do you mean using the PathFilter in the InputFormat?
> >>> I guess you mean the second case; if so, you only need to call
> >>> FileInputFormat.setInputPathFilter(,) to provide the filter information.
> >>>
> >>>
> >>> On Mon, Apr 12, 2010 at 8:13 AM, Kris Nuttycombe <kris.nuttycombe@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi, all, quick question about using PathFilter.
> >>>>
> >>>> Is there any way to provide information from the job configuration to
> >>>> a PathFilter instance? In my case, I want to limit the date range of
> >>>> the files being selected by the filter, and don't want to have to
> >>>> hard-code a separate PathFilter instance for each date range I'm
> >>>> interested in, obviously. If I make my PathFilter extend Configured,
> >>>> will it do the right thing?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Kris
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> Jeff Zhang
> >>>
> >>
> >
>



-- 
Best Regards

Jeff Zhang
