hadoop-hdfs-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Which InputFormat to use?
Date Fri, 05 Jul 2013 06:00:32 GMT
Use the InputFormat under the mapreduce package; the mapred package is the
old API. Generally you can extend FileInputFormat under the
o.a.h.mapreduce.lib.input package.
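
On the point in the quoted message below that abstract classes are easier to
evolve than interfaces, here is a minimal plain-Java sketch of the idea. All
names in it (OldStyleFormat, NewStyleFormat, MyFormat, supportsSplitting) are
made up for illustration and are not Hadoop classes. It also assumes Java
before version 8, where an interface could not carry a default method body.

```java
// Hypothetical names throughout; these are NOT Hadoop classes.

// Old style: the contract is an interface.
interface OldStyleFormat {
    String read(String path);
    // Adding another abstract method here in a later release would break
    // every existing implementation until it is updated.
}

// New style: the contract is an abstract class.
abstract class NewStyleFormat {
    abstract String read(String path);

    // Imagine this method was added in a later release. Because it has a
    // default body, subclasses written before it existed still compile.
    boolean supportsSplitting(String path) {
        return true;
    }
}

// User code written against the first version of NewStyleFormat.
class MyFormat extends NewStyleFormat {
    @Override
    String read(String path) {
        return "record from " + path;
    }
}

class ApiEvolutionDemo {
    public static void main(String[] args) {
        NewStyleFormat format = new MyFormat();
        System.out.println(format.read("part-00000"));
        // Inherited default; MyFormat never had to mention it.
        System.out.println(format.supportsSplitting("part-00000"));
    }
}
```

Later Java versions narrowed this gap with default methods on interfaces, so
treat this as a sketch of the design reasoning rather than a rule.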


On Fri, Jul 5, 2013 at 1:23 PM, Devaraj k <devaraj.k@huawei.com> wrote:

> Hi Ahmed,
>
> Hadoop 0.20.0 introduced the new MapReduce API, which is sometimes
> referred to as "context objects". It is designed to make the API easier to
> evolve in future. There are some differences between the new and old APIs:
>
> - The new API favours abstract classes over interfaces, since abstract
>   classes are easier to evolve.
>
> - The new API uses context objects such as MapContext and ReduceContext to
>   connect to the user code.
>
> - The old API configures a job through a special JobConf object; in the
>   new API, job configuration is done through Configuration.
>
> You can find the new API's input formats in the
> org.apache.hadoop.mapreduce.lib.input.* package and its sub-packages, and
> the old API's in the org.apache.hadoop.mapred.* package and its
> sub-packages.
>
> The new API is type-incompatible with the old one, so jobs need to be
> rewritten to take advantage of it.
>
> Based on these points, you can choose which API to use.
>
> Thanks
> Devaraj k
>
> *From:* Ahmed Eldawy [mailto:aseldawy@gmail.com]
> *Sent:* 05 July 2013 00:00
>
> *To:* user@hadoop.apache.org
> *Subject:* Which InputFormat to use?
>
> Hi, I'm developing a new set of InputFormats that are used for a project
> I'm doing. I found that there are two ways to create a new InputFormat:
>
> 1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat
>
> 2- Implement the interface org.apache.hadoop.mapred.InputFormat
>
> I don't know why there are two versions which are incompatible. I found
> out that for each one, there is a whole set of interfaces for different
> classes such as InputSplit, RecordReader and MapReduce job. Unfortunately,
> each set of classes is not compatible with the other one. This means that I
> have to choose one of the interfaces and go with it till the end. I have
> two questions basically.
>
> 1- Which of these two interfaces should I go with? I didn't find any
> deprecation in either of them, so they both seem legitimate. Is there any
> plan to retire one of them?
>
> 2- I already have some classes implemented in one of the formats. Is it
> worth refactoring these classes to use the other interface, in case I used
> the old format?
>
> Thanks in advance for your help.
>
> ** **
>
>
> ****
>
> Best regards,
> Ahmed Eldawy****
>
