hadoop-hdfs-user mailing list archives

From Devaraj k <devara...@huawei.com>
Subject RE: Which InputFormat to use?
Date Fri, 05 Jul 2013 05:23:48 GMT
Hi Ahmed,

Hadoop 0.20.0 introduced the new MapReduce API, sometimes referred to as the "context objects"
API. It is designed to make the API easier to evolve in the future. There are some differences
between the new and old APIs:

> The new API favours abstract classes over interfaces, since abstract classes are easier
to evolve: a method with a default implementation can be added without breaking subclasses.
> The new API uses context objects like MapContext & ReduceContext to connect user code
to the framework.
> The old API has a special JobConf object for job configuration; in the new API, job
configuration is done using Configuration.
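
To illustrate the first two points, here is a minimal sketch in plain Java. It does not use
Hadoop at all: OldStyleMapper, NewStyleMapper, SketchContext, UpperCaseMapper and
ApiEvolutionDemo are hypothetical stand-ins that only mimic the shape of the two API styles,
so please treat it as an illustration rather than the actual Hadoop implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins, NOT Hadoop classes: they only mimic the two API styles.

// Old style: an interface. Before Java 8 default methods, adding a method here
// would break every existing implementer.
interface OldStyleMapper {
    void map(String key, String value, List<String> output);
}

// New style: a context object carries the framework-facing operations, so new
// capabilities can be added to it without changing the signature of map().
class SketchContext {
    private final List<String> output = new ArrayList<>();
    private final List<String> statuses = new ArrayList<>();

    void write(String key, String value) {
        output.add(key + "=" + value);
    }

    // Imagine this was added in a later release: existing mappers still compile.
    void setStatus(String status) {
        statuses.add(status);
    }

    List<String> getOutput() {
        return output;
    }

    List<String> getStatuses() {
        return statuses;
    }
}

// New style: an abstract class. Hooks with default bodies can be added over time
// without touching existing subclasses.
abstract class NewStyleMapper {
    void setup(SketchContext context) { }    // default no-op hook

    abstract void map(String key, String value, SketchContext context);

    void cleanup(SketchContext context) { }  // default no-op hook

    // Drives the lifecycle: setup, one map() call per record, cleanup.
    final void run(List<String[]> records, SketchContext context) {
        setup(context);
        for (String[] record : records) {
            map(record[0], record[1], context);
        }
        cleanup(context);
    }
}

// User code only overrides what it needs.
class UpperCaseMapper extends NewStyleMapper {
    @Override
    void map(String key, String value, SketchContext context) {
        context.write(key, value.toUpperCase(Locale.ROOT));
    }
}

class ApiEvolutionDemo {
    static List<String> runDemo() {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"1", "hello"});
        records.add(new String[] {"2", "world"});
        SketchContext context = new SketchContext();
        new UpperCaseMapper().run(records, context);
        return context.getOutput();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In this sketch UpperCaseMapper never mentions setup(), cleanup() or setStatus(), so adding such
members to the abstract class or to the context does not force any change in user code. That is
the kind of evolution the new API is aiming for.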

You can find the new API in the org.apache.hadoop.mapreduce package and its subpackages
(the input formats are under org.apache.hadoop.mapreduce.lib.input), and the old API in the
org.apache.hadoop.mapred package and its subpackages.

The new API is type-incompatible with the old one, so existing jobs need to be rewritten to
take advantage of it.

Based on these points, you can choose which API to use.

Devaraj k

From: Ahmed Eldawy [mailto:aseldawy@gmail.com]
Sent: 05 July 2013 00:00
To: user@hadoop.apache.org
Subject: Which InputFormat to use?

Hi, I'm developing a new set of InputFormats for a project I'm working on. I found
that there are two ways to create a new InputFormat:
1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat
2- Implement the interface org.apache.hadoop.mapred.InputFormat
I don't know why there are two incompatible versions. I found out that for each one, there
is a whole set of related types such as InputSplit, RecordReader and the MapReduce job
classes. Unfortunately, each set of classes is not compatible with the other one.
This means that I have to choose one of the interfaces and stick with it to the end. I have
two questions, basically.
1- Which of these two interfaces should I go with? I didn't find any deprecation in either
of them, so they both seem legitimate. Is there any plan to retire one of them?
2- I already have some classes implemented in one of the formats. Is it worth refactoring
these classes to use the other interface, in case I used the old format?
Thanks in advance for your help.

Best regards,
Ahmed Eldawy