hadoop-hdfs-user mailing list archives

From Otto Mok <Otto....@acuityads.com>
Subject RE: Which InputFormat to use?
Date Fri, 05 Jul 2013 03:28:07 GMT
A trainer at Hortonworks told me that org.apache.hadoop.mapred is the old package.

So, for all intents and purposes, use the new one: org.apache.hadoop.mapreduce.

Otto out!

From: Ahmed Eldawy [mailto:aseldawy@gmail.com]
Sent: July-04-13 2:30 PM
To: user@hadoop.apache.org
Subject: Which InputFormat to use?

Hi, I'm developing a new set of InputFormats for a project I'm working on. I found
that there are two ways to create a new InputFormat:
1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat
2- Implement the interface org.apache.hadoop.mapred.InputFormat
I don't know why there are two incompatible versions. I found that for each
one, there is a whole set of interfaces for different classes such as InputSplit, RecordReader
and MapReduce job. Unfortunately, each set of classes is not compatible with the other one.
This means that I have to choose one of the interfaces and go with it till the end. I have
two questions basically.
1- Which of these two interfaces should I go with? I didn't find any deprecation in either of
them, so they both seem legitimate. Is there any plan to retire one of them?
2- I already have some classes implemented in one of the formats. Is it worth refactoring
these classes to use the other interface, in case I used the old format?
Thanks in advance for your help.

Best regards,
Ahmed Eldawy
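The incompatibility Ahmed describes is easiest to see in the record-reading contracts. In the old API, org.apache.hadoop.mapred.RecordReader has the caller allocate key/value objects (createKey, createValue) and pass them to next(key, value) to be filled in. In the new API, org.apache.hadoop.mapreduce.RecordReader owns the current record and exposes it via nextKeyValue, getCurrentKey and getCurrentValue. The sketch below is plain Java and does not use Hadoop. OldStyleReader and NewStyleReader are hypothetical interfaces that only mirror those method shapes; the real Hadoop types also carry progress reporting, exceptions and (in the new API) an initialize(InputSplit, TaskAttemptContext) step. LongHolder, TextHolder, InMemoryLineReader and the OldFromNew adapter are likewise hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical plain-Java stand-ins; these are NOT the real Hadoop types.
public class ReaderContracts {

  // Shape of the old API (org.apache.hadoop.mapred.RecordReader):
  // the caller allocates holders and next() fills them in.
  interface OldStyleReader<K, V> {
    K createKey();
    V createValue();
    boolean next(K key, V value);
    void close();
  }

  // Shape of the new API (org.apache.hadoop.mapreduce.RecordReader):
  // the reader owns the current record and exposes it through getters.
  interface NewStyleReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
    void close();
  }

  // Mutable holders, loosely standing in for LongWritable and Text.
  static final class LongHolder { long value; }
  static final class TextHolder { final StringBuilder text = new StringBuilder(); }

  // Hypothetical new-style reader over an in-memory list of lines.
  static final class InMemoryLineReader implements NewStyleReader<LongHolder, TextHolder> {
    private final Iterator<String> lines;
    private final LongHolder key = new LongHolder();
    private final TextHolder value = new TextHolder();
    private long offset = 0;

    InMemoryLineReader(List<String> lines) { this.lines = lines.iterator(); }

    public boolean nextKeyValue() {
      if (!lines.hasNext()) return false;
      String line = lines.next();
      key.value = offset;               // key = character offset where the line starts
      value.text.setLength(0);          // reuse the same holder for every record
      value.text.append(line);
      offset += line.length() + 1;      // +1 for the newline separator
      return true;
    }
    public LongHolder getCurrentKey() { return key; }
    public TextHolder getCurrentValue() { return value; }
    public void close() { }
  }

  // Adapter: exposes a new-style reader through the old-style contract by
  // copying each current record into the caller-supplied holders.
  static final class OldFromNew implements OldStyleReader<LongHolder, TextHolder> {
    private final NewStyleReader<LongHolder, TextHolder> inner;
    OldFromNew(NewStyleReader<LongHolder, TextHolder> inner) { this.inner = inner; }

    public LongHolder createKey() { return new LongHolder(); }
    public TextHolder createValue() { return new TextHolder(); }
    public boolean next(LongHolder key, TextHolder value) {
      if (!inner.nextKeyValue()) return false;
      key.value = inner.getCurrentKey().value;
      value.text.setLength(0);
      value.text.append(inner.getCurrentValue().text);
      return true;
    }
    public void close() { inner.close(); }
  }

  public static void main(String[] args) {
    List<String> data = Arrays.asList("alpha", "beta", "gamma");

    // New-style loop: ask the reader for its current record.
    NewStyleReader<LongHolder, TextHolder> r = new InMemoryLineReader(data);
    while (r.nextKeyValue()) {
      System.out.println(r.getCurrentKey().value + "\t" + r.getCurrentValue().text);
    }
    r.close();

    // Old-style loop over the same data, via the adapter.
    OldStyleReader<LongHolder, TextHolder> old = new OldFromNew(new InMemoryLineReader(data));
    LongHolder k = old.createKey();
    TextHolder v = old.createValue();
    while (old.next(k, v)) {
      System.out.println(k.value + "\t" + v.text);
    }
    old.close();
  }
}
```

Both loops print the same three records (offsets 0, 6 and 11). The adapter suggests that the per-record logic can often be carried across APIs by re-wrapping it, which is relevant to question 2, although a real port would also have to touch InputSplit, job setup and the exception signatures of the actual Hadoop classes.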