spark-dev mailing list archives

From Fengdong Yu <>
Subject Re: Differences between Spark APIs for Hadoop 1.x and Hadoop 2.x in terms of performance, progress reporting and IO metrics.
Date Wed, 09 Dec 2015 09:20:28 GMT
I don’t think there is a performance difference between the 1.x API and the 2.x API.

But it’s not a big issue for your change; it only needs the API changed, right?

It’s not a big change to move to the 2.x API. If you agree, I can do it, but I cannot promise it
within one or two weeks because of my daily job.

> On Dec 9, 2015, at 5:01 PM, Hyukjin Kwon <> wrote:
> Hi all, 
> I am writing this email to both user-group and dev-group since this is applicable to both.
> I am now working on a Spark XML datasource (<>).
> This uses an InputFormat implementation, which I downgraded to the Hadoop 1.x API for version compatibility.
> However, I found that the internal JSON datasource and others from Databricks use the Hadoop
2.x API, dealing with TaskAttemptContextImpl by invoking the method via reflection, because TaskAttemptContext
is a class in Hadoop 1.x but an interface in Hadoop 2.x.
> So, I looked through the code for advantages of the Hadoop 2.x API, but I couldn't find any.
> I wonder if there are advantages to using the Hadoop 2.x API.
> I understand that it is still preferable to use the Hadoop 2.x API, at least for future compatibility,
but somehow I feel it might not be necessary to use Hadoop 2.x via method reflection.
> I would appreciate it if you could leave a comment here
<> as well as send back a reply, if
there is a good explanation.
> Thanks! 
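
The class-vs-interface incompatibility described in the quoted message is typically bridged by locating the instantiable class at runtime rather than at compile time. Below is a minimal sketch of that lookup (this is an illustration, not Spark's exact code); the helper `firstAvailable` and the class `TaskContextReflection` are hypothetical names, and the demo in `main` uses JDK-only classes so it runs without Hadoop on the classpath, with the real Hadoop class names shown in comments:

```java
// Sketch of the reflection trick for supporting Hadoop 1.x and 2.x with one
// binary: TaskAttemptContext is a concrete class in Hadoop 1.x but an
// interface in Hadoop 2.x (implemented by TaskAttemptContextImpl), so the
// class to instantiate must be resolved at runtime.
public class TaskContextReflection {

    /** Returns the first loadable class among the given fully qualified names. */
    static Class<?> firstAvailable(String... names) throws ClassNotFoundException {
        for (String name : names) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException ignored) {
                // Not on the classpath; try the next candidate.
            }
        }
        throw new ClassNotFoundException(String.join(", ", names));
    }

    public static void main(String[] args) throws Exception {
        // With Hadoop on the classpath the candidates would be:
        //   "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl"  (Hadoop 2.x)
        //   "org.apache.hadoop.mapreduce.TaskAttemptContext"           (Hadoop 1.x)
        // and the resolved class would then be instantiated reflectively, e.g.
        //   cls.getDeclaredConstructor(Configuration.class, TaskAttemptID.class)
        //      .newInstance(conf, attemptId);
        // Here the lookup is demonstrated with JDK classes only:
        Class<?> cls = firstAvailable("no.such.hadoop.Class", "java.util.ArrayList");
        System.out.println(cls.getName());
    }
}
```

The first candidate name (the 2.x implementation) is tried before the 1.x fallback, so the same jar works on either Hadoop version without a compile-time dependency on the class that only exists in one of them.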
