spark-dev mailing list archives

From Fengdong Yu <fengdo...@everstring.com>
Subject Re: Differences between Spark APIs for Hadoop 1.x and Hadoop 2.x in terms of performance, progress reporting and IO metrics.
Date Wed, 09 Dec 2015 09:20:28 GMT
I don’t think there is a performance difference between the 1.x API and the 2.x API.

Switching is not a big issue for your change, though. Only com.databricks.hadoop.mapreduce.lib.input.XmlInputFormat.java
<https://github.com/databricks/spark-xml/blob/master/src/main/java/com/databricks/hadoop/mapreduce/lib/input/XmlInputFormat.java>
needs to change, right?

Moving it to the 2.x API is not a big change. If you agree, I can do it, but because of
my day job I cannot promise to finish within one or two weeks.





> On Dec 9, 2015, at 5:01 PM, Hyukjin Kwon <gurwls223@gmail.com> wrote:
> 
> Hi all, 
> 
> I am writing this email to both the user group and the dev group, since this applies to both.
> 
> I am now working on the Spark XML datasource (https://github.com/databricks/spark-xml).
> This uses an InputFormat implementation which I downgraded to the Hadoop 1.x API for version compatibility.
> 
> However, I found that the internal JSON datasource and others in Databricks use the Hadoop
> 2.x API, creating TaskAttemptContextImpl through reflection because TaskAttemptContext
> is a class in Hadoop 1.x and an interface in Hadoop 2.x.
> 
> So I looked through the code for advantages of the Hadoop 2.x API, but I could not find any.
> I wonder whether there are any advantages to using the Hadoop 2.x API.
> 
> I understand that it is still preferable to use the Hadoop 2.x APIs, at least to be ready for
> future differences, but I feel it may not be necessary to use Hadoop 2.x through a reflected method.
> 
> I would appreciate it if you could leave a comment at https://github.com/databricks/spark-xml/pull/14
> as well as replying here, if there is a good explanation.
> 
> Thanks! 
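
For readers unfamiliar with the reflection approach mentioned in the quoted message, here is a minimal sketch of how one code base might support both Hadoop lines. It is an illustration under assumptions, not the actual Spark or spark-xml code: the class name TaskContextCompat and its helper methods are hypothetical, and the arguments are typed as Object so the sketch compiles without Hadoop on the classpath. The two Hadoop class names are the real ones: in 2.x the concrete class is TaskAttemptContextImpl, while in 1.x TaskAttemptContext itself is concrete.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper, for illustration only.
final class TaskContextCompat {
    // Hadoop 2.x: TaskAttemptContext is an interface, so the concrete class is needed.
    static final String HADOOP2_IMPL =
        "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl";
    // Hadoop 1.x: TaskAttemptContext is itself a concrete class.
    static final String HADOOP1_IMPL =
        "org.apache.hadoop.mapreduce.TaskAttemptContext";

    /** Returns the first class that can be loaded, trying the names in order. */
    static Class<?> firstAvailable(String... classNames) throws ClassNotFoundException {
        for (String name : classNames) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // Not on the classpath; fall through to the next candidate.
            }
        }
        throw new ClassNotFoundException(String.join(", ", classNames));
    }

    /** Instantiates cls with the first public constructor that accepts args. */
    static Object newInstance(Class<?> cls, Object... args)
            throws ReflectiveOperationException {
        for (Constructor<?> ctor : cls.getConstructors()) {
            if (ctor.getParameterCount() != args.length) {
                continue;
            }
            try {
                return ctor.newInstance(args);
            } catch (IllegalArgumentException e) {
                // Same arity but incompatible parameter types; try the next one.
            }
        }
        throw new NoSuchMethodException(cls.getName());
    }

    /**
     * Builds a task attempt context on either Hadoop line. The caller is
     * assumed to pass a Configuration and a TaskAttemptID.
     */
    static Object newTaskAttemptContext(Object conf, Object taskAttemptId)
            throws ReflectiveOperationException {
        Class<?> cls = firstAvailable(HADOOP2_IMPL, HADOOP1_IMPL);
        return newInstance(cls, conf, taskAttemptId);
    }
}
```

The 2.x name is tried first because on a 2.x classpath the 1.x name also loads, but as an interface that cannot be instantiated.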

