hadoop-mapreduce-issues mailing list archives

From "Sean Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2841) Task level native optimization
Date Thu, 26 Jun 2014 00:16:29 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044206#comment-14044206

Sean Zhong commented on MAPREDUCE-2841:

First, Arun and Todd, thank you both for your honest opinions; you are both respected.

I believe our differences will narrow once we are looking at the same facts, so I'd like to state those facts
and clear up some confusion:

1. How many lines of code are there, exactly?

Here is a breakdown for branch https://github.com/intel-hadoop/nativetask/tree/native_output_collector:

Java code (*.java):		122 files,	8057 lines
	nativetask:	62 files,	4080 lines
	nativetask unit tests:	14 files,	1222 lines
	other platforms (Pig/Mahout/HBase/Hive):	25 files,	477 lines
	scenario tests:	21 files,	2278 lines
Native code (*.h, *.cc):		128 files,	47048 lines
	nativetask:	85 files,	11713 lines
	nativetask unit tests:	33 files,	4911 lines
	other platforms (Pig/Mahout/HBase/Hive):	2 files,	1083 lines
	third-party gtest header files:	3 files,	28699 lines
	third-party lz4/snappy/cityhash:	5 files,	642 lines
(Note: license header lines in each source file are not counted; blank lines and other comments
are counted.)

If we measure LOC as a proxy for code complexity, then:
Third-party code such as the Google Test header files should not be counted; the gtest header alone is
28699 lines.
The Pig/Mahout/HBase/Hive code will eventually be removed from the repository, and should
not be counted either.
Scenario test code need not be included, since you can always write new scenario tests.

After these deductions, the effective code is:
NativeTask source code (Java + native C++): 15793 lines
NativeTask unit test code (Java + native C++): 6133 lines

2. Is this patch an alternate implementation of the MapReduce runtime, like TEZ?

No. The whole purpose of this patch submission is to provide a map output collector that
transparently improves MapReduce performance, NOT a new MR engine.

The code is posted on the branch https://github.com/intel-hadoop/nativetask/tree/native_output_collector,
and it only includes code for the map output collector.

3. Why is there Pig/Mahout/HBase/Hive code in the NativeTask source?

We are working on removing the platform (Hive/Pig/HBase/Mahout) code from the NativeTask source,
as I commented above, and will provide it as standalone jars.
We rushed to post the link without a full cleanup so that we could get some early feedback.

4. Is the full native runtime included?

No, the full native runtime is not included in this patch; the related code has been stripped out.
The repo https://github.com/intel-hadoop/nativetask/tree/native_output_collector only contains code
for the transparent collector.

5. Is there an intention to contribute the full native runtime mode to Hadoop, or keep it separate?

Supporting the full native runtime mode is not the purpose of this patch; the goal of this
patch is to make existing MR jobs run better on modern CPUs via a native map output collector.

The full native runtime mode is a separate topic, and there is a long way to go before it is ready
for submission. We don't want to consider it now.

6. Are there interface compatibility issues?

This patch is not about the full native runtime mode, which supports native mappers and native reducers.

This patch is only about a custom map output collector running in transparent mode. We use the existing
Java interfaces, and people keep running their Java mappers/reducers without recompilation.
Users can make a small configuration change to enable the NativeTask collector. When there is
a case NativeTask doesn't support, it simply falls back to the default MR implementation.
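For illustration, enabling the collector could be a one-property job configuration change. Hadoop's standard hook for plugging in a custom map output collector is the `mapreduce.job.map.output.collector.class` property; the delegator class name shown below is an assumption based on the NativeTask code layout and may differ in a given build:

```xml
<!-- Sketch: route map output through the NativeTask collector.
     The delegator class name is an assumption; check your NativeTask build.
     If NativeTask cannot handle the job, it falls back to the default collector. -->
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator</value>
</property>
```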

7. Are there C++ ABI issues?

The concern makes sense.

Regarding the ABI: if a user doesn't need a custom key comparator, they never need to implement
a native comparator against nativetask.so, so there is no ABI issue.
If a user does want to write a native comparator, the NativeTask native interface involved
is very limited, only:

typedef int (*ComparatorPtr)(const char * src, uint32_t srcLength, const char * dest,
    uint32_t destLength);
However, the current code assumes the user includes the whole "NativeTask.h", which contains
more than the typedef above.
We will work on making "NativeTask.h" expose only the necessary minimum API. After
we do this, there should be no big ABI issue.
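As a sketch of how small that surface is, a user-side comparator matching the ComparatorPtr signature above can be a single free function. This is a hypothetical example; the function name and the byte-wise ordering are my own choices, not part of NativeTask:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical user-supplied comparator matching the ComparatorPtr
// signature quoted above. It orders keys byte-wise (lexicographically),
// with the shorter key sorting first on a tie -- the same contract as
// a raw-bytes comparison in Hadoop.
extern "C" int LexicalComparator(const char* src, uint32_t srcLength,
                                 const char* dest, uint32_t destLength) {
    uint32_t minLen = srcLength < destLength ? srcLength : destLength;
    int rc = std::memcmp(src, dest, minLen);
    if (rc != 0) {
        return rc;  // first differing byte decides the order
    }
    // equal prefixes: the shorter key sorts first
    return static_cast<int>(srcLength) - static_cast<int>(destLength);
}
```

Note that nothing here depends on anything beyond the typedef itself, which is exactly why trimming "NativeTask.h" down to the minimum API should keep the ABI surface small.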

8. How can you ensure the quality of this code?

The code has been actively developed for more than a year. It has been used and tested in production
for a long time, and there is also a full set of unit tests and scenario tests for coverage.

9. Can this be done on TEZ instead?

We believe it is good for MapReduce, and we know people are still using MapReduce; that justifies
submitting the patch here.

Whether TEZ can benefit from this feature should not affect the decision made here,
as they are two separate projects.

If the value of this patch is appreciated, we are totally open to ALSO porting this to TEZ.
We may want your help with that.

10. How can we be sure the code will be well maintained in the future, given that this JIRA
was opened long ago and not well maintained?

If you check the commit history at https://github.com/intel-hadoop/nativetask/commits/native_output_collector,
you will find that the code has been actively and continuously developed over the years, in
public. So far there are 319 commits. I personally have worked on this project
since the end of 2012.

We post it here now because we believe that, after all these commits, the code quality has reached
a level worth contributing back formally to Hadoop. The code is stable and has been
well tested in production environments. We are very serious about this and will definitely take
responsibility for maintaining it. Besides, we have Todd's help on this.

11. Why do you still work on MapReduce, when new investment is not going into MapReduce?

We believe that as long as there is still a big user base running MapReduce daily, no matter
how slowly new MR features evolve or where the new investment goes, there is still value in
contributing to it. Intel has a history of supporting decades-old projects.

We are also open to applying the "nativetask" ideas to other projects, but that is not directly
related to this JIRA.

12. How do we support encrypted shuffle in NativeTask?

First, there is a flag to enable/disable the native output collector feature, fully under the
end user's control.

Second, if there is a case NativeTask doesn't support, it simply falls back to the default
MapReduce implementation.

For encrypted shuffle, we fall back to the default MR implementation. However, we are totally
open to supporting further cases like this as long as there is still a performance benefit;
otherwise we will just fall back.

13. Why is the code at GitHub, https://github.com/intel-hadoop/nativetask/tree/native_output_collector?

This is just a place for early code review and early feedback. We will submit it
as a standard patch after we clean up the Hive/Pig platform code.

14. Why not make it an incubator project instead of putting it in the MR code base?

We think:
  a) It is good for MapReduce; it helps MapReduce run better.
  b) More MapReduce users can use it. It is generally easier for a user to upgrade Hadoop
and use this feature if it is provided in the Hadoop package.
  c) The performance benefit for MapReduce is big. We believe low-level CPU optimization is
still important, even knowing there are other optimizations of the computation flow, such as
TEZ, Spark, Impala, etc.
  d) This map output collector is transparent, so it is not unreasonable to provide it as an
alternative collector in the Hadoop package. There is similar precedent: the fair scheduler
is bundled in a similar way.
  e) The risk is low. NativeTask falls back to the default MR implementation if a case is not
supported or something goes wrong.
  f) The change is small. Only about 10 lines of code change are required in Hadoop, so we expect
no possibility of regression. A patch for Hadoop is provided at:
 The other NativeTask code is self-contained and won't pollute MapReduce.
    At runtime, we only require deploying the NativeTask jars into the Hadoop directory during installation.

These considerations make us feel that it is better to do this in Hadoop instead of as a new
incubator project.

The questions above were interpreted from the comments in a rather subjective manner, so I suspect
some points are missing or wrongly interpreted and need further elaboration or correction.
What are they?
I am sure there are still concerns about this patch, but before we say no and reject it, maybe we
should pause and think:
Is there still a big user base using MapReduce? Is this patch valuable for those users? Is
the risk introduced by this patch controlled? Can these concerns be solved by improving the
patch?

In the Hadoop ecosystem, there is a lot of high-level innovation like TEZ, Impala, Spark,
etc., but far less innovation or optimization at the micro-architecture level. This patch
is one such attempt.


> Task level native optimization
> ------------------------------
>                 Key: MAPREDUCE-2841
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>         Environment: x86-64 Linux/Unix
>            Reporter: Binglin Chang
>            Assignee: Sean Zhong
>         Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, MAPREDUCE-2841.v2.patch, dualpivot-0.patch,
dualpivotv20-0.patch, fb-shuffle.patch
> I'm recently working on native optimization for MapTask based on JNI.
> The basic idea is to add a NativeMapOutputCollector to handle k/v pairs emitted by the
> mapper, so that sort, spill, and IFile serialization can all be done in native code. A preliminary
> test (on Xeon E5410, jdk6u24) showed promising results:
> 1. Sort is about 3x-10x as fast as Java (only binary string comparison is supported)
> 2. IFile serialization speed is about 3x that of Java, about 500MB/s; if hardware CRC32C is
> used, things can get much faster (1G/
> 3. Merge code is not completed yet, so the test uses enough io.sort.mb to prevent mid-spill
> This leads to a total speedup of 2x~3x for the whole MapTask if IdentityMapper (a mapper
> that does nothing) is used.
> There are limitations, of course: currently only Text and BytesWritable are supported,
> and I have not thought through many things yet, such as how to support map-side combine.
> I had some discussion with somebody familiar with Hive; it seems these limitations won't
> be much of a problem for Hive to benefit from these optimizations, at least. Advice or discussion
> about improving compatibility is most welcome :)
> Currently NativeMapOutputCollector has a static method called canEnable(), which checks
> whether the key/value types, comparator type, and combiner are all compatible; MapTask can
> then choose to enable NativeMapOutputCollector.
> This is only a preliminary test; more work needs to be done. I expect better final results,
> and I believe similar optimization can be adopted for the reduce task and shuffle too.

This message was sent by Atlassian JIRA
