From Eric Baldeschwieler <eri...@hortonworks.com>
Subject Re: [Vote] Merge branch-trunk-win to trunk
Date Fri, 01 Mar 2013 04:47:18 GMT
+1 (non-binding)

A few observations:

- Windows has actually been a supported platform for Hadoop since 0.1. Doug championed supporting
Windows then, and we've continued to do it with varying vigor over time. To my knowledge we've
never made a decision to drop Windows support. The change here is improving our support and
dropping the requirement of Cygwin. We had Nutch Windows users on the list in 2006, and we've
been supporting Windows FS requirements since inception.

- A little pragmatism will go a long way. As a community we've got to stay committed to keeping
Hadoop simple (so it does work on many platforms) and extending it to take advantage of key
emerging OS/hardware features, such as containers, new FSs, virtualization, flash ... We
should all plan to let new features & optimizations emerge that don't work everywhere,
if they are compelling and central to Hadoop's mission of being THE best fabric for storing
and processing big data.

- A UI project like KDE has to deal with the MANY differences between Windows and Linux UI
APIs. Hadoop faces no such complex challenge and hence can be maintained from a single codeline,
IMO. It is mostly abstracted from the OS APIs via Java and our design choices. Where it
is not, we can continue to add pluggable abstractions.

On Feb 28, 2013, at 6:01 PM, Matt Foley <mattf@apache.org> wrote:

> +1 (binding)
> Apache is supposed to be about the community.  We have here a community of
> developers, who have actively and openly worked to add a major improvement
> to Hadoop: the ability to work cross-platform.  Furthermore, the size of
> the substantive part of the needed patch is only about 1500 lines, much
> smaller than quite a few other additions to Hadoop over the last few
> months.  We should welcome and support this change, and make sure that the
> code stays cross-platform going forward by extending our CI practices,
> especially pre-commit "test-patch", to also include Windows.
> As most of you know, my colleague Giri Kesavan (PMC member) helps maintain
> the Linux CI capability for Hadoop.  I've talked with him, and he and I are
> committing to getting test-patch implemented for Windows, so that along
> with the current automated "+1"s required to commit, we can add two more,
> for javac build in Windows and core unit tests in Windows.
> Members of the team implementing cross-platform compatibility, including
> Microsoft employees, have opened the discussion for providing hardware or
> VM resources to perform this additional CI testing.  I will assist them to
> work with the Apache Infra team and figure out how to make it happen.
> I understand there is some concern about the additional platform testing.
> My going-in presumption, based on Java's intrinsic, pretty-good,
> cross-platform compatibility, is that patches to Hadoop will by default
> also have cross-platform compatibility, unless they are written in an
> explicitly platform-dependent way. I also believe that in the vast majority
> of cases the cross-platform compatibility of Java will carry through to
> Hadoop patches, without additional effort on the developer's part.
> Let's try it, and see what happens.  If we actually find a frequent
> difficulty, we'll change to engineer around it.  But I believe that, in the
> rare cases where a Windows-specific failure occurs, there will be a number
> of people (new, enthusiastic members of the community! :-) willing to help.
> If such help is not forthcoming, then we can discuss work-arounds, but
> like a previous poster, I am confident in the community.
> Regards,
> --Matt
> On Thu, Feb 28, 2013 at 12:21 PM, Chuan Liu <chuanliu@microsoft.com> wrote:
>> +1 (non-binding)
>>> As someone who also contributed to porting Hadoop to Windows, I think Java
>>> already provides a very good platform-independent foundation.
>>> For features that are not available in Java, we will try to provide our own
>>> platform-independent APIs that abstract OS tasks away.
>>> Most features should have no difficulty running on Windows and Linux by
>>> using Java and those platform-independent APIs.
>>> As for concerns raised about new features that may fail on Windows, I don't
>>> think we need to make passing on Windows a mandate at the moment. We can
>>> simply mark such a feature unavailable on Windows and port it later if the
>>> feature is important.
>>> -Chuan
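Chuan's suggestion (hide OS tasks behind platform-independent APIs, and mark a not-yet-ported feature as unavailable on Windows) could look roughly like the Java sketch below. All type and method names are hypothetical, not Hadoop classes, and detecting Windows with an `os.name` prefix check is an assumption.

```java
import java.util.Locale;

// Hypothetical sketch: none of these types exist in Hadoop.
public class PlatformFeatureSketch {

    /** Platform-neutral API; callers never branch on the OS themselves. */
    interface NewFeature {
        String run(String input);
    }

    /** Implementation written and tested on Linux/Unix. */
    static final class UnixNewFeature implements NewFeature {
        @Override
        public String run(String input) {
            return "unix:" + input;
        }
    }

    /** Stub that marks the feature unavailable on Windows until it is ported. */
    static final class WindowsUnavailable implements NewFeature {
        @Override
        public String run(String input) {
            throw new UnsupportedOperationException("Not yet available on Windows");
        }
    }

    /** Assumption: an os.name prefix check is enough to detect Windows. */
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Chooses the implementation once, in one place. */
    static NewFeature forOs(String osName) {
        return isWindows(osName) ? new WindowsUnavailable() : new UnixNewFeature();
    }

    public static void main(String[] args) {
        NewFeature feature = forOs(System.getProperty("os.name"));
        try {
            System.out.println(feature.run("data"));
        } catch (UnsupportedOperationException e) {
            System.out.println("skipped: " + e.getMessage());
        }
    }
}
```

Keeping the OS check inside a single factory means porting the feature later only replaces the stub; no caller changes.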
>>> -----Original Message-----
>>> From: Chris Nauroth [mailto:cnauroth@hortonworks.com]
>>> Sent: Thursday, February 28, 2013 11:51 AM
>>> To: hdfs-dev@hadoop.apache.org
>>> Cc: mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org;
>>> common-dev@hadoop.apache.org
>>> Subject: Re: [Vote] Merge branch-trunk-win to trunk
>>>> Is there a jira for resolving the outstanding TODOs in the code base
>>>> (similar to HDFS-2148)?  Looks like this merge doesn't introduce many
>>>> which is great (just did a quick diff and grep).
>>> I found 2 remaining TODOs introduced in the current merge patch.  One is
>>> in ContainerLaunch.java.  The container launch script was trying to set a
>>> CLASSPATH that exceeded the Windows maximum command line length.  The fix
>>> was to wrap the long classpath into an intermediate jar containing only a
>>> manifest file with a Class-Path entry.  (See YARN-316.)  Just to be
>>> conservative, we wrapped this logic in an if (Shell.WINDOWS) guard and
>>> marked a TODO to remove it later and use that approach on all platforms
>>> after additional testing.  I've tested this code path successfully on Mac
>>> too, but several people wanted additional testing and performance checks
>>> before removing the if (Shell.WINDOWS) guard.  That work is tracked in an
>>> existing jira: YARN-358.
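A minimal sketch of the manifest-only classpath jar idea described above (YARN-316), using only `java.util.jar`. This is not the ContainerLaunch code. The class and method names are hypothetical, and the entries are assumed to be relative paths, which the JAR specification resolves against the location of the jar itself.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarSketch {

    /**
     * Writes a jar containing only a manifest whose Class-Path attribute lists
     * the given entries, so a launcher can pass one short -cp argument instead
     * of a classpath string that exceeds the command-line length limit.
     */
    static void writeClasspathJar(File jar, List<String> entries) throws IOException {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // The JAR spec expects Manifest-Version; set it alongside Class-Path.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class-Path entries are space-separated URLs; directories need a trailing '/'.
        main.put(Attributes.Name.CLASS_PATH, String.join(" ", entries));
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
            // No other entries: the manifest alone carries the classpath.
        }
    }

    /** Reads the Class-Path attribute back, e.g. to verify the jar. */
    static String readClassPath(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getManifest().getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = File.createTempFile("classpath-", ".jar");
        jar.deleteOnExit();
        writeClasspathJar(jar, List.of("lib/a.jar", "lib/b.jar", "conf/"));
        System.out.println(readClassPath(jar));
    }
}
```

The launcher would then run something like `java -cp classpath-XXXX.jar MainClass`, keeping the command line short regardless of how many jars are listed.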
>>> The other TODO is for winutils to print more usage information and
>>> examples.  At this point, I think winutils is printing sufficient
>>> information, and we can just remove the TODO.  I just submitted a new jira
>>> to start that conversation: HADOOP-9348.
>>> Thank you,
>>> --Chris
>>> On Thu, Feb 28, 2013 at 11:29 AM, Robert Evans <evans@yahoo-inc.com> wrote:
>>>> My initial question was mostly intended to understand the desired new
>>>> classification of Windows after the merge, and how we plan to maintain
>>>> Windows support.  I am happy to hear that hardware for Jenkins will be
>>>> provided.  I am also fine, at least initially, with us trying to treat
>>>> Windows as a first class supported platform.  But I realize that there
>>>> are a lot of people that do not have easy access to Windows for
>>>> development/debugging, myself included. I also don't want to slow down
>>>> the pace of development too much because of this.  It will cause some
>>>> organizations that do not use or support Windows to be more likely to
>>>> run software that has diverged from an official release.  It also has
>>>> the potential to make the patch submission process even more
>>>> difficult, which increases the likelihood of submitters abandoning
>>>> patches.  However, the great thing about being in a community is we can
>>>> change if we need to.
>>>> I am +0 for the merge.  I am not a Windows expert so I don't feel
>>>> comfortable giving it a true +1.
>>>> --Bobby
>>>> On 2/28/13 10:45 AM, "Chris Nauroth" <cnauroth@hortonworks.com> wrote:
>>>>> I'd like to share a few anecdotes about developing cross-platform,
>>>>> hopefully to address some of the concerns about adding overhead to
>>>>> the development process.  By reviewing past cases of cross-platform
>>>>> Linux vs. Windows bugs, we can get a sense for how the development
>>>>> process could look in the future.
>>>>> HADOOP-9131: TestLocalFileSystem#testListStatusWithColons cannot run
>>>>> on Windows.  As part of an earlier jira, HADOOP-8962, there was a new
>>>>> test committed on trunk covering the case of a local file system
>>>>> interaction on a file containing a ':'.  On Windows, ':' in a path
>>>>> has special meaning as part of the drive specifier (e.g. C:), so this
>>>>> test cannot pass when running on Windows.  In this kind of case, the
>>>>> cross-platform bug is obvious, and the fix is obvious
>>>>> (assumeTrue(!Shell.WINDOWS)).  Ideally, this would get fixed
>>>>> pre-commit after seeing a -1 from the Windows Jenkins slave.
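In JUnit, `assumeTrue(!Shell.WINDOWS)` skips the test (rather than failing it) when the condition is false. The plain-Java analogue below runs without JUnit or Hadoop on the classpath. The `WINDOWS` constant is a stand-in for `Shell.WINDOWS`, assumed here to be an `os.name` prefix check, and the method name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class ColonPathTestSketch {

    // Stand-in for Hadoop's Shell.WINDOWS; assumed to be an os.name prefix check.
    static final boolean WINDOWS = System.getProperty("os.name").startsWith("Windows");

    /**
     * Plain-Java analogue of a test guarded by assumeTrue(!Shell.WINDOWS):
     * on Windows the case is reported as skipped instead of failed.
     */
    static String listStatusWithColon(Path dir) throws IOException {
        if (WINDOWS) {
            // ':' is reserved on Windows (drive specifier), so this case cannot run there.
            return "skipped";
        }
        Path file = Files.createFile(dir.resolve("foo:bar"));
        try (Stream<Path> entries = Files.list(dir)) {
            boolean found = entries.anyMatch(p -> p.getFileName().toString().equals("foo:bar"));
            return found ? "passed" : "failed";
        } finally {
            Files.delete(file); // runs after the directory stream is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("colon-test");
        System.out.println(listStatusWithColon(dir));
        Files.delete(dir);
    }
}
```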
>>>>> HDFS-4274: BlockPoolSliceScanner does not close verification log
>>>>> during shutdown.  This caused problems for MiniDFSCluster-based tests
>>>>> running on Windows.  Failure to close the verification log meant that
>>>>> we didn't release file locks, so the tests couldn't delete/recreate
>>>>> working directories during teardown/setup.  Arguably, this was always
>>>>> a bug, and running on Windows just exposed it because of its stricter
>>>>> rules about file locking.  This is a more complex fix, but it doesn't
>>>>> require platform-specific knowledge.  If some future patch
>>>>> accidentally regresses this, then we'll likely see +1 from Linux
>>>>> Jenkins and -1 from Windows Jenkins.  Ideally, it would get fixed
>>>>> pre-commit, because it doesn't require Windows-specific knowledge.
>>>>> There is also the matter of impact: re-breaking this would re-break many
>>>>> test suites on Windows.
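The HDFS-4274 lesson (release file handles on shutdown so teardown can delete working directories) maps onto try-with-resources. The sketch below is hypothetical and is not the BlockPoolSliceScanner fix; `VerificationLog` is an illustrative name. On Linux, deleting a file that is still open succeeds, which is why this kind of leak tends to surface only on Windows, where deleting an open file typically fails.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseBeforeDeleteSketch {

    /** Hypothetical component that keeps a log file open while it runs. */
    static final class VerificationLog implements AutoCloseable {
        private final BufferedWriter writer;

        VerificationLog(Path file) throws IOException {
            this.writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        }

        void append(String line) throws IOException {
            writer.write(line);
            writer.newLine();
        }

        @Override
        public void close() throws IOException {
            writer.close(); // releases the OS file handle
        }
    }

    /** Runs a scanner-like workload, then tears down its working directory. */
    static boolean runAndTearDown(Path workDir) throws IOException {
        Path logFile = workDir.resolve("verification.log");
        try (VerificationLog log = new VerificationLog(logFile)) {
            log.append("block_1 verified");
        } // closed here on every exit path, including exceptions
        Files.delete(logFile);
        Files.delete(workDir);
        return Files.notExists(workDir);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runAndTearDown(Files.createTempDirectory("scanner-")));
    }
}
```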
>>>>> HADOOP-9232: JniBasedUnixGroupsMappingWithFallback fails on Windows
>>>>> with UnsatisfiedLinkError.  This was introduced by HADOOP-8712, which
>>>>> switched to JniBasedUnixGroupsMappingWithFallback as the default
>>>>> hadoop.security.group.mapping, but did not provide a Windows
>>>>> implementation of the JNI function.  In this case, there was a strong
>>>>> desire to get HADOOP-8712 into a release, fixing it on Windows required
>>>>> native Windows API knowledge, and Windows users had a simple workaround
>>>>> available by changing their configs back to
>>>>> ShellBasedUnixGroupsMapping.  I think this is the kind of situation
>>>>> where we could allow HADOOP-8712 to commit despite a -1 from Windows
>>>>> Jenkins, with fairly quick follow-up from an engineer with the Windows
>>>>> expertise to fix it.
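A simplified model of the configuration choice and the workaround described above. The key `hadoop.security.group.mapping` and the class names come from the message (plus `JniBasedUnixGroupsMapping`, the JNI mapping the fallback variant wraps); the selection logic and the `nativeGroupsAvailable` flag are assumptions for illustration, not Hadoop's code.

```java
import java.util.HashMap;
import java.util.Map;

public class GroupMappingSelectionSketch {

    static final String KEY = "hadoop.security.group.mapping";
    static final String JNI = "org.apache.hadoop.security.JniBasedUnixGroupsMapping";
    static final String JNI_WITH_FALLBACK =
        "org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback";
    static final String SHELL_BASED = "org.apache.hadoop.security.ShellBasedUnixGroupsMapping";

    /**
     * Simplified model (assumption, not Hadoop's code) of which mapping serves
     * lookups. The default mirrors the post-HADOOP-8712 default described above.
     */
    static String effectiveMapping(Map<String, String> conf, boolean nativeGroupsAvailable) {
        String configured = conf.getOrDefault(KEY, JNI_WITH_FALLBACK);
        if (configured.equals(JNI_WITH_FALLBACK)) {
            // "WithFallback": use JNI when the native code is usable, else shell out.
            return nativeGroupsAvailable ? JNI : SHELL_BASED;
        }
        return configured;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Workaround described above: pin the shell-based mapping explicitly,
        // which bypasses the JNI path regardless of native availability.
        conf.put(KEY, SHELL_BASED);
        System.out.println(effectiveMapping(conf, true));
    }
}
```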
>>>>> To summarize, I don't think it needs to differ greatly from our
>>>>> current development process.  We're all responsible for breadth of
>>>>> understanding and maintenance of the whole codebase, but we also rely
>>>>> on specific individuals with deep expertise in particular areas for
>>>>> certain issues.  Sometimes we commit despite a -1 from Jenkins, based
>>>>> on the community's judgment.
>>>>> Virtualization greatly simplifies cross-platform development.  I use
>>>>> VirtualBox on a Mac host and run VMs for Windows and Ubuntu with a
>>>>> shared drive so that they can all see the same copy of the source
>>>>> code.  There are plenty of variations on this depending on your
>>>>> preference, such as offloading the VMs to a separate server or cloud
>>>>> service to free up local RAM.  I'm planning on submitting
>>>>> BUILDING.txt changes later today that fully describe how to build on
>>>>> Windows.  After some initial setup, it's nearly identical to the mvn
>>>>> commands that you already use today.
>>>>> Hope this helps,
>>>>> --Chris
>>>>> On Thu, Feb 28, 2013 at 3:25 AM, John Gordon <John.Gordon@microsoft.com> wrote:
>>>>>> +1 (non-binding)
>>>>>> I want to share my vote of confidence in this community.  If
>>>>>> motivated to do so, this community can keep this project
>>>>>> cross-platform and continue to rapidly innovate without breaking a
>>>>>> sweat.
>>>>>> The day we started working on this, I saw the foundations of
>>>>>> greatness in the quality and volume of dev tests, the code itself,
>>>>>> and the Apache values themselves.
>>>>>> 1.) Hadoop's unit tests and their frameworks are very well thought
>>>>>> out, and the consideration and energy that went into their design
>>>>>> are worthy of praise.  The MiniCluster abstractions use very few
>>>>>> resources and put all the processes into one JVM for easy
>>>>>> debugging.  It is very easy to select specific tests from the full
>>>>>> suite to reproduce an issue reported in another environment, like
>>>>>> the Jenkins build server or another contributor's environment.
>>>>>> 2.) This community has done an excellent job of incorporating
>>>>>> well-placed log messages to make it easy to troubleshoot most
>>>>>> failures post mortem.
>>>>>> The logs are very useful, and it is extremely rare that
>>>>>> troubleshooting a failure requires debugging a live repro.
>>>>>> 3.) Hadoop is written primarily in Java, a cross-platform language
>>>>>> that provides its own platform in the form of the JVM to insulate
>>>>>> most of the code from the specifics of the OS layer.
>>>>>> 4.) CoPDoC - The right priorities, and well stated.
>>>>>> Thank you,
>>>>>> John
>>>>>> -----Original Message-----
>>>>>> From: Ivan Mitic [mailto:ivanmi@microsoft.com]
>>>>>> Sent: Wednesday, February 27, 2013 6:32 PM
>>>>>> To: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org
>>>>>> Cc: yarn-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org
>>>>>> Subject: RE: [Vote] Merge branch-trunk-win to trunk
>>>>>> +1 (non-binding)
>>>>>> I am really glad to see this happening! As people already
>>>>>> mentioned, this has been a great engineering effort involving many
>>>>>> people!
>>>>>> Folks raised some valid concerns below and I thought it would be
>>>>>> good to share my 2 cents. In my opinion, we don't have to solve
>>>>>> these problems right now. As we move forward with two platforms, we
>>>>>> can start addressing one problem at a time and incrementally
>>>>>> improve. In the first iteration, maintaining Hadoop on Windows
>>>>>> could be just everyone making a best effort (making sure the
>>>>>> Jenkins build succeeds, at least). We already have people who are
>>>>>> building/running trunk on Windows daily, so they would jump in and
>>>>>> fix problems as needed (we've been doing this in branch-trunk-win
>>>>>> for a while now). Although I see that problems could arise with
>>>>>> platform-specific features/optimizations, I don't think these are
>>>>>> frequent, so in most cases everything will just work. Merging the
>>>>>> two branches sooner rather than later does seem like the right
>>>>>> thing to do if the ultimate goal is to have Hadoop on both
>>>>>> platforms. Now that the port has completed, we will have people at
>>>>>> Microsoft (and elsewhere) wanting to contribute
>>>>>> features/improvements to the trunk branch. A separate branch would
>>>>>> just make things more difficult and confusing for everyone :) Hope
>>>>>> this makes sense.
>>>>>> -----Original Message-----
>>>>>> From: Todd Lipcon [mailto:todd@cloudera.com]
>>>>>> Sent: Wednesday, February 27, 2013 3:43 PM
>>>>>> To: common-dev@hadoop.apache.org
>>>>>> Cc: yarn-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>>>>>> mapreduce-dev@hadoop.apache.org
>>>>>> Subject: Re: [Vote] Merge branch-trunk-win to trunk
>>>>>> On Wed, Feb 27, 2013 at 2:54 PM, Suresh Srinivas <suresh@hortonworks.com> wrote:
>>>>>>> With that we need to decide how our precommit process looks.
>>>>>>> My inclination is to wait for +1 from precommit builds on both
>>>>>>> the platforms to ensure no issues are introduced.
>>>>>>> Thoughts?
>>>>>>> 2. Feature development impact
>>>>>>> Some questions have been raised about whether new features need
>>>>>>> to be supported on both platforms. Yes. I do not see a reason
>>>>>>> why features cannot work on both platforms, with the
>>>>>>> exception of platform-specific optimizations. This what Java
>>>>>>> us.
>>>>>> I'm concerned about the above. Personally, I don't have access to
>>>>>> any Windows boxes with development tools, and I know nothing about
>>>>>> developing on Windows. The only Windows I run is an 8GB VM with
>>>>>> GB RAM allocated, for PowerPoint :)
>>>>>> If I submit a patch and it gets -1 "tests failed" on the Windows
>>>>>> slave, how am I supposed to proceed?
>>>>>> I think a reasonable compromise would be that the tests should
>>>>>> always *build* on Windows before commit, and contributors should do
>>>>>> their best to look at the test logs for any Windows-specific
>>>>>> failures. But, beyond looking at the logs, a "-1 Tests failed on
>>>>>> windows" should not block a commit.
>>>>>> Those contributors who are interested in Windows being a
>>>>>> first-class platform should be responsible for watching the Windows
>>>>>> builds and debugging/fixing any regressions that might be
>>>>>> Windows-specific.
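To make the difference between the two precommit proposals concrete (Suresh's +1 from both platforms, quoted above, versus Todd's build-only requirement for Windows), here is a hypothetical Java model of each gate. The names are illustrative, and the model leaves out the human parts, such as reading the Windows test logs or committing despite a -1 by community judgment.

```java
public class PrecommitGateSketch {

    /** Hypothetical summary of one precommit run on one platform. */
    static final class Run {
        final boolean compiled;
        final boolean testsPassed;

        Run(boolean compiled, boolean testsPassed) {
            this.compiled = compiled;
            this.testsPassed = testsPassed;
        }

        boolean clean() {
            return compiled && testsPassed;
        }
    }

    /** Suresh's proposal: a clean +1 on both platforms before commit. */
    static boolean bothPlatformsGate(Run linux, Run windows) {
        return linux.clean() && windows.clean();
    }

    /** Todd's compromise: Windows must build, but Windows test failures do not block. */
    static boolean windowsBuildOnlyGate(Run linux, Run windows) {
        return linux.clean() && windows.compiled;
    }

    public static void main(String[] args) {
        Run linux = new Run(true, true);
        Run windows = new Run(true, false); // builds, but a Windows-only test fails
        System.out.println(bothPlatformsGate(linux, windows) + " " + windowsBuildOnlyGate(linux, windows));
    }
}
```

For a patch that builds on Windows but fails a Windows-only test, `main` prints `false true`: the strict gate blocks the commit and the compromise gate lets it through.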
>>>>>> I also think the KDE model that Harsh pointed out is an interesting
>>>>>> one, i.e. the idea that we would not merge Windows support to trunk,
>>>>>> but rather treat it as a "parallel code line" which lives in the ASF
>>>>>> and has its own builds and releases. The Windows team would
>>>>>> periodically merge trunk->win to pick up any new changes, and do a
>>>>>> separate test/release process. I'm not convinced this is the best
>>>>>> idea, but it's worth discussing the pros and cons.
>>>>>> -Todd
>>>>>>> On Wed, Feb 27, 2013 at 11:56 AM, Eli Collins <eli@cloudera.com> wrote:
>>>>>>>> Bobby raises some good questions. A related one: since most
>>>>>>>> current developers won't add Windows support for new features
>>>>>>>> that are platform specific, is it assumed that Windows
>>>>>>>> development will lag, or will people actively work on
>>>>>>>> keeping Windows up with the latest? And vice versa in case
>>>>>>>> Windows support is implemented first.
>>>>>>>> Is there a jira for resolving the outstanding TODOs in the
>>>>>>>> code base (similar to HDFS-2148)? Looks like this merge doesn't
>>>>>>>> introduce many, which is great (just did a quick diff and grep).
>>>>>>>> Thanks,
>>>>>>>> Eli
>>>>>>>> On Wed, Feb 27, 2013 at 8:17 AM, Robert Evans <evans@yahoo-inc.com> wrote:
>>>>>>>>> After this is merged in, is Windows still going to be a second
>>>>>>>>> class citizen that happens to work for more than just
>>>>>>>>> development, or is it a fully supported platform where, if
>>>>>>>>> something breaks, it can block a release?
>>>>>>>>> How do we as a community intend to keep Windows support from
>>>>>>>>> breaking?
>>>>>>>>> We don't have any Jenkins slaves to be able to run nightly
>>>>>>>>> tests to validate everything still compiles/runs. This is
>>>>>>>>> not a blocker for me because we often rely on individuals or
>>>>>>>>> groups to test Hadoop, but I do think we need to have this
>>>>>>>>> discussion before we put it in.
>>>>>>>>> --Bobby
>>>>>>>>> On 2/26/13 4:55 PM, "Suresh Srinivas" <suresh@hortonworks.com> wrote:
>>>>>>>>>> I had posted a heads up about merging branch-trunk-win to trunk
>>>>>>>>>> on Feb 8th. I am happy to announce that we are ready for the merge.
>>>>>>>>>> Here is a brief recap of the highlights of the work done:
>>>>>>>>>> - Command-line scripts for the Hadoop surface area
>>>>>>>>>> - Mapping the HDFS permissions model to Windows
>>>>>>>>>> - Abstracted and reconciled mismatches around differences in
>>>>>>>>>> Path semantics in Java and Windows
>>>>>>>>>> - Native Task Controller for Windows
>>>>>>>>>> - Implementation of a Block Placement Policy to support cloud
>>>>>>>>>> environments, more specifically Azure.
>>>>>>>>>> - Implementation of Hadoop native libraries for Windows
>>>>>>>>>> (compression codecs, native I/O)
>>>>>>>>>> - Several reliability issues, including race-conditions,
>>>>>>>>>> intermittent test failures, resource leaks.
>>>>>>>>>> - Several new unit test cases written for the above changes
>>>>>>>>>> Please find the details of the work in
>>>>>>>>>> CHANGES.branch-trunk-win.txt - Common changes <http://bit.ly/Xe7Ynv>,
>>>>>>>>>> HDFS changes <http://bit.ly/13QOSo9>, and YARN and MapReduce
>>>>>>>>>> changes <http://bit.ly/128zzMt>. This is the work ported from
>>>>>>>>>> branch-1-win to a branch based on trunk.
>>>>>>>>>> For details of the testing done, please see the thread
>>>>>>>>>> http://bit.ly/WpavJ4. The merge patch for this is available on
>>>>>>>>>> HADOOP-8562 <https://issues.apache.org/jira/browse/HADOOP-8562>.
>>>>>>>>>> This was a large undertaking that involved developing and
>>>>>>>>>> testing the entire Hadoop stack, including scale tests. This
>>>>>>>>>> was made possible only with contributions from many, many
>>>>>>>>>> folks in the community. The following people contributed to
>>>>>>>>>> this work: Ivan Mitic, Chuan Liu, Ramya Sunil, Bikas Saha,
>>>>>>>>>> Kanna Karanam, John Gordon, Brandon Li, Chris Nauroth,
>>>>>>>>>> Lao, Sumadhur Reddy Bolli, Arpit Agarwal, Ahmed El Baz,
>>>>>>>>>> Mike Liddell, Zhao, Thejas Nair, Steve Maine, Ganeshan Iyer,
>>>>>>>>>> Raja Aluri, Giridharan Kesavan, Ramya Bharathi Nimmagadda,
>>>>>>>>>> Daryn Sharp, Arun Murthy, Tsz-Wo Nicholas Sze, Suresh
>>>>>>>>>> Srinivas and Sanjay Radia. There are many others who
>>>>>>>>>> contributed as well by providing feedback and comments on
>>>>>>>>>> numerous jiras.
>>>>>>>>>> The vote will run for seven days and will end on March 5,
>>>>>>>>>> 6:00PM PST.
>>>>>>>>>> Regards,
>>>>>>>>>> Suresh
>>>>>>>>>> On Thu, Feb 7, 2013 at 6:41 PM, Mahadevan Venkatraman
>>>>>>>>>> <mahadv@microsoft.com> wrote:
>>>>>>>>>>> It is super exciting to look at the prospect of these
>>>>>>>>>>> changes being merged to trunk. Having Windows as one of the
>>>>>>>>>>> supported Hadoop platforms is a fantastic opportunity both for
>>>>>>>>>>> the Hadoop project and for Microsoft customers.
>>>>>>>>>>> This work began around a year back when a few of us started
>>>>>>>>>>> with a basic port of Hadoop on Windows. Ever since, the Hadoop
>>>>>>>>>>> team at Microsoft has made significant progress in the
>>>>>>>>>>> following areas:
>>>>>>>>>>> (PS: Some of these items are already included in Suresh's
>>>>>>>>>>> email, but I am including them again for completeness.)
>>>>>>>>>>> - Command-line scripts for the Hadoop surface area
>>>>>>>>>>> - Mapping the HDFS permissions model to Windows
>>>>>>>>>>> - Abstracted and reconciled mismatches around differences
>>>>>>>>>>> in Path semantics in Java and Windows
>>>>>>>>>>> - Native Task Controller for Windows
>>>>>>>>>>> - Implementation of a Block Placement Policy to support
>>>>>>>>>>> cloud environments, more specifically Azure.
>>>>>>>>>>> - Implementation of Hadoop native libraries for Windows
>>>>>>>>>>> (compression codecs, native I/O)
>>>>>>>>>>> - Several reliability issues, including race-conditions,
>>>>>>>>>>> intermittent test failures, resource leaks.
>>>>>>>>>>> - Several new unit test cases written for the above changes
>>>>>>>>>>> In the process, we have closely engaged with the Apache
>>>>>>>>>>> open source community and have received great support and
>>>>>>>>>>> assistance from the community in terms of contributed fixes,
>>>>>>>>>>> code review comments and commits.
>>>>>>>>>>> In addition, the Hadoop team at Microsoft has also made
>>>>>>>>>>> good progress in other projects including Hive, Pig, Sqoop,
>>>>>>>>>>> Oozie, HCat and HBase. Many of these changes have already been
>>>>>>>>>>> committed to the respective trunks with help from various
>>>>>>>>>>> committers and contributors. It is great to see the commitment
>>>>>>>>>>> of the community to supporting multiple platforms, and we look
>>>>>>>>>>> forward to the day when a developer/customer is able to
>>>>>>>>>>> successfully deploy a complete solution stack based on
>>>>>>>>>>> Apache Hadoop releases.
>>>>>>>>>>> Next Steps:
>>>>>>>>>>> All of the above changes are part of the Windows Azure
>>>>>>>>>>> HDInsight and HDInsight Server products from Microsoft. We
>>>>>>>>>>> have successfully on-boarded several internal customers and
>>>>>>>>>>> have been running production workloads on Windows Azure
>>>>>>>>>>> HDInsight. Our vision is to create a big data platform based
>>>>>>>>>>> on Hadoop, and we are committed to helping make Hadoop a
>>>>>>>>>>> world-class solution that anyone can use to solve their
>>>>>>>>>>> biggest data challenges.
>>>>>>>>>>> As an immediate next step, we would like to have a
>>>>>>>>>>> discussion around how we can ensure that the quality of the
>>>>>>>>>>> mainline branches on Windows is maintained. To this end, we
>>>>>>>>>>> would like to get to the state where we have pre-checkin
>>>>>>>>>>> validation gates and nightly test runs enabled on Windows.
>>>>>>>>>>> If you have any suggestions around this, please do send an
>>>>>>>>>>> email. We are committed to helping sustain the long-term
>>>>>>>>>>> quality of Hadoop on both Linux and Windows.
>>>>>>>>>>> We sincerely thank the community for their contributions and
>>>>>>>>>>> support so far, and hope to continue having a close engagement
>>>>>>>>>>> in the future.
>>>>>>>>>>> -Microsoft HDInsight Team
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Suresh Srinivas [mailto:suresh@hortonworks.com]
>>>>>>>>>>> Sent: Thursday, February 7, 2013 5:42 PM
>>>>>>>>>>> To: common-dev@hadoop.apache.org;
>>>>>>>>>>> yarn-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>>>>>>>>>>> mapreduce-dev@hadoop.apache.org
>>>>>>>>>>> Subject: Heads up - merge branch-trunk-win to trunk
>>>>>>>>>>> The support for Hadoop on Windows was proposed in
>>>>>>>>>>> HADOOP-8079 <https://issues.apache.org/jira/browse/HADOOP-8079>
>>>>>>>>>>> almost a year ago. The goal was to make Hadoop natively
>>>>>>>>>>> integrated, and performance and scalability tuned, on Windows
>>>>>>>>>>> Server or Windows Azure.
>>>>>>>>>>> We are happy to announce that a lot of progress has been
>>>>>>>>>>> made in this regard.
>>>>>>>>>>> Initial work started in a feature branch, branch-1-win,
>>>>>>>>>>> based on branch-1.
>>>>>>>>>>> The details related to the work done in the branch can be
>>>>>>>>>>> seen in CHANGES.txt
>>>>>>>>>>> <http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.branch-1-win.txt?view=markup>.
>>>>>>>>>>> This work has been ported to a branch, branch-trunk-win,
>>>>>>>>>>> based on trunk. The merge patch for this is available on
>>>>>>>>>>> HADOOP-8562 <https://issues.apache.org/jira/browse/HADOOP-8562>.
>>>>>>>>>>> Highlights of the work done so far:
>>>>>>>>>>> 1. Necessary changes in Hadoop to run natively on Windows.
>>>>>>>>>>> These changes handle differences in platforms related to
>>>>>>>>>>> paths, process/task management, etc.
>>>>>>>>>>> 2. Addition of winutils tools for managing file permissions
>>>>>>>>>>> and ownership, user group mapping, hardlinks, symbolic links,
>>>>>>>>>>> chmod, disk utilization, and process/task management.
>>>>>>>>>>> 3. Added cmd scripts equivalent to existing shell scripts:
>>>>>>>>>>> hadoop-daemon.sh, start and stop scripts.
>>>>>>>>>>> 4. Addition of a block placement policy implementation to
>>>>>>>>>>> support cloud environments, more specifically Azure.
>>>>>>>>>>> We are very close to wrapping up the work in
>>>>>>>>>>> branch-trunk-win and getting ready for a merge. The merge
>>>>>>>>>>> patch is passing close to 100% of unit tests on Linux. Soon I
>>>>>>>>>>> will call for a vote to merge this branch into trunk.
>>>>>>>>>>> Next steps:
>>>>>>>>>>> 1. Call for a vote to merge branch-trunk-win to trunk, when
>>>>>>>>>>> the work completes and the precommit build is clean.
>>>>>>>>>>> 2. Start a discussion on adding Jenkins precommit builds on
>>>>>>>>>>> Windows and how to integrate that with the existing commit
>>>>>>>>>>> process.
>>>>>>>>>>> Let me know if you have any questions.
>>>>>>>>>>> Regards,
>>>>>>>>>>> Suresh
>>>>>>>>>> --
>>>>>>>>>> http://hortonworks.com/download/
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera

