hadoop-general mailing list archives

From "Rottinghuis, Joep" <jrottingh...@ebay.com>
Subject RE: getting started building Mavenized hadoop common
Date Thu, 04 Aug 2011 13:58:57 GMT
Have been resisting the temptation to jump in on this, but cannot help myself now.

Downloading the source from Maven when it cannot be generated sounds like a better approach
to me than committing the source itself and trying to generate on top of that.
One could even commit a tarball with the sources and expand it when the proper setup is not present.

It seems we'd like the following:
1) Developers with full setup can generate source from scratch
2) Developers with partial setup can still see source in their IDE.
3) Keep it easy to prevent generated source from getting checked in.

If we do end up committing source, then at least keep it in a separate directory clearly marked
as such ("something-generated").
That will not only discourage people from trying to modify the source, but also make cleanup
a simpler and cleaner operation.
Mixing hand-written and generated source in one directory tree (even with .svnignore and .gitignore)
is not a good idea in my experience.

If we provide generated Java source files without actually generating them locally (whether directly
committed or pulled from elsewhere), would we still compile those sources and use them?
In other words, what happens if developers do end up making code changes to the generated
files? Will those changes get used, or get ignored?

In this respect it would be better to have a jar with sources and let the developer browse
through source code that way.



From: Robert Evans [evans@yahoo-inc.com]
Sent: Thursday, August 04, 2011 6:33 AM
To: general@hadoop.apache.org
Subject: Re: getting started building Mavenized hadoop common

Can we make it a separate Maven project? Not a separate tar, but something closer to hadoop-annotations.
That way, if nothing has changed or the developer does not have the tools to rebuild protocol
buffers, then Maven can download the jar/source from the Maven repo.  If the developer does
change it, then they can rebuild and install it as needed.

--Bobby Evans

On 8/4/11 6:38 AM, "Steve Loughran" <stevel@apache.org> wrote:

On 03/08/11 02:41, Ted Dunning wrote:
> (the following discusses religious practices ... please don't break into
> flames)
> In the past, the simplest approach I have seen for dealing with this is to
> simply put the generated code under the normal source dir and check it in.
>   This is particularly handy with Thrift since it is common for users of the
> code to not have a working version of the Thrift compiler.  I then have an
> optional profile that does the code generation.  In my cases, I made that
> profile conditional on a thrift compiler being found, but there are other
> reasonable strategies.  I did the code generation by generating into a temp
> dir and then copying the code into the source tree so that if the generation
> failed, no code was changed.
> The nice side effect is that IDEs see the generated code as first class
> code.
> Many consider various aspects of this style to be bad practice.  Some
> condemn checking in generated code as akin to checking in jars.   I kind of
> agree, but lack of thrift or javacc is common enough that it really has to
> be dealt with by checking these in somewhere.  Only if your code generator
> really is ubiquitous is it feasible not to check in generated code.
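
[Illustration of the "generate into a temp dir, then copy" step described in the quoted
text. This is only a minimal sketch in Python, not the profile Ted used: the function name
regenerate and the convention that the generator takes its output directory as its last
argument are assumptions made for the example.]

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def regenerate(generator_cmd, target_dir):
    """Run a code generator into a scratch dir; copy into target_dir only on success.

    generator_cmd is a hypothetical command list; the scratch directory is
    appended as its final argument (an assumption of this sketch).
    Returns True if the target tree was updated, False if it was left untouched.
    """
    target = Path(target_dir)
    with tempfile.TemporaryDirectory() as scratch:
        try:
            result = subprocess.run(list(generator_cmd) + [scratch])
        except OSError:
            # Generator is not installed: keep the checked-in code as it is.
            return False
        if result.returncode != 0:
            # Generation failed: nothing in the source tree has been changed.
            return False
        # Generation succeeded: overlay the fresh files onto the source tree.
        shutil.copytree(scratch, target, dirs_exist_ok=True)
    return True
```

[The source tree is only written after the generator exits successfully, so a missing or
failing generator leaves the checked-in files as they were. dirs_exist_ok needs Python 3.8
or later.]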

The problem with this approach is that SVN will often say "it's changed"
when it hasn't. You can do some tricks with Ant using the <copy>
operation and only copy if they really are different, though once the
generator adds a timestamp to the header you are in trouble, and you
have to look at the diffs to see if anything really has changed. I've
had this problem in the past with Hibernate generated stuff.
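
[Illustration of the "only copy if they really are different" idea. This is a sketch in
Python rather than Ant, and it is not Steve's actual setup. The names copy_if_changed and
TIMESTAMP_LINE are hypothetical, and the header pattern is an assumption; real generators
format their timestamp lines differently.]

```python
import re
import shutil
from pathlib import Path

# Assumed header shape, e.g. "// Generated on Thu Aug 04 2011"; adjust per generator.
TIMESTAMP_LINE = re.compile(
    r"^\s*(//|\*|#).*\b(generated (on|at)|date:)", re.IGNORECASE
)


def significant_lines(path):
    """Return the file's lines with assumed timestamp header lines dropped."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if not TIMESTAMP_LINE.search(line)]


def copy_if_changed(generated, checked_in):
    """Copy generated over checked_in only if they differ beyond timestamp lines.

    Returns True if the file was copied, False if it was left alone.
    """
    dest = Path(checked_in)
    if dest.exists() and significant_lines(generated) == significant_lines(dest):
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(generated, dest)
    return True
```

[With a filter like this, a regenerated file whose only difference is the timestamp header
is not copied, so the version control system does not report it as modified.]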

> Others consider the commingling of generated and "real" code in the same
> directory tree to be a mortal sin.  I agree, but in a lesser form.  I
> strongly condemn the use of a single directory for generated and
> non-generated code, but if all directories avoid such miscegenation, then I
> don't see this as much of a problem.  Most people recognize that a package
> with a name "generated" will contain generated code.

I'd prefer to generate the stuff in the same tree, in a subdir, with
.svnignore set up to never commit the source. That way it's all in the
same tree, but you can't check it in. This keeps the source there even
when you rm -rf build, but keeps it out of SCM.