pig-dev mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: pig 0.11 candidate 2 feedback: Several problems
Date Wed, 20 Feb 2013 21:15:53 GMT
Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.

Jarcec

On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
> I agree -- this is a good release. The bugs Kai pointed out should be
> fixed, but as they are not critical regressions, we can fix them in 0.11.1
> (if someone wants to roll 0.11.1 the minute these fixes are committed, I
> won't mind and will dutifully vote for the release).
> 
> I think the Hadoop 20.2 incompatibility is unfortunate but iirc this is
> fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in 20.2?)
> 
> FWIW Twitter's running CDH3 and this release works in our environment.
> 
> At this point things that block a release are critical regressions in
> performance or correctness.
> 
> D
> 
> 
> On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates <gates@hortonworks.com> wrote:
> 
> > No.  Bugs like these are supposed to be found and fixed after we branch
> > from trunk (which happened several months ago in the case of 0.11).  The
> > point of RCs is to check that it's a good build, licenses are right, etc.
> >  Any bugs found this late in the game have to be seen as failures of
> > earlier testing.
> >
> > Alan.
> >
> > On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
> >
> > > Isn't the point of an RC to find and fix bugs like these?
> > >
> > >
> > > On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <billgraham@gmail.com> wrote:
> > >
> > >> Regarding Pig 11 rc2, I propose we continue with the current vote as is
> > >> (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
> > >> Pig 0.11.1 release whenever they're available and tested.
> > >>
> > >>
> > >>
> > >> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <onatkovich@yahoo.com> wrote:
> > >>
> > >>> I agree that supporting as much as we can is a good goal. The issue is who
> > >>> is going to be testing against all these versions? We found the issues
> > >>> under discussion because of a customer report, not because we consistently
> > >>> test against all versions. Perhaps when we decide which versions to support
> > >>> for next release we need also to agree who is going to be testing and
> > >>> maintaining compatibility with a particular version.
> > >>>
> > >>> For instance since Hadoop 23 compatibility is important for us at Yahoo we
> > >>> have been maintaining compatibility with this version for 0.9, 0.10 and
> > >>> will do the same for 0.11 and going forward. I think we would need others
> > >>> to step in and claim the versions of their interest.
> > >>>
> > >>> Olga
> > >>>
> > >>>
> > >>> ________________________________
> > >>> From: Kai Londenberg <kai.londenberg@googlemail.com>
> > >>> To: dev@pig.apache.org
> > >>> Sent: Wednesday, February 20, 2013 1:51 AM
> > >>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > >>>
> > >>> Hi,
> > >>>
> > >>> I strongly agree with Jonathan here. If there are good reasons why you
> > >>> can't support an older version of Hadoop any more, that's one thing.
> > >>> But having to change 2 lines of code doesn't really qualify as such in
> > >>> my point of view ;)
> > >>>
> > >>> At least for me, pig support for 0.20.2 is essential - without it, I
> > >>> can't use it. If it doesn't support it, I'll have to branch pig and
> > >>> hack it myself, or stop using it.
> > >>>
> > >>> I guess, there are a lot of people still running 0.20.2 Clusters. If
> > >>> you really have lots of data stored on HDFS and a continuously busy
> > >>> cluster, an upgrade is nothing you do "just because".
> > >>>
> > >>>
> > >>> 2013/2/20 Jonathan Coveney <jcoveney@gmail.com>:
> > >>>> I agree that we shouldn't have to support old versions forever. That said,
> > >>>> I also don't think we should be too blase about supporting older versions
> > >>>> where it is not odious to do so. We have a lot of competition in the
> > >>>> language space and the broader the versions we can support, the better
> > >>>> (assuming it isn't too odious to do so). In this case, I don't think it
> > >>>> should be too hard to change ObjectSerializer so that the commons-codec
> > >>>> code used is compatible with both versions...we could just in-line some of
> > >>>> the Base64 code, and comment accordingly.
> > >>>>
> > >>>> That said, we also should be clear about what versions we support, but 6-12
> > >>>> months seems short. The upgrade cycles on Hadoop are really, really long.
> > >>>>
> > >>>>
> > >>>> 2013/2/20 Prashant Kommireddi <prash1784@gmail.com>
> > >>>>
> > >>>>> Agreed, that makes sense. Probably supporting older hadoop version for a 1
> > >>>>> or 2 pig releases before moving to a newer/stable version?
> > >>>>>
> > >>>>> Having said that, should we use 0.11 period to communicate the same to the
> > >>>>> community and start moving on 0.12 onwards? I know we are way past 6-12
> > >>>>> months (1-2 release) time frame with 0.20.2, but we also need to make sure
> > >>>>> users are aware and plan accordingly.
> > >>>>>
> > >>>>> I'd also be interested to hear how other projects (Hive, Oozie) are
> > >>>>> handling this.
> > >>>>>
> > >>>>> -Prashant
> > >>>>>
> > >>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <onatkovich@yahoo.com> wrote:
> > >>>>>
> > >>>>>> It seems that for each Pig release we need to agree and clearly state
> > >>>>>> which Hadoop versions it will support. I guess the main question is how we
> > >>>>>> decide on this. Perhaps we should say that Pig no longer supports older
> > >>>>>> Hadoop versions once the newer one is out for at least 6-12 months to make
> > >>>>>> sure it is stable. I don't think we can support old versions indefinitely.
> > >>>>>> It is in everybody's interest to keep moving forward.
> > >>>>>>
> > >>>>>> Olga
> > >>>>>>
> > >>>>>>
> > >>>>>> ________________________________
> > >>>>>> From: Prashant Kommireddi <prash1784@gmail.com>
> > >>>>>> To: dev@pig.apache.org
> > >>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
> > >>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > >>>>>>
> > >>>>>> What do you guys feel about the JIRA to do with 0.20.2 compatibility
> > >>>>>> (PIG-3194)? I am interested in discussing the strategy around backward
> > >>>>>> compatibility as this is something that would haunt us each time we move to
> > >>>>>> the next hadoop version. For eg, we might be in a similar situation while
> > >>>>>> moving to Hadoop 2.0, when some of the stuff might break for 1.0.
> > >>>>>>
> > >>>>>> I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users
> > >>>>>> might be caught unaware. Of course, I must admit there is selfish interest
> > >>>>>> here and it's probably easier for us to have a workaround on Pig rather
> > >>>>>> than upgrade hadoop in all our production DCs.
> > >>>>>>
> > >>>>>> -Prashant
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <russell.jurney@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> I think someone should step up and fix the easy ones, if possible.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <billgraham@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Thanks Kai for reporting these.
> > >>>>>>>>
> > >>>>>>>> What do people think about the severity of these issues w.r.t. Pig 11? I
> > >>>>>>>> see a few possible options:
> > >>>>>>>>
> > >>>>>>>> 1. We include some or all of these patches in a new Pig 11 rc. We'd want to
> > >>>>>>>> make sure that they don't destabilize the current branch. This approach
> > >>>>>>>> makes sense if we think Pig 11 wouldn't be a good release without one or
> > >>>>>>>> more of these included.
> > >>>>>>>>
> > >>>>>>>> 2. We continue with the Pig 11 release without these, but then include one
> > >>>>>>>> or more in a 0.11.1 release.
> > >>>>>>>>
> > >>>>>>>> 3. We continue with the Pig 11 release without these, but then include them
> > >>>>>>>> in a 0.12 release.
> > >>>>>>>>
> > >>>>>>>> Jon has a patch for the MAP issue
> > >>>>>>>> (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
> > >>>>>>>> ready, which seems like the most pressing of the three to me.
> > >>>>>>>>
> > >>>>>>>> thanks,
> > >>>>>>>> Bill
> > >>>>>>>>
> > >>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <kai.londenberg@googlemail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I just subscribed to the dev mailing list in order to give you some
> > >>>>>>>>> feedback on pig 0.11 candidate 2.
> > >>>>>>>>>
> > >>>>>>>>> The following three issues are currently present in 0.11 candidate 2:
> > >>>>>>>>>
> > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map entry
> > >>>>>>>>> alias resolution leading to "Duplicate schema alias" errors'
> > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes to
> > >>>>>>>>> ObjectSerializer.java break compatibility with Hadoop 0.20.2
> > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race Condition in
> > >>>>>>>>> PhysicalOperator leads to ExecException "Error while trying to get
> > >>>>>>>>> next result in POStream"
> > >>>>>>>>>
> > >>>>>>>>> The last two of these are easily solvable (see the tickets for
> > >>>>>>>>> details on that). The first one is a bit trickier I think, but at
> > >>>>>>>>> least there is a workaround for it (pass Map fields through a UDF).
> > >>>>>>>>>
> > >>>>>>>>> In my personal opinion, each of these problems is pretty severe, but
> > >>>>>>>>> opinions about the importance of the MAP Datatype and STREAM Operator,
> > >>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
> > >>>>>>>>>
> > >>>>>>>>> so far ..
> > >>>>>>>>>
> > >>>>>>>>> Kai Londenberg
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please email me at
> > >>>>>>>> billgraham@gmail.com going forward.*
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > >>>>>>> datasyndrome.com
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> *Note that I'm no longer using my Yahoo! email address. Please email me at
> > >> billgraham@gmail.com going forward.*
> > >>
> > >
> > >
> > >
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > > datasyndrome.com
> >
> >
