giraph-user mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: [VOTE][CHANGED] Release Giraph 1.0 (rc1)
Date Sun, 14 Apr 2013 17:04:00 GMT
In general, my understanding of an RC is that we should not add new features
or improvements. I agree that we cannot fix every open bug, but the least we
can do is get in the issues that already have a working patch, particularly
given that we're releasing a 1.0.


On Sun, Apr 14, 2013 at 6:18 PM, Avery Ching <aching@apache.org> wrote:

> Hi Sebastian,
>
> Thanks for the patch.  I'll try to take a look at it.
>
> The only reason I bring the optimizations up is that a lot of folks tend
> to compare PageRank performance.  The optimizations I'm referring to are
> Giraph ones, not algorithmic ones.  We use ints and floats for ids and
> messages, respectively, instead of longs and doubles (1/2 the network
> traffic), and IntNullArrayEdges vertex edges (efficient array-backed
> edges) instead of ByteArrayEdges.  See
> https://issues.apache.org/jira/browse/giraph-543 for more details.
>
> Anyway, given that we are going to ship a 1.0.1 release in a few weeks for
> a variety of reasons, should this really hold up the current release?  I
> would prefer not to cut any more RCs unless things are totally broken (e.g.
> profiles not compiling, major Giraph bugs, etc.).  There are still a lot of
> outstanding issues in JIRA; we can't fix them all for the 1.0 release.
>
> Let me know what you think.
>
> Avery
>
>
> On 4/13/13 10:46 AM, Sebastian Schelter wrote:
>
>> Hi Avery,
>>
>> I found the bug and I can provide a patch today or tomorrow, so
>> hopefully we can include that in the release (to not knowingly ship
>> buggy code). Furthermore, I improved the code to protect against
>> rounding errors.
>>
>> I don't really get what you mean by the missing optimization in
>> comparison to the benchmark PageRank implementation.
>>
>> The implementation in o.a.g.examples.PageRankVertex aims to be a robust
>> real-world implementation. As an optimization, it ignores edge weights
>> and reuses objects where possible. Furthermore, it is able to handle
>> dangling vertices, which are present in almost every real-world network,
>> and it automatically detects the number of supersteps to run. With the
>> patch, it should also provide improved numerical stability.
>>
>> If the runtimes don't look good enough when compared to the benchmark
>> implementation, this might also be caused by the dataset, which has a
>> skewed degree distribution (like most real-world networks). The
>> benchmark uses a uniform degree distribution AFAIK.
>>
>> Best,
>> Sebastian
>>
>> On 13.04.2013 15:46, Avery Ching wrote:
>>
>>> That's great, Sebastian.  I would also recommend taking a look at the
>>> PageRankBenchmark for a performance comparison.  It has had a lot of
>>> speed improvements and should be a bunch faster than PageRankVertex.
>>> Even that, though, is not totally optimized.  Hopefully we'll be adding a
>>> "how to optimize performance" guide in the near future.  Should we delay
>>> the release or simply ship a 1.1, say in the next month, with this
>>> fix and support for YARN's 2.0.4?  I'd like to get on a more normal
>>> release cycle rather than once a year =).
>>>
>>> Avery
>>>
>>> On 4/13/13 3:02 AM, Sebastian Schelter wrote:
>>>
>>>> Hi there,
>>>>
>>>> I've got some good and some bad news.  I tested PageRankVertex (not the
>>>> Benchmark but the example implementation o.a.g.examples.PageRankVertex)
>>>> from trunk, compiled for Hadoop 1.0, on a cluster of 26 machines with
>>>> 208 cores.
>>>>
>>>> I used the Webbase2001 dataset [1], which has 115M vertices and more than
>>>> 1B edges, and got some awesome running times: the average superstep takes
>>>> 15 seconds (!!!). Awesome work, I have to say!
>>>>
>>>> Unfortunately, there seems to be an issue with the convergence
>>>> detection, as it didn't get the correct convergence behavior. I'd like
>>>> to have a look into that this week, so we can ship a performant PageRank
>>>> implementation which automatically runs an appropriate number of
>>>> supersteps. Hope this doesn't delay the release too much.
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>>
>>>> [1] http://law.di.unimi.it/webdata/webbase-2001/
>>>>
>>>>
>>>> On 13.04.2013 07:39, Avery Ching wrote:
>>>>
>>>>> Thanks to the quick feedback from Roman and Lewis, we have cut a new RC1
>>>>> that addresses the following issues.
>>>>>
>>>>> * Got rid of .git repo in tarball
>>>>> * Fixed issue with not compiling without git repo (GIRAPH-628)
>>>>> * Used gnutar in OSX rather than tar to generate the tarball and get rid
>>>>>   of warnings
>>>>> * Pushed GIRAPH-627 to support the yarn profile better
>>>>> * Tarball name changed to the final artifact name (giraph-1.0.tar.gz)
>>>>>
>>>>> Release notes:
>>>>> http://people.apache.org/~aching/giraph-1.0-RC1/RELEASE_NOTES.html
>>>>>
>>>>> Release artifacts:
>>>>> http://people.apache.org/~aching/giraph-1.0-RC1/
>>>>>
>>>>> Corresponding git tag:
>>>>> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC1
>>>>>
>>>>> Signing keys:
>>>>> http://people.apache.org/keys/group/giraph.asc
>>>>>
>>>>> The vote runs for 72 hours, until Monday 11pm PST.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Avery
>>>>>
>>>>> Original message below regarding rc0:
>>>>>
>>>>> -------------------------------
>>>>>
>>>>> Fellow Giraphers,
>>>>>
>>>>> We have our first release candidate since graduating from incubation.
>>>>> This is a source release, primarily due to the different versions of
>>>>> Hadoop we support with munge (similar to the 0.1 release).  Since 0.1,
>>>>> we've made A TON of progress on overall performance, optimizing memory
>>>>> use, split vertex/edge inputs, easy interoperability with Apache Hive,
>>>>> and a bunch of other areas.  In many ways, this is an almost totally
>>>>> different codebase.  Thanks everyone for your hard work!
>>>>>
>>>>> Apache Giraph has been running in production at Facebook (against
>>>>> Facebook's Corona implementation of Hadoop -
>>>>> https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona)
>>>>> since around last December.  It has proven to be very scalable and
>>>>> performant, and it enables a bunch of new applications.  Based on the
>>>>> drastic improvements and the use of Giraph in production, it seems
>>>>> appropriate to bump up our version to 1.0.
>>>>>
>>>>> While anyone can vote, the ASF requires majority approval from the PMC
>>>>> -- i.e., at least three PMC members must vote affirmatively for release,
>>>>> and there must be more positive than negative votes. Releases may not be
>>>>> vetoed. Before voting +1 PMC members are required to download the signed
>>>>> source code package, compile it as provided, and test the resulting
>>>>> executable on their own platform, along with also verifying that the
>>>>> package meets the requirements of the ASF policy on releases.
>>>>>
>>>>> Please test this against many other Hadoop versions and let us know how
>>>>> this goes!
>>>>>
>>>>> Release notes:
>>>>> http://people.apache.org/~aching/giraph-1.0-RC0/RELEASE_NOTES.html
>>>>>
>>>>> Release artifacts:
>>>>> http://people.apache.org/~aching/giraph-1.0-RC0/
>>>>>
>>>>> Corresponding git tag:
>>>>> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC0
>>>>>
>>>>> Signing keys:
>>>>> http://people.apache.org/keys/group/giraph.asc
>>>>>
>>>>> The vote runs for 72 hours, until Monday 4pm PST.
>>>>>
>>>>> Thanks everyone for your patience with this release!
>>>>>
>>>>> Avery
>>>>>
>>>>
>


-- 
   Claudio Martella
   claudio.martella@gmail.com
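
For readers following the PageRank discussion quoted above (handling of
dangling vertices and automatic detection of how many supersteps to run),
here is a minimal plain-Java sketch of those two ideas. It is not the code
of o.a.g.examples.PageRankVertex and uses no Giraph APIs; the class name
PageRankSketch, the damping factor 0.85, the tolerance, and the tiny
example graph are all assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only: single-machine power iteration, not Giraph code.
public class PageRankSketch {

    /**
     * Iterates until the summed absolute change in rank (L1 norm) falls
     * below tolerance, or until maxIterations is reached.
     * outEdges[v] lists the target vertices of v; an empty array marks a
     * dangling vertex.
     */
    static double[] pageRank(int[][] outEdges, double damping,
                             double tolerance, int maxIterations) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0;

            for (int v = 0; v < n; v++) {
                if (outEdges[v].length == 0) {
                    // A dangling vertex has nowhere to send its rank;
                    // collect it and spread it uniformly below.
                    danglingMass += rank[v];
                } else {
                    double share = rank[v] / outEdges[v].length;
                    for (int target : outEdges[v]) {
                        next[target] += share;
                    }
                }
            }

            double delta = 0.0;
            for (int v = 0; v < n; v++) {
                next[v] = (1.0 - damping) / n
                        + damping * (next[v] + danglingMass / n);
                delta += Math.abs(next[v] - rank[v]);
            }

            rank = next;
            if (delta < tolerance) {
                break; // converged: no fixed iteration count needed
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Assumed toy graph: 0 -> 1, 0 -> 2, 1 -> 2, and 2 is dangling.
        int[][] graph = { {1, 2}, {2}, {} };
        System.out.println(Arrays.toString(pageRank(graph, 0.85, 1e-9, 100)));
    }
}
```

Because the dangling mass is redistributed rather than dropped, the ranks
keep summing to 1 in every iteration, which is what makes a change-based
stopping test meaningful. In a vertex-centric system the per-iteration sums
would presumably be gathered through some global aggregation mechanism;
that part is not shown here.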
