giraph-dev mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: [VOTE][CHANGED] Release Giraph 1.0 (rc1)
Date Mon, 15 Apr 2013 07:22:05 GMT
I don't understand Gianmarco's argument. Are you claiming that people use
Giraph only for graphs with more than Integer.MAX_VALUE vertices?
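For reference, here is a quick check of the graph sizes mentioned in this thread against Java's `Integer.MAX_VALUE` (2,147,483,647). This is a minimal sketch that assumes dense, non-negative vertex ids 0..n-1. The helper name `fitsInIntIds` is hypothetical and not part of Giraph.

```java
/** Hypothetical helper: can n vertices be given dense int ids 0..n-1? */
final class IntIdCheck {
  static boolean fitsInIntIds(long numVertices) {
    // The largest id needed is numVertices - 1, which must not exceed Integer.MAX_VALUE.
    return numVertices >= 0 && numVertices - 1 <= Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    long[] counts = {115_000_000L, 1_000_000_000L, 3_000_000_000L};
    for (long n : counts) {
      System.out.println(n + " vertices fit in int ids: " + fitsInIntIds(n));  // true, true, false
    }
  }
}
```

The 115M figure is the Webbase2001 vertex count, and one billion is the Facebook example further down. The three-billion case is only an illustrative input.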


On Mon, Apr 15, 2013 at 12:28 AM, Avery Ching <aching@apache.org> wrote:

> I generally agree, and I understand that this is typically true, but
> many other benchmarks do this to show off performance.  Also, if you
> have the FB graph of a billion users, its vertex ids could theoretically
> fit into a 32-bit integer.
>
> Avery
>
>
> On 4/14/13 2:41 PM, Gianmarco De Francisci Morales wrote:
>
>> Hi,
>>
>> Just one quick comment on optimizations and using ints as ids.
>> In my opinion, if you can use an int as an id for your dataset, you
>> probably don't need Giraph for your problem.
>> Just my 2c
>>
>> Cheers,
>>
>> --
>> Gianmarco
>>
>>
>> On Sun, Apr 14, 2013 at 11:26 PM, Sebastian Schelter <ssc@apache.org> wrote:
>>
>>> Thank you, Avery. I wish I had found the bug earlier.
>>>
>>> On 14.04.2013 23:25, "Avery Ching" <aching@apache.org> wrote:
>>>
>>>> Thanks for your input, Sebastian.  Given the choice between removing
>>>> PageRankVertex and adding the fix, I've added your fix and will cut RC2
>>>> a bit later today.  I really hope this is the last RC.
>>>>
>>>> Avery
>>>>
>>>> On 4/14/13 9:34 AM, Sebastian Schelter wrote:
>>>>
>>>>> Hi Avery,
>>>>>
>>>>> I see your concerns. The benchmarking question is difficult; we had
>>>>> very bad experiences with Mahout in that regard. For example, we once
>>>>> had an M/R-based PageRank implementation in Mahout that used our
>>>>> integer-based vectors, and we removed it after public complaints that
>>>>> you can't fit the whole web into the range of an integer. Personally,
>>>>> I'd also refrain from using floats instead of doubles for benchmarks,
>>>>> as this simply means you give up on accuracy.
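To make the accuracy point concrete: PageRank values on a large graph are tiny (about 1/N each). Summing N of them in `float` drifts noticeably away from 1.0, while `double` stays close. This is a minimal sketch that assumes N = 10 million and uniform ranks. It is not Giraph or Mahout code.

```java
/** Sketch: accumulate N uniform ranks of 1/N in float vs double (N is an assumption). */
final class FloatVsDouble {
  static float floatSum(int n) {
    float rank = 1.0f / n;
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += rank;  // each add rounds to the float grid of the growing sum
    return sum;
  }

  static double doubleSum(int n) {
    double rank = 1.0 / n;
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += rank;
    return sum;
  }

  public static void main(String[] args) {
    int n = 10_000_000;
    System.out.println("float  sum: " + floatSum(n));   // noticeably off from 1.0
    System.out.println("double sum: " + doubleSum(n));  // very close to 1.0
  }
}
```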
>>>>>
>>>>> Regarding benchmarks, I guess the best thing we could do is publish our
>>>>> own numbers. The current runtimes I've seen are already very good:
>>>>> Giraph beat a highly optimized Stratosphere implementation that we did
>>>>> for a recent paper by approx. 25%.
>>>>>
>>>>> To conclude, I in no way want to hold up the current release. I'm
>>>>> perfectly fine with not including the patch and optimizing the
>>>>> implementation for a 1.0.1 release, but then we should remove the
>>>>> current examples.PageRankVertex from the 1.0 release, as its
>>>>> convergence detection is broken and we should not knowingly ship
>>>>> buggy code.
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>>
>>>>> On 14.04.2013 18:18, Avery Ching wrote:
>>>>>
>>>>>> Hi Sebastian,
>>>>>>
>>>>>> Thanks for the patch.  I'll try to take a look at it.
>>>>>>
>>>>>> The only reason I bring the optimizations up is that a lot of folks
>>>>>> tend to compare PageRank performance.  The optimizations I'm referring
>>>>>> to are Giraph ones, not algorithmic ones.  We use ints and floats for
>>>>>> ids and messages, respectively, instead of longs and doubles (half the
>>>>>> network traffic), and IntNullArrayEdges vertex edges (efficient
>>>>>> array-backed edges) instead of ByteArrayEdges.  See
>>>>>> https://issues.apache.org/jira/browse/giraph-543 for more details.
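Here is a rough illustration of where the "half the network traffic" figure comes from. Serializing an (id, value) pair with `java.io.DataOutputStream` takes 8 bytes as int + float and 16 bytes as long + double. This sketch only counts raw field sizes. It does not model Giraph's actual message encoding, batching, or grouping of messages by destination.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch: bytes for one (target id, message value) pair in each representation. */
final class MessageSize {
  static int intFloatBytes(int id, float value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(id);       // 4 bytes
      out.writeFloat(value);  // 4 bytes
    } catch (IOException e) {
      throw new UncheckedIOException(e);  // not expected for an in-memory stream
    }
    return bytes.size();
  }

  static int longDoubleBytes(long id, double value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(id);       // 8 bytes
      out.writeDouble(value);  // 8 bytes
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  public static void main(String[] args) {
    System.out.println("int + float:   " + intFloatBytes(42, 0.25f) + " bytes");    // 8
    System.out.println("long + double: " + longDoubleBytes(42L, 0.25) + " bytes");  // 16
  }
}
```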
>>>
>>>>>> Anyway, given that we are going to ship a 1.0.1 release in a few weeks
>>>>>> for a variety of reasons, should this really hold up the current
>>>>>> release?  I would prefer not to cut any more RCs unless things are
>>>>>> totally broken (e.g., profiles not compiling, major Giraph bugs, etc.).
>>>>>> There are still a lot of outstanding issues in JIRA; we can't fix them
>>>>>> all for the 1.0 release.
>>>>>>
>>>>>> Let me know what you think.
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>> On 4/13/13 10:46 AM, Sebastian Schelter wrote:
>>>>>>
>>>>>>> Hi Avery,
>>>>>>>
>>>>>>> I found the bug and can provide a patch today or tomorrow, so
>>>>>>> hopefully we can include it in the release (so as not to knowingly
>>>>>>> ship buggy code). Furthermore, I improved the code to protect against
>>>>>>> rounding errors.
>>>>>>>
>>>>>>> I don't really get what you mean by the missing optimizations in
>>>>>>> comparison to the benchmark PageRank implementation.
>>>>>>>
>>>>>>> The implementation in o.a.g.examples.PageRankVertex aims to be a
>>>>>>> robust, real-world implementation. As optimizations, it ignores edge
>>>>>>> weights and reuses objects where possible. Furthermore, it is able to
>>>>>>> handle dangling vertices, which are present in almost every real-world
>>>>>>> network, and it automatically detects the number of supersteps to run.
>>>>>>> With the patch, it should also provide improved numerical stability.
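For readers unfamiliar with the two features described above, here is a minimal single-machine sketch of PageRank. It redistributes the rank of dangling vertices uniformly and stops once the L1 change between iterations falls below a threshold. It is plain Java, not the Giraph `PageRankVertex` API. The damping factor of 0.85, the threshold, and the iteration cap are assumptions, and the Giraph implementation may handle both features differently.

```java
import java.util.Arrays;

/** Sketch: PageRank with dangling-mass redistribution and an L1 convergence check. */
final class PageRankSketch {
  /**
   * @param outEdges out-neighbour lists; an empty list marks a dangling vertex
   * @param damping  probability of following a link (assumed, e.g. 0.85)
   * @param epsilon  stop when the L1 change between iterations drops below this (assumed)
   * @param maxIters cap on iterations, the analogue of a superstep limit (assumed)
   */
  static double[] pageRank(int[][] outEdges, double damping, double epsilon, int maxIters) {
    int n = outEdges.length;
    double[] rank = new double[n];
    Arrays.fill(rank, 1.0 / n);
    for (int iter = 0; iter < maxIters; iter++) {
      double[] next = new double[n];
      double danglingMass = 0.0;
      for (int v = 0; v < n; v++) {
        if (outEdges[v].length == 0) {
          danglingMass += rank[v];  // no out-links: collect this rank for redistribution
        } else {
          double share = rank[v] / outEdges[v].length;
          for (int w : outEdges[v]) next[w] += share;
        }
      }
      // Teleport term plus dangling mass spread evenly, so the ranks keep summing to 1.
      double base = (1.0 - damping) / n + damping * danglingMass / n;
      double l1Change = 0.0;
      for (int v = 0; v < n; v++) {
        next[v] = base + damping * next[v];
        l1Change += Math.abs(next[v] - rank[v]);
      }
      rank = next;
      if (l1Change < epsilon) break;  // converged: stop early
    }
    return rank;
  }

  public static void main(String[] args) {
    int[][] graph = {{1, 2}, {2}, {0}, {}};  // vertex 3 is dangling
    double[] r = pageRank(graph, 0.85, 1e-8, 200);
    System.out.println(Arrays.toString(r) + " sum=" + Arrays.stream(r).sum());
  }
}
```

Without the dangling-mass term, rank held by vertex 3 would leak out of the system each iteration, and the values would no longer sum to 1.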
>>>>>>>
>>>>>>> If the runtimes don't look good enough compared to the benchmark
>>>>>>> implementation, this might also be caused by the dataset, which has a
>>>>>>> skewed degree distribution (like most real-world networks). The
>>>>>>> benchmark uses a uniform degree distribution, AFAIK.
>>>>>>>
>>>>>>> Best,
>>>>>>> Sebastian
>>>>>>>
>>>>>>> On 13.04.2013 15:46, Avery Ching wrote:
>>>>>>>
>>>>>>>> That's great, Sebastian.  I would also recommend taking a look at the
>>>>>>>> PageRankBenchmark for a performance comparison.  It has had a lot of
>>>>>>>> speed improvements and should be a good deal faster than
>>>>>>>> PageRankVertex.  Even that, though, is not totally optimized.
>>>>>>>> Hopefully we'll be adding a "how to optimize performance" guide in the
>>>>>>>> near future.  Should we delay the release or simply ship a 1.1, say in
>>>>>>>> the next month, with this fix and support for YARN's 2.0.4?  I'd like
>>>>>>>> to get on a more normal release cycle rather than once a year =).
>>>>>>>>
>>>>>>>> Avery
>>>>>>>>
>>>>>>>> On 4/13/13 3:02 AM, Sebastian Schelter wrote:
>>>>>>>>
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> I have some good and some bad news. I tested PageRankVertex (not the
>>>>>>>>> benchmark, but the example implementation
>>>>>>>>> o.a.g.examples.PageRankVertex) from trunk, compiled for Hadoop 1.0,
>>>>>>>>> on a cluster of 26 machines with 208 cores.
>>>>>>>>>
>>>>>>>>> I used the Webbase2001 dataset [1], which has 115M vertices and more
>>>>>>>>> than 1B edges, and got some awesome running times: the average
>>>>>>>>> superstep takes 15 seconds (!!!). Awesome work, I have to say!
>>>>>>>>>
>>>>>>>>> Unfortunately, there seems to be an issue with the convergence
>>>>>>>>> detection, as it didn't show the correct convergence behavior. I'd
>>>>>>>>> like to have a look into that this week, so we can ship a performant
>>>>>>>>> PageRank implementation that automatically runs an appropriate number
>>>>>>>>> of supersteps. I hope this doesn't delay the release too much.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] http://law.di.unimi.it/webdata/webbase-2001/
>>>>>>>>>
>>>>>>>>> On 13.04.2013 07:39, Avery Ching wrote:
>>>>>>>>>
>>>>>>>>>> Thanks to the quick feedback from Roman and Lewis, we have cut a new
>>>>>>>>>> RC1 that addresses the following issues:
>>>>>>>>>>
>>>>>>>>>> * Got rid of the .git repo in the tarball
>>>>>>>>>> * Fixed the build not compiling without a git repo (GIRAPH-628)
>>>>>>>>>> * Used gnutar on OS X rather than tar to generate the tarball and
>>>>>>>>>>   get rid of warnings
>>>>>>>>>> * Pushed GIRAPH-627 to better support the yarn profile
>>>>>>>>>> * Changed the tarball name to the final artifact name
>>>>>>>>>>   (giraph-1.0.tar.gz)
>>>>>>>>>>
>>>>>>>>>> Release notes:
>>>>>>>>>> http://people.apache.org/~aching/giraph-1.0-RC1/RELEASE_NOTES.html
>>>>>>>>>>
>>>>>>>>>> Release artifacts:
>>>>>>>>>> http://people.apache.org/~aching/giraph-1.0-RC1/
>>>>>>>>>>
>>>>>>>>>> Corresponding git tag:
>>>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC1
>>>>>>>>>>
>>>>>>>>>> Signing keys:
>>>>>>>>>> http://people.apache.org/keys/group/giraph.asc
>>>>>>>>>>
>>>>>>>>>> The vote runs for 72 hours, until Monday 11pm PST.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Avery
>>>>>>>>>>
>>>>>>>>>> Original message below regarding rc0:
>>>>>>>>>>
>>>>>>>>>> -------------------------------
>>>>>>>>>>
>>>>>>>>>> Fellow Giraphers,
>>>>>>>>>>
>>>>>>>>>> We have our first release candidate since graduating from incubation.
>>>>>>>>>> This is a source release, primarily due to the different versions of
>>>>>>>>>> Hadoop we support with munge (similar to the 0.1 release).  Since 0.1,
>>>>>>>>>> we've made A TON of progress on overall performance, optimizing memory
>>>>>>>>>> use, split vertex/edge inputs, easy interoperability with Apache Hive,
>>>>>>>>>> and a bunch of other areas.  In many ways, this is an almost totally
>>>>>>>>>> different codebase.  Thanks everyone for your hard work!
>>>>>>>>>>
>>>>>>>>>> Apache Giraph has been running in production at Facebook (against
>>>>>>>>>> Facebook's Corona implementation of Hadoop -
>>>>>>>>>> https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona)
>>>>>>>>>> since around last December.  It has proven to be very scalable and
>>>>>>>>>> performant, and it enables a bunch of new applications.  Based on the
>>>>>>>>>> drastic improvements and the use of Giraph in production, it seems
>>>>>>>>>> appropriate to bump up our version to 1.0.
>>>>>>>>>>
>>>>>>>>>> While anyone can vote, the ASF requires majority approval from the
>>>>>>>>>> PMC -- i.e., at least three PMC members must vote affirmatively for
>>>>>>>>>> release, and there must be more positive than negative votes.
>>>>>>>>>> Releases may not be vetoed.  Before voting +1, PMC members are
>>>>>>>>>> required to download the signed source code package, compile it as
>>>>>>>>>> provided, and test the resulting executable on their own platform,
>>>>>>>>>> along with verifying that the package meets the requirements of the
>>>>>>>>>> ASF policy on releases.
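The release-vote rule above reduces to two conditions if only binding (PMC) votes are counted. This is a minimal sketch of that reading. The class and method names are hypothetical, and the ASF release policy itself is authoritative.

```java
/** Sketch of the quoted rule, assuming only binding PMC votes are counted. */
final class ReleaseVote {
  static boolean passes(int pmcPlusOnes, int pmcMinusOnes) {
    // At least three binding +1 votes, and more +1s than -1s.
    // No single -1 blocks the release, since releases may not be vetoed.
    return pmcPlusOnes >= 3 && pmcPlusOnes > pmcMinusOnes;
  }

  public static void main(String[] args) {
    System.out.println(passes(3, 0));  // true
    System.out.println(passes(2, 0));  // false: fewer than three +1s
    System.out.println(passes(4, 5));  // false: more -1s than +1s
  }
}
```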
>>>>>>>>>>
>>>>>>>>>> Please test this against many other Hadoop versions and let us know
>>>>>>>>>> how it goes!
>>>>>>>>>>
>>>>>>>>>> Release notes:
>>>>>>>>>> http://people.apache.org/~aching/giraph-1.0-RC0/RELEASE_NOTES.html
>>>>>>>>>>
>>>>>>>>>> Release artifacts:
>>>>>>>>>> http://people.apache.org/~aching/giraph-1.0-RC0/
>>>>>>>>>>
>>>>>>>>>> Corresponding git tag:
>>>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC0
>>>>>>>>>>
>>>>>>>>>> Signing keys:
>>>>>>>>>> http://people.apache.org/keys/group/giraph.asc
>>>>>>>>>>
>>>>>>>>>> The vote runs for 72 hours, until Monday 4pm PST.
>>>>>>>>>>
>>>>>>>>>> Thanks everyone for your patience with this release!
>>>>>>>>>>
>>>>>>>>>> Avery
>>>>>>>>>>
>>>>>>>>>>
>


-- 
   Claudio Martella
   claudio.martella@gmail.com
