hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Error with fastgen input
Date Thu, 14 Mar 2013 17:47:11 GMT
> As you know, we have a problem with a lack of team members and
> contributors. So we should break every task down as small as possible.


Where was this task not broken into pieces?
There are at least two tasks:

- Improve GraphJobRunner memory consumption (HAMA-704, already reviewed on
Review Board, with huge memory savings)
- Implement SpillingQueue / SortedSpillingQueue (HAMA-644, HAMA-723 and
whatever else)

This is the change we discussed very extensively on the dev list and in
JIRA, and we chose a single design to implement. It requires a lot of code
changes, so I don't see how splitting it up further (IMHO it is atomic
enough) would be beneficial. And even if you split it, it would add huge
organizational overhead, because the number of team members/contributors
who can work on those tasks is limited.

> I don't know what you mean exactly. But the 23 issues are almost all
> examples, except the YARN integration tasks. If you leave here, I have to
> take over the YARN tasks. Should I wait for someone? Am I touching the core
> module aggressively?


This is not a discussion about skills; I wanted to emphasize that you could
just as well work on other JIRAs instead of blocking our work on
graph/messaging. And 23 is at least 22 more than the average for the rest of
the team. Think about that: would there be issues left for newcomers? Yes,
there would! But why are you assigning them to yourself when you're not
actively working on them?

YARN is just a single umbrella issue that is "yours"; there is work blocked
on Maven coding (HAMA-671), and a patch of mine in HAMA-672 has been pending
review since 20/11/12 (4 months!), so don't tell me that you are actively
working on those things in your "full-time open sourcer" career.

> By the way, can you answer this question: is it really a technical
> conflict, or an emotional conflict?


If anyone is usually emotional about things, it is you. Technically
speaking: should we branch out such (big) refactoring issues to work on them
on our own, or do you want to brew your own soup on trunk and have us merge
all the stuff together? If you want the latter, please fork your own Hama
playground and do all the stuff you want there; if something emerges
successfully, feel free to slice out a patch and file a JIRA.

> So I think we need to cut releases as often as possible.


Sorry Edward, but our releases have been a disaster so far. I have only been
here since 0.3.0, but none of them was scalable, well documented, or well
tested. I have no problem with taking more time for a product, as I don't
feel the need to deliver half-baked stuff to people who are not using it
anyway, nor providing any feedback (which is the sad reality in many other
open source projects as well). So in my opinion we have to iterate on our
own and not through official releases. "It's done when it's done" is the
usual standard, and I don't think deviating from it will bring any
advantages besides pissed-off users who can't get Hama to work like it
should.

Also, your recent changes to the wiki:

> However, if no one responds to your patches for 3 days, you can commit,
> then review later.


Who in the community voted for that rule, or do you make the rules here?
You can't talk about community in the same breath as changing the rules for
everybody just because you like it that way.
What was the need to commit HAMA-745 without review? Why did you change
that test case? This is just the tip of the iceberg of changes you are
making to trunk without the agreement of the community. We established a
community process during incubation (it was even written into the charter
at graduation), so why don't we stick to it instead of laying out rules for
your own needs or those of your employer?

> Regarding branches, maybe we are all not familiar with online
> collaboration (or don't want to collaborate anymore). If we want to go our
> own ways, why do we need to be here together?


Branching is perfectly legitimate when something needs to be developed in
parallel to ongoing work. We don't have much ongoing work, do we? So I don't
think branching is usually needed on small projects, because issues can be
solved by communication. But if you commit or plan stuff on trunk without
coordinating with the people (YOU KNOW) who are currently working on it,
then it is just a bad move.

> In HAMA-704, I wanted to remove only the message map to reduce memory
> consumption. I still don't want to talk about disk-based vertices and the
> Spilling Queue at the moment. With this, I wanted to release 0.6.1 as the
> 'partitioning issue fixed and quick executable examples' version ASAP.

You can't say B without saying A. The problems are much deeper than you
think. The memory consumption is not a problem of the message map, but a
twofold problem: vertices that are kept in memory although they don't need
to be, and a messaging system that doesn't scale very well. I have been
telling you that since we added the graph module, but it has been falling on
deaf ears with you for more than a year.
And guess what? This requires a lot of changes.

If you had invested the time to work with us on the root of all the issues
instead of doing strange stuff like the partitioning jobs (in the hours I
wasted telling you about their technical downsides, I could have built
another Hadoop in FORTRAN), we could have gotten a release out months ago
and moved on to other things.

> If we want to sort partitioned data using the messaging system, ideas
> should be collected.


The idea is there and the idea works, but I guess you're not following the
JIRAs you are +1'ing?
Suraj is already working on the second half of the idea we split in two,
and instead of cockfighting with each other we should work together to make
it happen. And not as fast as possible because you want to roll out a
release for your employer, but because we want to improve the framework
radically and have enough time to test it thoroughly with various
configurations, not just on an Oracle BDA.

> P.S. These comments are never helpful in developing a community.


It is something that needs to be discussed across the whole project, and
not on a single private mailing list. Community development doesn't start
with +1'ing and smiling at everything just to keep people on board. The
truth hurts, but it is necessary to evolve something. Community starts with
people who have a vision of making a project better; it will develop by
itself once the project is stable enough and has a bigger user base. You
know, developers are users too. If I can't run a graph job on 1 GB of
Wikipedia links on my laptop, this project is not very likely to be
something I want to develop on. So our first responsibility is to make our
project run perfectly smoothly, and nothing else. And that is something that
must be discussed with people who want to develop but can't, and we need
these people.
And to be honest again, haven't we had hardly anyone other than GSoC
students, who get a shitton of money for developing stuff and then walk away
again? I count myself in now as well, mea culpa.
