hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: Fault Tolerance in 0.5.0
Date Sat, 04 Feb 2012 14:45:13 GMT
Thanks. I just "refactored" our issue tracker ;)
Hope it wasn't too spammy.

2012/2/4 Chia-Hung Lin <clin4j@googlemail.com>

> +1 It's good to have an umbrella jira so we can track it more easily.
>
> Failure detection (HAMA-370) was already done and tested on my
> machines previously.
>
> First point in HAMA-440 is not needed because it has been integrated
> into bsp task.
>
>
>
> On 3 February 2012 09:38, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > We can also separate the issue into two parts: 1) cluster high
> > availability and 2) fault-tolerant job processing. Only HAMA-370 is
> > related to 1).
> >
> > On Fri, Feb 3, 2012 at 10:23 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >> +1
> >>
> >> On Thu, Feb 2, 2012 at 8:39 PM, Thomas Jungblut
> >> <thomas.jungblut@googlemail.com> wrote:
> >>> Hey,
> >>>
> >>> I had a bit of time to go through the jira issues and sort out several
> >>> things related to Fault Tolerance.
> >>>
> >>> Here are my results:
> >>>
> >>> Fault Tolerance in Hama (all jiras related):
> >>>
> >>> [HAMA-199] Add fault tolerance to BSPPeer < CLOSE, too generic
> >>> [HAMA-445] Make configurable checkpointing
> >>> [HAMA-440] Features required in recovery procedure.
> >>> [HAMA-498] BSPTask should periodically ping its parent.
> >>>
> >>> Then I split this into two main parts, "Detect Failure" and
> >>> "Solve Failure":
> >>>
> >>> Detect Failure:
> >>> [HAMA-370] Failure detector for Hama < Nearly complete?
> >>> [HAMA-498] BSPTask should periodically ping its parent.
> >>>
> >>> Solve Failure:
> >>> [HAMA-445] Make configurable checkpointing
> >>>> TODO:
> >>>> Groom needs functionality to restart a task
> >>>> BSPMaster needs functionality to restart a groom
> >>>
> >>> Also here is MISC, which is not strongly related.
> >>>
> >>> MISC:
> >>> [HAMA-445] Make configurable checkpointing
> >>> [HAMA-440] Features required in recovery procedure.
> >>>> TODO mainly discussion:
> >>>> New BSP "interface", with a chaining of supersteps to make restarting
> >>>> tasks simpler (contained in 440)
> >>>
> >>>
> >>> Let's make an umbrella jira for this larger task and close 199, since
> >>> it is way too generic and too old.
> >>> We should also split 440, because it combines too many unrelated
> >>> things.
> >>>
> >>> Also, "Lin" is assigned to the majority of them. What is your progress?
> >>> And do you mind splitting these?
> >>>
> >>> [LINKS]
> >>> https://issues.apache.org/jira/browse/HAMA-440
> >>> https://issues.apache.org/jira/browse/HAMA-119
> >>> https://issues.apache.org/jira/browse/HAMA-445
> >>> https://issues.apache.org/jira/browse/HAMA-440
> >>> https://issues.apache.org/jira/browse/HAMA-370
> >>> https://issues.apache.org/jira/browse/HAMA-498
> >>>
> >>> --
> >>> Thomas Jungblut
> >>> Berlin <thomas.jungblut@gmail.com>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
>



-- 
Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>
