helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Long GC
Date Sat, 04 May 2013 14:29:13 GMT
Hi Ming,

I need some more details:
1. How long was the GC pause, and what is the session timeout in ZooKeeper?
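One rough way to answer the first question from inside the JVM is to read the standard GarbageCollectorMXBean counters. The sketch below is only an illustration: the class name GcStats and its helper methods are made up for this example. The counters are cumulative since JVM start, so they give total GC time rather than the length of a single pause; GC logging (for example -verbose:gc) is the more precise source for individual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; not part of Helix or ZooKeeper.
class GcStats {

    /** Approximate total GC time in ms since JVM start, summed over all collectors. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Total number of collections since JVM start, summed over all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print the per-collector counters, then the totals.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC count=" + totalGcCount()
                + ", total GC time ms=" + totalGcTimeMs());
    }
}
```

Sampling these totals before and after the load test gives the GC time accumulated during the test, which can then be compared with the configured session timeout.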

The behavior you are seeing is expected. Because of the GC pause, the
ZooKeeper session is lost, and we invoke the transitions so that the
partition goes back to the OFFLINE state.
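The relationship between the pause and the session timeout can be sketched with a simplified model. This is an assumption-laden illustration, not how Helix or ZooKeeper implement expiry: the class SessionExpiryModel, its method, and the 10-second timeout are hypothetical, and the real server-side expiry check is only approximately this precise. The idea is that a stop-the-world pause prevents the client from sending heartbeats, so the session is at risk once the silence reaches the timeout.

```java
// Hypothetical model for illustration; not part of Helix or ZooKeeper.
class SessionExpiryModel {

    /**
     * Simplified rule: the session is treated as expired when the client has been
     * silent (time since its last heartbeat plus the GC pause) for at least the
     * session timeout.
     */
    static boolean sessionExpires(long msSinceLastHeartbeat, long pauseMs, long sessionTimeoutMs) {
        return msSinceLastHeartbeat + pauseMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 10_000; // assumed session timeout, for illustration only

        // 3 s since the last heartbeat plus a 2 s pause: 5 s of silence, session survives.
        System.out.println(sessionExpires(3_000, 2_000, timeoutMs)); // prints "false"

        // 3 s since the last heartbeat plus an 8 s pause: 11 s of silence, session expires.
        System.out.println(sessionExpires(3_000, 8_000, timeoutMs)); // prints "true"
    }
}
```

Under this model a pause shorter than the timeout leaves the session intact, which is why the GC pause length and the session timeout are the two numbers that matter here.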

What behavior are you looking for when there is a long GC?

a. You don't want to lose mastership, or
b. It's OK to lose mastership, but you don't want to become master again?

One question regarding your application: is it possible for your
application to recover after a long GC pause?

I don't think this is related to HELIX-79. In that case there were
consecutive GCs, and I think we have a patch for that issue.

Kishore G

On Sat, May 4, 2013 at 6:32 AM, Ming Fang <mingfang@mac.com> wrote:

> We're experiencing a potentially showstopper issue with how Helix is
> dealing with very long GCs.
> Our system is using the Master Slave model.
> A simple test is to run just the Master under extreme load, causing
> seconds of GC.
> Under long GC condition the Master gets transitioned to Slave then to
> Offline.
> After the GC, we get transitioned back to Slave then to Master.
> I found this Jira issue that may be related: HELIX-79
> <https://issues.apache.org/jira/browse/HELIX-79>.
> We're scheduled to go live with our system next week.
> Are there any quick workarounds for this problem?
