kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dong Lin <lindon...@gmail.com>
Subject Re: [DISCUSS] KIP-112: Handle disk failure for JBOD
Date Wed, 08 Feb 2017 00:15:51 GMT
Hey Grant,

Thanks much for the detailed summary! Yes, this is pretty much my
understanding of the KIP meeting.

I also think everyone agreed on the point you outlined in the email. Here
is my reply to the five issues you mentioned.

1) Automatic vs Manual Recovery

In the case where a disk is replaced and not "repaired", we can allow
automatic recovery by doing this: a broker should always create replica on
its disk if it has not bad log directory regardless of create flag in
LeaderAndIsrRequest. I think this is a reasonable solution. If user remove
bad log directory from broker config or replace it with a good log
directory, broker should feel free to create replica if it doesn't exist.

After this change, user will no longer need to manually execute script to
reset created flag to recover a broker. The created flag in zookeeper will
only be used by broker to ensure that broker doesn't re-create a replica on
a good log directory if it restarts with some bad log directories.

I will update the KIP to use this solution. Does this address everyone's
concern on this issue?


2) Client communication

After the change described above, client will no longer need to write
notification on zookeeper.

Though the broker will still notify controller via zookeeper, it is
something that broker already does for ISR change notification. I am not
aware of any good reason for broker to notify controller via direct RPC
instead of zookeeper. If there is such reason, can we change both of them
together in a followup KIP?

3) What is failure

I agree it is useful to handle various failures in different ways. This KIP
does not address its issue and simply classify a log directory as good or
bad. A log directory is good if broker can read and write from it without
exception, otherwise it is bad. This is strictly better than the existing
broker implementation, where broker does not handle "slow" disk and will
fail if there is any exception.

I will include this as future work in the KIP. Does this sound good?


4) Manually making a directory as bad

I agree it is a good idea and can be handled by a future KIP. I will
include this as future work in the KIP.

5) Implementation complexity

Implementation complexity and its comparison with benefit can sometimes
be subjective. I will try to provide reason why I think its implementation
complexity is not that much.

- The broker logic is not complex. It catches exception when access log
directory, mark log directory and all its replicas as offline, notify
controller by writing the zookeeper notification path, and specify error in
LeaderAndIsrResponse.
- The controller logic is not complex. It listener to zookeeper for disk
failure notification, learn about offline replicas in the
LeaderAndIsrResponse, and take offline replicas into consideration when
electing leaders. It also mark replica as created in zookeeper and use it
to determine whether a replica is created.

That is all the logic we need to add to kafka code. I don't find it very
complex.

Also, I expect the code for KIP-112 to be around 1100 lines new code.
Previously I have implemented a prototype of a slight different design (see
here
<https://docs.google.com/document/d/1Izza0SBmZMVUBUt9s_-Dqi3D8e0KGJQYW8xgEdRsgAI/edit>)
and uploaded it to github (see here
<https://github.com/lindong28/kafka/tree/JBOD>). The patch changed 33
files, added 1185 lines and deleted 183 lines. The size of prototype patch
is actually smaller than patch of KIP-107 (see here
<https://github.com/apache/kafka/pull/2476>), which changed 49 files, added
1349 lines and deleted 141 lines.

Thanks,
Dong


On Tue, Feb 7, 2017 at 2:22 PM, Grant Henke <ghenke@cloudera.com> wrote:

> Hi Dong,
>
> Thanks for proposing the KIP and all the hard work on it!
>
> In order to help summarize the discussion from the KIP call today I wanted
> to list the things I heard as the main discussion points that people would
> like to be considered or discussed. However, this is strictly from memory
> so please add anything I missed.
>
> First I want to list the thing I think everyone agreed on. If you don't
> agree with this please correct me:
>
>    - Partial broker failure on a single bad directory is an improvement
>    over existing behavior
>    - Any "simplified" implementation should not prevent improvements going
>    forward
>    - "one-broker-per-disk" or "one-broker-per-few-disks" is an option but
>    doesn't satisfy everyones requirements and has complexities of its own.
>    Improving the handling of failed directories doesn't prevent this
>    deployment pattern.
>    - It may still be worth listing, even if theoretical the overhead of
>       this type of deployment
>    - This KIP and in general (at this time) no one wants Kafka to worry
>    about physical disks or logical volumes. The discussion of that should
> be
>    handled by any future KIPs intending to introduce related functionality.
>    Today in Kafka and this KIP deal strictly with the top level
> "directories"
>    (log.dirs) configured by Kafka and knows nothing more than that they
> are a
>    directory.
>
> Below are other ideas and feedback discussed.
>
> *Automatic vs Manual Recovery:*
>
> There are 2 ways to handle directory IO failures. One extreme is to try to
> recover automatically (on restart in this case) and the other is to require
> manual intervention from and administrator (even through restarts). Some
> expressed that it would be valuable to track directories and make enabling
> them an explicit admin action as opposed to broker state in memory that is
> reset on broker restart. The sentiment from the discussion is that this
> could be a goal for the future but doesn't need to block this KIP.
>
> I noted that I think either choice is okay but don't think a mixed mode
> where new topics are handled automatically and old topics are manual would
> be easy to understand. We found this scenario is likely true only for when
> a disk is replaced and not "repaired" therefore leaving the data in tact.
> Regardless defining this explicitly is important.
>
> *Client Communication:*
>
> Given the above discussion I thought it was important to be sure we needed
> any manually interaction at all from admins/clients. Today the KIP suggests
> using a notification on zookeeper under /log_dir_event_notification. If
> manual client interaction is still required. I request that clients
> communicate with the broker via the wire protocol instead of directly to
> Zookeeper. I think the AdminClient in KIP-117 would be a great place for
> this api.
>
> *What is Failure:*
>
> There was a lot of discussion based around defining failure and the various
> permanent and temporary ways a disk can "fail". It was suggested this can
> be fairly complicated and it may be worth evaluating or at least
> documenting the specific scenarios handled and how (read failures, write
> failures, slow IO, etc). Going forward improvements to detect and handle
> more interesting failures may be useful.
>
> *Manually Marking a Directory as Bad:*
>
> Because of the topics above it was mentioned that it would be useful to
> allow an admin to mark a directory that appears "good" as "bad". Often in
> soft failures an admin may know a directory is bad before Kafka does and it
> would be nice to be able to mark the directory as bad without a restart. In
> the case of a restart it was noted that you could simply update the
> log.dirs configuration on the broker.
>
> The general consensus was that this was a good idea, but could be handled
> by a future KIP.
>
> *Implementation complexity: *
>
> Some people were concerned about the complexity of the implementation vs
> the benefits. It sounds like many agreed it was "worth it", but that
> justifying and describing the complexity tradeoffs in the KIP would be
> useful. I will let others describe there concerns more in follow up emails.
>
> Those were the main talking points I remember, please add more details or
> follow ups on anything I missed.
>
> Thanks,
> Grant
>
>
>
> On Tue, Feb 7, 2017 at 3:51 AM, Eno Thereska <eno.thereska@gmail.com>
> wrote:
>
> > Hi Dong,
> >
> > To simplify the discussion today, on my part I'll zoom into one thing
> only:
> >
> > - I'll discuss the options called below : "one-broker-per-disk" or
> > "one-broker-per-few-disks".
> >
> > - I completely buy the JBOD vs RAID arguments so there is no need to
> > discuss that part for me. I buy it that JBODs are good.
> >
> > I find the terminology can be improved a bit. Ideally we'd be talking
> > about volumes, not disks. Just to make it clear that Kafka understand
> > volumes/directories, not individual raw disks. So by
> > "one-broker-per-few-disks" what I mean is that the admin can pool a few
> > disks together to create a volume/directory and give that to Kafka.
> >
> >
> > The kernel of my question will be that the admin already has tools to 1)
> > create volumes/directories from a JBOD and 2) start a broker on a desired
> > machine and 3) assign a broker resources like a directory. I claim that
> > those tools are sufficient to optimise resource allocation.  I understand
> > that a broker could manage point 3) itself, ie juggle the directories. My
> > question is whether the complexity added to Kafka is justified.
> > Operationally it seems to me an admin will still have to do all the three
> > items above.
> >
> > Looking forward to the discussion
> > Thanks
> > Eno
> >
> >
> > > On 1 Feb 2017, at 17:21, Dong Lin <lindong28@gmail.com> wrote:
> > >
> > > Hey Eno,
> > >
> > > Thanks much for the review.
> > >
> > > I think your suggestion is to split disks of a machine into multiple
> disk
> > > sets and run one broker per disk set. Yeah this is similar to Colin's
> > > suggestion of one-broker-per-disk, which we have evaluated at LinkedIn
> > and
> > > considered it to be a good short term approach.
> > >
> > > As of now I don't think any of these approach is a better alternative
> in
> > > the long term. I will summarize these here. I have put these reasons in
> > the
> > > KIP's motivation section and rejected alternative section. I am happy
> to
> > > discuss more and I would certainly like to use an alternative solution
> > that
> > > is easier to do with better performance.
> > >
> > > - JBOD vs. RAID-10: if we switch from RAID-10 with
> replication-factoer=2
> > to
> > > JBOD with replicatio-factor=3, we get 25% reduction in disk usage and
> > > doubles the tolerance of broker failure before data unavailability
> from 1
> > > to 2. This is pretty huge gain for any company that uses Kafka at large
> > > scale.
> > >
> > > - JBOD vs. one-broker-per-disk: The benefit of one-broker-per-disk is
> > that
> > > no major code change is needed in Kafka. Among the disadvantage of
> > > one-broker-per-disk summarized in the KIP and previous email with
> Colin,
> > > the biggest one is the 15% throughput loss compared to JBOD and less
> > > flexibility to balance across disks. Further, it probably requires
> change
> > > to internal deployment tools at various companies to deal with
> > > one-broker-per-disk setup.
> > >
> > > - JBOD vs. RAID-0: This is the setup that used at Microsoft. The
> problem
> > is
> > > that a broker becomes unavailable if any disk fail. Suppose
> > > replication-factor=2 and there are 10 disks per machine. Then the
> > > probability of of any message becomes unavailable due to disk failure
> > with
> > > RAID-0 is 100X higher than that with JBOD.
> > >
> > > - JBOD vs. one-broker-per-few-disks: one-broker-per-few-disk is
> somewhere
> > > between one-broker-per-disk and RAID-0. So it carries an averaged
> > > disadvantages of these two approaches.
> > >
> > > To answer your question regarding, I think it is reasonable to mange
> disk
> > > in Kafka. By "managing disks" we mean the management of assignment of
> > > replicas across disks. Here are my reasons in more detail:
> > >
> > > - I don't think this KIP is a big step change. By allowing user to
> > > configure Kafka to run multiple log directories or disks as of now, it
> is
> > > implicit that Kafka manages disks. It is just not a complete feature.
> > > Microsoft and probably other companies are using this feature under the
> > > undesirable effect that a broker will fail any if any disk fail. It is
> > good
> > > to complete this feature.
> > >
> > > - I think it is reasonable to manage disk in Kafka. One of the most
> > > important work that Kafka is doing is to determine the replica
> assignment
> > > across brokers and make sure enough copies of a given replica is
> > available.
> > > I would argue that it is not much different than determining the
> replica
> > > assignment across disk conceptually.
> > >
> > > - I would agree that this KIP is improve performance of Kafka at the
> cost
> > > of more complexity inside Kafka, by switching from RAID-10 to JBOD. I
> > would
> > > argue that this is a right direction. If we can gain 20%+ performance
> by
> > > managing NIC in Kafka as compared to existing approach and other
> > > alternatives, I would say we should just do it. Such a gain in
> > performance,
> > > or equivalently reduction in cost, can save millions of dollars per
> year
> > > for any company running Kafka at large scale.
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > > On Wed, Feb 1, 2017 at 5:41 AM, Eno Thereska <eno.thereska@gmail.com>
> > wrote:
> > >
> > >> I'm coming somewhat late to the discussion, apologies for that.
> > >>
> > >> I'm worried about this proposal. It's moving Kafka to a world where it
> > >> manages disks. So in a sense, the scope of the KIP is limited, but the
> > >> direction it sets for Kafka is quite a big step change. Fundamentally
> > this
> > >> is about balancing resources for a Kafka broker. This can be done by a
> > >> tool, rather than by changing Kafka. E.g., the tool would take a bunch
> > of
> > >> disks together, create a volume over them and export that to a Kafka
> > broker
> > >> (in addition to setting the memory limits for that broker or limiting
> > other
> > >> resources). A different bunch of disks can then make up a second
> volume,
> > >> and be used by another Kafka broker. This is aligned with what Colin
> is
> > >> saying (as I understand it).
> > >>
> > >> Disks are not the only resource on a machine, there are several
> > instances
> > >> where multiple NICs are used for example. Do we want fine grained
> > >> management of all these resources? I'd argue that opens us the system
> > to a
> > >> lot of complexity.
> > >>
> > >> Thanks
> > >> Eno
> > >>
> > >>
> > >>> On 1 Feb 2017, at 01:53, Dong Lin <lindong28@gmail.com> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> I am going to initiate the vote If there is no further concern with
> the
> > >> KIP.
> > >>>
> > >>> Thanks,
> > >>> Dong
> > >>>
> > >>>
> > >>> On Fri, Jan 27, 2017 at 8:08 PM, radai <radai.rosenblatt@gmail.com>
> > >> wrote:
> > >>>
> > >>>> a few extra points:
> > >>>>
> > >>>> 1. broker per disk might also incur more client <--> broker
sockets:
> > >>>> suppose every producer / consumer "talks" to >1 partition, there's
a
> > >> very
> > >>>> good chance that partitions that were co-located on a single 10-disk
> > >> broker
> > >>>> would now be split between several single-disk broker processes
on
> the
> > >> same
> > >>>> machine. hard to put a multiplier on this, but likely >x1. sockets
> > are a
> > >>>> limited resource at the OS level and incur some memory cost (kernel
> > >>>> buffers)
> > >>>>
> > >>>> 2. there's a memory overhead to spinning up a JVM (compiled code
and
> > >> byte
> > >>>> code objects etc). if we assume this overhead is ~300 MB (order
of
> > >>>> magnitude, specifics vary) than spinning up 10 JVMs would lose
you 3
> > GB
> > >> of
> > >>>> RAM. not a ton, but non negligible.
> > >>>>
> > >>>> 3. there would also be some overhead downstream of kafka in any
> > >> management
> > >>>> / monitoring / log aggregation system. likely less than x10 though.
> > >>>>
> > >>>> 4. (related to above) - added complexity of administration with
more
> > >>>> running instances.
> > >>>>
> > >>>> is anyone running kafka with anywhere near 100GB heaps? i thought
> the
> > >> point
> > >>>> was to rely on kernel page cache to do the disk buffering ....
> > >>>>
> > >>>> On Thu, Jan 26, 2017 at 11:00 AM, Dong Lin <lindong28@gmail.com>
> > wrote:
> > >>>>
> > >>>>> Hey Colin,
> > >>>>>
> > >>>>> Thanks much for the comment. Please see me comment inline.
> > >>>>>
> > >>>>> On Thu, Jan 26, 2017 at 10:15 AM, Colin McCabe <cmccabe@apache.org
> >
> > >>>> wrote:
> > >>>>>
> > >>>>>> On Wed, Jan 25, 2017, at 13:50, Dong Lin wrote:
> > >>>>>>> Hey Colin,
> > >>>>>>>
> > >>>>>>> Good point! Yeah we have actually considered and tested
this
> > >>>> solution,
> > >>>>>>> which we call one-broker-per-disk. It would work and
should
> require
> > >>>> no
> > >>>>>>> major change in Kafka as compared to this JBOD KIP.
So it would
> be
> > a
> > >>>>> good
> > >>>>>>> short term solution.
> > >>>>>>>
> > >>>>>>> But it has a few drawbacks which makes it less desirable
in the
> > long
> > >>>>>>> term.
> > >>>>>>> Assume we have 10 disks on a machine. Here are the
problems:
> > >>>>>>
> > >>>>>> Hi Dong,
> > >>>>>>
> > >>>>>> Thanks for the thoughtful reply.
> > >>>>>>
> > >>>>>>>
> > >>>>>>> 1) Our stress test result shows that one-broker-per-disk
has 15%
> > >>>> lower
> > >>>>>>> throughput
> > >>>>>>>
> > >>>>>>> 2) Controller would need to send 10X as many LeaderAndIsrRequest,
> > >>>>>>> MetadataUpdateRequest and StopReplicaRequest. This
increases the
> > >>>> burden
> > >>>>>>> on
> > >>>>>>> controller which can be the performance bottleneck.
> > >>>>>>
> > >>>>>> Maybe I'm misunderstanding something, but there would not
be 10x
> as
> > >>>> many
> > >>>>>> StopReplicaRequest RPCs, would there?  The other requests
would
> > >>>> increase
> > >>>>>> 10x, but from a pretty low base, right?  We are not reassigning
> > >>>>>> partitions all the time, I hope (or else we have bigger
> problems...)
> > >>>>>>
> > >>>>>
> > >>>>> I think the controller will group StopReplicaRequest per broker
and
> > >> send
> > >>>>> only one StopReplicaRequest to a broker during controlled shutdown.
> > >>>> Anyway,
> > >>>>> we don't have to worry about this if we agree that other requests
> > will
> > >>>>> increase by 10X. One MetadataRequest to send to each broker
in the
> > >>>> cluster
> > >>>>> every time there is leadership change. I am not sure this is
a real
> > >>>>> problem. But in theory this makes the overhead complexity O(number
> of
> > >>>>> broker) and may be a concern in the future. Ideally we should
avoid
> > it.
> > >>>>>
> > >>>>>
> > >>>>>>
> > >>>>>>>
> > >>>>>>> 3) Less efficient use of physical resource on the machine.
The
> > number
> > >>>>> of
> > >>>>>>> socket on each machine will increase by 10X. The number
of
> > connection
> > >>>>>>> between any two machine will increase by 100X.
> > >>>>>>>
> > >>>>>>> 4) Less efficient way to management memory and quota.
> > >>>>>>>
> > >>>>>>> 5) Rebalance between disks/brokers on the same machine
will less
> > >>>>>>> efficient
> > >>>>>>> and less flexible. Broker has to read data from another
broker on
> > the
> > >>>>>>> same
> > >>>>>>> machine via socket. It is also harder to do automatic
load
> balance
> > >>>>>>> between
> > >>>>>>> disks on the same machine in the future.
> > >>>>>>>
> > >>>>>>> I will put this and the explanation in the rejected
alternative
> > >>>>> section.
> > >>>>>>> I
> > >>>>>>> have a few questions:
> > >>>>>>>
> > >>>>>>> - Can you explain why this solution can help avoid
scalability
> > >>>>>>> bottleneck?
> > >>>>>>> I actually think it will exacerbate the scalability
problem due
> the
> > >>>> 2)
> > >>>>>>> above.
> > >>>>>>> - Why can we push more RPC with this solution?
> > >>>>>>
> > >>>>>> To really answer this question we'd have to take a deep
dive into
> > the
> > >>>>>> locking of the broker and figure out how effectively it
can
> > >> parallelize
> > >>>>>> truly independent requests.  Almost every multithreaded
process is
> > >>>> going
> > >>>>>> to have shared state, like shared queues or shared sockets,
that
> is
> > >>>>>> going to make scaling less than linear when you add disks
or
> > >>>> processors.
> > >>>>>> (And clearly, another option is to improve that scalability,
> rather
> > >>>>>> than going multi-process!)
> > >>>>>>
> > >>>>>
> > >>>>> Yeah I also think it is better to improve scalability inside
kafka
> > code
> > >>>> if
> > >>>>> possible. I am not sure we currently have any scalability issue
> > inside
> > >>>>> Kafka that can not be removed without using multi-process.
> > >>>>>
> > >>>>>
> > >>>>>>
> > >>>>>>> - It is true that a garbage collection in one broker
would not
> > affect
> > >>>>>>> others. But that is after every broker only uses 1/10
of the
> > memory.
> > >>>>> Can
> > >>>>>>> we be sure that this will actually help performance?
> > >>>>>>
> > >>>>>> The big question is, how much memory do Kafka brokers use
now, and
> > how
> > >>>>>> much will they use in the future?  Our experience in HDFS
was that
> > >> once
> > >>>>>> you start getting more than 100-200GB Java heap sizes,
full GCs
> > start
> > >>>>>> taking minutes to finish when using the standard JVMs.
 That alone
> > is
> > >> a
> > >>>>>> good reason to go multi-process or consider storing more
things
> off
> > >> the
> > >>>>>> Java heap.
> > >>>>>>
> > >>>>>
> > >>>>> I see. Now I agree one-broker-per-disk should be more efficient
in
> > >> terms
> > >>>> of
> > >>>>> GC since each broker probably needs less than 1/10 of the memory
> > >>>> available
> > >>>>> on a typical machine nowadays. I will remove this from the
reason
> of
> > >>>>> rejection.
> > >>>>>
> > >>>>>
> > >>>>>>
> > >>>>>> Disk failure is the "easy" case.  The "hard" case, which
is
> > >>>>>> unfortunately also the much more common case, is disk misbehavior.
> > >>>>>> Towards the end of their lives, disks tend to start slowing
down
> > >>>>>> unpredictably.  Requests that would have completed immediately
> > before
> > >>>>>> start taking 20, 100 500 milliseconds.  Some files may
be readable
> > and
> > >>>>>> other files may not be.  System calls hang, sometimes forever,
and
> > the
> > >>>>>> Java process can't abort them, because the hang is in the
kernel.
> > It
> > >>>> is
> > >>>>>> not fun when threads are stuck in "D state"
> > >>>>>> http://stackoverflow.com/questions/20423521/process-perminan
> > >>>>>> tly-stuck-on-d-state
> > >>>>>> .  Even kill -9 cannot abort the thread then.  Fortunately,
this
> is
> > >>>>>> rare.
> > >>>>>>
> > >>>>>
> > >>>>> I agree it is a harder problem and it is rare. We probably
don't
> have
> > >> to
> > >>>>> worry about it in this KIP since this issue is orthogonal to
> whether
> > or
> > >>>> not
> > >>>>> we use JBOD.
> > >>>>>
> > >>>>>
> > >>>>>>
> > >>>>>> Another approach we should consider is for Kafka to implement
its
> > own
> > >>>>>> storage layer that would stripe across multiple disks.
 This
> > wouldn't
> > >>>>>> have to be done at the block level, but could be done at
the file
> > >>>> level.
> > >>>>>> We could use consistent hashing to determine which disks
a file
> > should
> > >>>>>> end up on, for example.
> > >>>>>>
> > >>>>>
> > >>>>> Are you suggesting that we should distribute log, or log segment,
> > >> across
> > >>>>> disks of brokers? I am not sure if I fully understand this
> approach.
> > My
> > >>>> gut
> > >>>>> feel is that this would be a drastic solution that would require
> > >>>>> non-trivial design. While this may be useful to Kafka, I would
> prefer
> > >> not
> > >>>>> to discuss this in detail in this thread unless you believe
it is
> > >>>> strictly
> > >>>>> superior to the design in this KIP in terms of solving our
> use-case.
> > >>>>>
> > >>>>>
> > >>>>>> best,
> > >>>>>> Colin
> > >>>>>>
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Dong
> > >>>>>>>
> > >>>>>>> On Wed, Jan 25, 2017 at 11:34 AM, Colin McCabe <
> cmccabe@apache.org
> > >
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi Dong,
> > >>>>>>>>
> > >>>>>>>> Thanks for the writeup!  It's very interesting.
> > >>>>>>>>
> > >>>>>>>> I apologize in advance if this has been discussed
somewhere
> else.
> > >>>>> But
> > >>>>>> I
> > >>>>>>>> am curious if you have considered the solution
of running
> multiple
> > >>>>>>>> brokers per node.  Clearly there is a memory overhead
with this
> > >>>>>> solution
> > >>>>>>>> because of the fixed cost of starting multiple
JVMs.  However,
> > >>>>> running
> > >>>>>>>> multiple JVMs would help avoid scalability bottlenecks.
 You
> could
> > >>>>>>>> probably push more RPCs per second, for example.
 A garbage
> > >>>>> collection
> > >>>>>>>> in one broker would not affect the others.  It
would be
> > interesting
> > >>>>> to
> > >>>>>>>> see this considered in the "alternate designs"
design, even if
> you
> > >>>>> end
> > >>>>>>>> up deciding it's not the way to go.
> > >>>>>>>>
> > >>>>>>>> best,
> > >>>>>>>> Colin
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Thu, Jan 12, 2017, at 10:46, Dong Lin wrote:
> > >>>>>>>>> Hi all,
> > >>>>>>>>>
> > >>>>>>>>> We created KIP-112: Handle disk failure for
JBOD. Please find
> the
> > >>>>> KIP
> > >>>>>>>>> wiki
> > >>>>>>>>> in the link https://cwiki.apache.org/confl
> > >>>> uence/display/KAFKA/KIP-
> > >>>>>>>>> 112%3A+Handle+disk+failure+for+JBOD.
> > >>>>>>>>>
> > >>>>>>>>> This KIP is related to KIP-113
> > >>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>>>> 113%3A+Support+replicas+movement+between+log+directories>:
> > >>>>>>>>> Support replicas movement between log directories.
They are
> > >>>> needed
> > >>>>> in
> > >>>>>>>>> order
> > >>>>>>>>> to support JBOD in Kafka. Please help review
the KIP. You
> > >>>> feedback
> > >>>>> is
> > >>>>>>>>> appreciated!
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Dong
> > >>>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>
> > >>
> >
> >
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message