kafka-commits mailing list archives

From: jun...@apache.org
Subject: svn commit: r1445018 - /kafka/site/design.html
Date: Tue, 12 Feb 2013 02:14:31 GMT
Author: junrao
Date: Tue Feb 12 02:14:30 2013
New Revision: 1445018

URL: http://svn.apache.org/r1445018
Log:
fix typos

Modified:
    kafka/site/design.html

Modified: kafka/site/design.html
URL: http://svn.apache.org/viewvc/kafka/site/design.html?rev=1445018&r1=1445017&r2=1445018&view=diff
==============================================================================
--- kafka/site/design.html (original)
+++ kafka/site/design.html Tue Feb 12 02:14:30 2013
@@ -127,7 +127,7 @@ Having access to virtually unlimited dis
 Our assumption is that the volume of messages is extremely high, indeed it is some multiple
of the total number of page views for the site (since a page view is one of the activities
we process). Furthermore we assume each message published is read at least once (and often
multiple times), hence we optimize for consumption rather than production.
 </p>
 <p>
-There are two common causes of inefficiency: two many network requests, and excessive byte
copying.	
+There are two common causes of inefficiency: too many network requests, and excessive byte
copying.	
 </p>
 <p>
 To encourage efficiency, the APIs are built around a "message set" abstraction that naturally
groups messages. This allows network requests to group messages together and amortize the
overhead of the network roundtrip rather than sending a single message at a time.
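As a rough, hypothetical sketch of the message-set idea described in the hunk above (the class and method names below are illustrative, not Kafka's actual API), the point is that callers append individual messages but the producer ships them as one batch per request:

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: individual messages are buffered and sent to the
// broker as one grouped request, amortizing the network round trip and
// per-request overhead across many messages.
class MessageBatchSketch {
    private final List<byte[]> buffered = new ArrayList<>();
    private final int maxMessagesPerRequest;

    MessageBatchSketch(int maxMessagesPerRequest) {
        this.maxMessagesPerRequest = maxMessagesPerRequest;
    }

    void append(byte[] message) {
        buffered.add(message);
        if (buffered.size() >= maxMessagesPerRequest) {
            flush();
        }
    }

    void flush() {
        if (buffered.isEmpty()) return;
        sendAsSingleRequest(buffered); // one round trip for the whole message set
        buffered.clear();
    }

    private void sendAsSingleRequest(List<byte[]> messages) {
        // placeholder for the actual wire write of the grouped message set
    }
}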
@@ -164,7 +164,7 @@ For more background on the sendfile and 
 In many cases the bottleneck is actually not CPU but network. This is particularly true for
a data pipeline that needs to send messages across data centers. Of course the user can always
send compressed messages without any support needed from Kafka, but this can lead to very
poor compression ratios as much of the redundancy is due to repetition between messages (e.g.
field names in JSON or user agents in web logs or common string values). Efficient compression
requires compressing multiple messages together rather than compressing each message individually.
Ideally this would be possible in an end-to-end fashion&mdash;that is, data would be compressed
prior to sending by the producer and remain compressed on the server, only being decompressed
by the eventual consumers.
 </p>
 <p>
-Kafka supports this be allowing recursive message sets. A batch of messages can be clumped
together compressed and sent to the server in this form. This batch of messages will be delivered
all to the same consumer and will remain in compressed form until it arrives there.
+Kafka supports this by allowing recursive message sets. A batch of messages can be clumped
together compressed and sent to the server in this form. This batch of messages will be delivered
all to the same consumer and will remain in compressed form until it arrives there.
 </p>
 <p>
 Kafka supports GZIP and Snappy compression protocols. More details on compression can be
found <a href="https://cwiki.apache.org/confluence/display/KAFKA/Compression">here</a>.
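As a rough illustration of batch-level compression from the producer side, here is a sketch using the later Java producer client (org.apache.kafka.clients.producer, which postdates this page; broker address and topic are illustrative). The producer compresses whole batches rather than single messages, and the batch stays compressed until the consumer reads it, as described above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompressedBatchProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // illustrative broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Batches, not individual messages, are compressed, so repetition
        // across messages (field names, user agents, etc.) compresses well.
        props.put("compression.type", "snappy");
        props.put("linger.ms", "10"); // wait briefly so batches fill before sending

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("page-views", "user-42", "{\"page\":\"/home\"}"));
        }
    }
}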
@@ -222,7 +222,7 @@ Kafka is built to be run across a cluste
 
 <h3>Automatic producer load balancing</h3>
 <p>
-Kafka supports client-side load balancing for message producers or use of a dedicated load
balancer to balance TCP connections. A dedicated layer-4 load balancer works by balancing
TCP connections over Kafka brokers. In this configuration all messages from a given producer
go to a single broker. The advantage of using a level-4 load balancer is that each producer
only needs a single TCP connection, and no connection to zookeeper is needed. The disadvantage
is that the balancing is done at the TCP connection level, and hence it may not be well balanced
(if some producers produce many more messages then others, evenly dividing up the connections
per broker may not result in evenly dividing up the messages per broker).
+Kafka supports client-side load balancing for message producers or use of a dedicated load
balancer to balance TCP connections. A dedicated layer-4 load balancer works by balancing
TCP connections over Kafka brokers. In this configuration all messages from a given producer
go to a single broker. The advantage of using a level-4 load balancer is that each producer
only needs a single TCP connection, and no connection to zookeeper is needed. The disadvantage
is that the balancing is done at the TCP connection level, and hence it may not be well balanced
(if some producers produce many more messages than others, evenly dividing up the connections
per broker may not result in evenly dividing up the messages per broker).
 <p>
 Client-side zookeeper-based load balancing solves some of these problems. It allows the producer
to dynamically discover new brokers, and balance load on a per-request basis. Likewise it
allows the producer to partition data according to some key instead of randomly, which enables
stickiness on the consumer (e.g. partitioning data consumption by user id). This feature is
called "semantic partitioning", and is described in more detail below.
 <p>
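As a small, hypothetical sketch of the "semantic partitioning" idea (the helper below is illustrative, not Kafka's actual partitioner), hashing a user id to a partition keeps all of one user's messages on the same partition, and therefore with the same consumer:

// Illustrative helper: every message for the same user id hashes to the same
// partition, so one consumer sees all of that user's activity.
final class UserPartitionerSketch {
    static int partitionFor(String userId, int numPartitions) {
        // mask the sign bit so the result is a valid, non-negative partition index
        return (userId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("user-42", 8)); // same user id -> same partition
    }
}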
@@ -396,7 +396,7 @@ The createMessageStreams call registers 
 </p>
 <h2>Network Layer</h2>
 <p>
-The network layer is a fairly straight-forward NIO server, and will not be described in great
detail. The sendfile implementation is done by giving the <code>MessageSet</code>
interface a <code>writeTo</code> method. This allows the file-backed message set
to use the more efficient <code>transferTo</code> implementation instead of an
in-process buffered write. The threading model is a single acceptor thread and <i>N</i>
processor threads which handle a fixed number of connections each. This design has been pretty
thoroughly tested <a href="http://sna-projects.com/blog/2009/08/introducing-the-nio-socketserver-implementation">elsewhere</a>
and found to be simple to implement and fast. The protocol is kept quite simple to allow for
future the implementation of clients in other languages.
+The network layer is a fairly straight-forward NIO server, and will not be described in great
detail. The sendfile implementation is done by giving the <code>MessageSet</code>
interface a <code>writeTo</code> method. This allows the file-backed message set
to use the more efficient <code>transferTo</code> implementation instead of an
in-process buffered write. The threading model is a single acceptor thread and <i>N</i>
processor threads which handle a fixed number of connections each. This design has been pretty
thoroughly tested <a href="http://sna-projects.com/blog/2009/08/introducing-the-nio-socketserver-implementation">elsewhere</a>
and found to be simple to implement and fast. The protocol is kept quite simple to allow for
future implementation of clients in other languages.
 </p>
 <h2>Messages</h2>
 <p>
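A minimal sketch of the writeTo/transferTo idea described in this hunk (class and field names are illustrative, not the actual Kafka MessageSet implementation): a file-backed message set hands its bytes to the destination channel via FileChannel.transferTo, so the kernel's sendfile path can skip the in-process buffered copy:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Illustrative sketch: a file-backed message set exposes writeTo so the
// network layer can stream a log segment straight to a socket channel.
final class FileBackedMessageSetSketch {
    private final FileChannel log; // the on-disk log segment
    private final long start;      // byte offset of this message set in the file
    private final long size;       // number of bytes in this message set

    FileBackedMessageSetSketch(FileChannel log, long start, long size) {
        this.log = log;
        this.start = start;
        this.size = size;
    }

    // transferTo uses the OS sendfile path where available, moving bytes from
    // the page cache to the destination channel without a user-space copy.
    long writeTo(WritableByteChannel destination, long offset) throws IOException {
        return log.transferTo(start + offset, size - offset, destination);
    }
}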


