kafka-dev mailing list archives

From "xingang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-3062) Read from kafka replication to get data likely Version based
Date Mon, 04 Jan 2016 23:30:39 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082004#comment-15082004
] 

xingang edited comment on KAFKA-3062 at 1/4/16 11:30 PM:
---------------------------------------------------------

Yes! Example: 

A high-volume stream is produced to more than 60 partitions, and 15 consumers work on this data.

10 of them are latency-sensitive, doing near-real-time processing. It is better for them
to consume from the page cache; even a little data loss can be tolerated, since all they
need is to show processing results in real time.

The other 5 build reports from the data. Hourly or even daily jobs are fine for them; they
do not need to show results quickly.

Now consider what happens if those 5 report jobs fall into a lag: they start consuming from
disk and fill the page cache with historical data, since catching up on history runs N times
faster than the produce rate. The 10 latency-sensitive processors then suffer, because they
see page-cache misses whenever they fall even slightly behind.
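For reference, later Kafka releases (2.4+, via KIP-392) allow consumers to fetch from follower replicas, which is one way to keep the two workloads off the same broker's page cache. A minimal sketch of the client-side configuration, assuming hypothetical broker addresses, group names, and rack IDs (the broker must also set `replica.selector.class` to the rack-aware selector for `client.rack` to take effect):

```java
import java.util.Properties;

public class FollowerFetchConfig {
    // Latency-sensitive consumers: no client.rack, so fetches go to the
    // partition leader and are served from its (hot) page cache.
    public static Properties latencySensitiveConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical address
        props.put("group.id", "realtime-processors");   // hypothetical group
        return props;
    }

    // Report consumers: client.rack steers fetches to a replica in the same
    // rack (KIP-392), keeping cold historical reads off the leader.
    public static Properties reportConsumer(String rack) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical address
        props.put("group.id", "daily-reports");         // hypothetical group
        props.put("client.rack", rack);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(reportConsumer("rack-b").getProperty("client.rack"));
    }
}
```

This separation is only a partial answer to the scenario above: the follower still shares its page cache with whatever else runs on that broker, but the leader's cache stays warm for the real-time group.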


Thanks for your quick response!





> Read from kafka replication to get data likely Version based
> ------------------------------------------------------------
>
>                 Key: KAFKA-3062
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3062
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: xingang
>
> Kafka requires all reads to happen on the leader for consistency. If reads could also
> happen on replicas, then for data with a number of consumers, the consumers that are
> not latency-sensitive but are data-loss sensitive could fetch from replicas. Today,
> their reads pollute the leader's page cache for the other consumers, which are
> latency-sensitive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
