kafka-users mailing list archives

From "Bruno D. Rodrigues" <bruno.rodrig...@litux.org>
Subject Re: Anyone running kafka with a single broker in production? what about only 8GB ram?
Date Fri, 11 Oct 2013 18:17:34 GMT
Producer:
        props.put("batch.num.messages", "1000"); // 200
        props.put("queue.buffering.max.messages", "20000"); // 10000   
        props.put("request.required.acks", "0");
        props.put("producer.type", "async"); // sync

        // return ++this.count % a_numPartitions; // just round-robin
        props.put("partitioner.class", "main.SimplePartitioner"); // kafka.producer.DefaultPartitioner

        // disabled = 70MB source, 70MB network, enabled = 70MB source, ~40-50MB network
        props.put("compression.codec", "snappy"); // none (codec name must be lowercase)
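Put together, the async producer settings above can be assembled like this (a minimal sketch against the 0.8-era producer property names; the broker address is a placeholder, and main.SimplePartitioner is the partitioner class from my test code):

```java
import java.util.Properties;

public class ProducerProps {
    public static Properties build() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // assumption: local broker
        props.put("batch.num.messages", "1000");             // default is 200
        props.put("queue.buffering.max.messages", "20000");  // default is 10000
        props.put("request.required.acks", "0");             // fire-and-forget, no acks
        props.put("producer.type", "async");                 // default is sync
        props.put("partitioner.class", "main.SimplePartitioner");
        props.put("compression.codec", "snappy");            // default is none
        return props;
    }

    public static void main(String[] args) {
        // Sanity-check the settings we care about
        Properties p = build();
        System.out.println(p.getProperty("producer.type") + " / "
                + p.getProperty("compression.codec"));
    }
}
```

These Properties would then be wrapped in a ProducerConfig and handed to the producer as usual.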

The consumer runs with default settings. I first test with no consumer at all, then measure
the extra load of adding 1..n consumers; I assume the top speed is reached with no consumers.
I'm measuring throughput on both the producer and the consumer side.

On the kafka server I've changed the following, expecting fewer disk writes at the cost of
losing messages:

#log.flush.interval.messages=10000
log.flush.interval.messages=10000000
#log.flush.interval.ms=1000
log.flush.interval.ms=10000
#log.segment.bytes=536870912
# is signed int 32, only up to 2^31-1!
log.segment.bytes=2000000000 
#log.retention.hours=168
log.retention.hours=1
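The comment about the segment size is worth making concrete: log.segment.bytes is parsed as a signed 32-bit int, so the hard ceiling is 2^31-1 = 2147483647, and 2000000000 stays just under it:

```java
public class SegmentCheck {
    public static void main(String[] args) {
        long segmentBytes = 2_000_000_000L;
        // log.segment.bytes is a signed 32-bit int, so it must fit in Integer.MAX_VALUE
        System.out.println(segmentBytes + " <= " + Integer.MAX_VALUE
                + " : " + (segmentBytes <= Integer.MAX_VALUE));
    }
}
```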


Basically I need high throughput for discardable messages. Having them persisted temporarily
on disk, in the highly optimised manner Kafka demonstrates, is valuable not for reliability
(not losing messages), but because it lets me re-fetch recent messages if a client (kafka
client or real consumer client) disconnects, and it provides a way to go back in time a few
seconds if needed.
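One way to get that "go back in time" behaviour with the 0.8 high-level consumer is to start a fresh consumer group with auto.offset.reset=smallest, so it replays whatever is still retained on disk (a hedged sketch; the ZooKeeper address and group id scheme are placeholders, not from my actual setup):

```java
import java.util.Properties;

public class ReplayConsumerProps {
    public static Properties build() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // assumption: local ZooKeeper
        // A fresh group id means no committed offsets exist yet...
        props.put("group.id", "replay-" + System.currentTimeMillis());
        // ...so this makes the consumer start from the oldest retained
        // message instead of only seeing new ones
        props.put("auto.offset.reset", "smallest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("auto.offset.reset"));
    }
}
```

With log.retention.hours=1 as above, a reconnecting or fresh consumer can replay up to roughly the last hour of traffic.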



On 11/10/2013, at 18:56, Magnus Edenhill <magnus@edenhill.se> wrote:

> Make sure the fetch batch size and the local consumer queue sizes are large
> enough, setting them too low will limit your throughput to the
> broker<->client latency.
> 
> This would be controlled using the following properties:
> - fetch.message.max.bytes
> - queued.max.message.chunks
> 
> On the producer side you would want to play with:
> - queue.buffering.max.ms and .messages
> - batch.num.messages
> 
> Memory on the broker should only affect disk cache performance, the more
> the merrier of course, but it depends on your use case, with a bit of luck
> the disk caches are already hot for the data you are reading (e.g.,
> recently produced).
> 
> Consuming millions of messages per second on quad core i7 with 8 gigs of
> RAM is possible without a sweat, given the disk caches are hot.
> 
> 
> Regards,
> Magnus
> 
> 
> 2013/10/11 Bruno D. Rodrigues <bruno.rodrigues@litux.org>
> 
>> 
>>> On Thu, Oct 10, 2013 at 3:57 PM, Bruno D. Rodrigues <
>>> bruno.rodrigues@litux.org> wrote:
>>> 
>>>> My personal newbie experience, which is surely completely wrong and
>>>> miss-configured, got me up to 70MB/sec, either with controlled 1K
>> messages
>>>> (hence 70Kmsg/sec) as well as with more random data (test data from 100
>>>> bytes to a couple MB). First I thought the 70MB were the hard disk
>> limit,
>>>> but when I got the same result both with a proper linux server with a
>> 10K
>>>> disk, as well as with a Mac mini with a 5400rpm disk, I got confused.
>>>> 
>>>> The mini has 2G, the linux server has 8 or 16, can't recall at the
>> moment.
>>>> 
>>>> The test was performed both with single and multi producers and
>> consumers.
>>>> One producer = 70MB, two producers = 35MB each and so forth. Running
>>>> standalone instances on each server, same value. Running both together
>> in 2
>>>> partition 2 replica crossed mode, same result.
>>>> 
>>>> As far as I understood, more memory just means more kernel buffer space
>> to
>>>> speed up the lack of disk speed, as kafka seems to not really depend on
>>>> memory for the queueing.
>> 
>> On 11/10/2013, at 17:28, Guozhang Wang <wangguoz@gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> In most cases of Kafka, network bottleneck will be hit before the disk
>>> bottleneck. So maybe you want to check your network capacity to see if it
>>> has been saturated.
>> 
>> They are all connected to Gbit ethernet cards and proper network routers.
>> I can easily get way above 950Mbps up and down between each machine and
>> even between multiple machines. Gbit is 128MB/s. 70MB/s is 560Mbps. So far
>> so good, 56% network capacity is a goodish value. But then I enable snappy,
>> get the same 70MB on the input and output side, and 20MB/sec on the
>> network, so it surely isn't network limits. It's also not on the input or
>> output side - the input reads a pre-processed MMaped file that reads at
>> 150MB/sec without cache (SSD) up to 3GB/sec when loaded into memory. The
>> output simply counts the messages and size of them.
>> 
>> One weird thing is that the kafka process seems to not cross the 100% cpu
>> on the top or equivalent command. Top shows 100% for each CPU, so a
>> multi-threaded process should go up to 400% (both the linux and mac mini
>> are 2 CPU with hyperthreading, so "almost" 4 cpus).
>> 
>> 
>> 

