bookkeeper-user mailing list archives

From Maciej Smoleński <jezd...@gmail.com>
Subject Re: Low write bandwidth
Date Wed, 10 Jun 2015 18:07:08 GMT
Thank you for your comments and explanations.
If I understand correctly, processing an entry as a unit takes longer because
the bookie has to wait for all the TCP fragments that carry a single entry
before the write is performed.
That would introduce extra latency and could be the reason the bandwidth is
not saturated.
I will try to confirm this.
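
A minimal sketch of what I plan to run to measure the per-add latency on the
client (the ZooKeeper connect string, password and entry count below are
placeholders; it uses the standard synchronous LedgerHandle.addEntry API):

import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.BookKeeper.DigestType;
import org.apache.bookkeeper.client.LedgerHandle;

public class AddLatencyTest {
    public static void main(String[] args) throws Exception {
        // placeholder ZooKeeper connect string
        BookKeeper bk = new BookKeeper("zk1:2181");
        // ensemble 3, quorum 2, as in the test setup
        LedgerHandle lh = bk.createLedger(3, 2, DigestType.CRC32, "pwd".getBytes());

        byte[] entry = new byte[100 * 1024]; // 100 KB entry
        for (int i = 0; i < 1000; i++) {
            long start = System.nanoTime();
            lh.addEntry(entry);              // synchronous add, one request outstanding
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("add " + i + ": " + micros + " us");
        }

        lh.close();
        bk.close();
    }
}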

Kind regards,
Maciej


On Wed, Jun 10, 2015 at 6:09 PM, Robin Dhamankar <robindh@apache.org> wrote:

> Flavio, that's right, we don't stream entries, so the full entry is
> processed as a unit. As a result, the network transfer of an entry and its
> write to disk don't overlap.
>
> Essentially, throughput here is bounded by the latency of each add request.
> So for a fixed entry size we cannot saturate I/O bandwidth (network or
> disk) unless we can lower latencies.
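>
> As a rough back-of-the-envelope (using the 250 adds/s and 100 KB entries
> reported below, and the 400 MB/s link measured between client and server):
>
>   1 / 250 adds/s     = 4 ms per add
>   100 KB / 4 ms      = 25 MB/s with a single outstanding add
>   400 MB/s / 25 MB/s = ~16 adds in flight needed to saturate the link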
>
> Maciej, multiple packets can be in flight at the same time, so the latency
> is not strictly additive. The 1.39 ms gives you a rough upper bound on the
> network latency; on top of that there is the request processing latency on
> the bookie (the two phases do not overlap, as explained above). For ramfs,
> request processing should have low latency. We should probably measure
> that as well.
>
> Thanks-
>
>
>
> On Wed, Jun 10, 2015 at 8:38 AM, Maciej Smoleński <jezdnia@gmail.com>
> wrote:
>
>> I ran ping -s 65000 and the results are below.
>> Latency is always < 1.5 ms.
>> Does that mean that transporting a single entry will use two packets, and
>> the latency will be about 2.5 ms (1.5 ms for 65K plus 1 ms for 35K => 2.5
>> ms for 100K)?
>> Is it possible to improve this? Is it possible to increase the packet size
>> so that a single entry fits in a single packet?
>>
>>
>>
>> ping/from_client_to_server1
>> PING SN0101 (169.254.1.31) 65000(65028) bytes of data.
>> 65008 bytes from SN0101 (169.254.1.31): icmp_seq=1 ttl=64 time=1.39 ms
>> 65008 bytes from SN0101 (169.254.1.31): icmp_seq=2 ttl=64 time=1.29 ms
>> 65008 bytes from SN0101 (169.254.1.31): icmp_seq=3 ttl=64 time=1.29 ms
>> 65008 bytes from SN0101 (169.254.1.31): icmp_seq=4 ttl=64 time=1.31 ms
>> 65008 bytes from SN0101 (169.254.1.31): icmp_seq=5 ttl=64 time=1.32 ms
>>
>> ping/from_client_to_server2
>> PING SN0102 (169.254.1.32) 65000(65028) bytes of data.
>> 65008 bytes from SN0102 (169.254.1.32): icmp_seq=1 ttl=64 time=1.26 ms
>> 65008 bytes from SN0102 (169.254.1.32): icmp_seq=2 ttl=64 time=1.31 ms
>> 65008 bytes from SN0102 (169.254.1.32): icmp_seq=3 ttl=64 time=1.12 ms
>> 65008 bytes from SN0102 (169.254.1.32): icmp_seq=4 ttl=64 time=1.27 ms
>> 65008 bytes from SN0102 (169.254.1.32): icmp_seq=5 ttl=64 time=1.37 ms
>>
>> ping/from_client_to_server3
>> PING SN0103 (169.254.1.33) 65000(65028) bytes of data.
>> 65008 bytes from SN0103 (169.254.1.33): icmp_seq=1 ttl=64 time=1.25 ms
>> 65008 bytes from SN0103 (169.254.1.33): icmp_seq=2 ttl=64 time=1.38 ms
>> 65008 bytes from SN0103 (169.254.1.33): icmp_seq=3 ttl=64 time=1.25 ms
>> 65008 bytes from SN0103 (169.254.1.33): icmp_seq=4 ttl=64 time=1.33 ms
>> 65008 bytes from SN0103 (169.254.1.33): icmp_seq=5 ttl=64 time=1.32 ms
>>
>> ping/from_server1_to_client
>> PING AN0101 (169.254.1.11) 65000(65028) bytes of data.
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=1 ttl=64 time=1.01 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=2 ttl=64 time=1.38 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=3 ttl=64 time=1.35 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=4 ttl=64 time=1.35 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=5 ttl=64 time=1.32 ms
>>
>> ping/from_server2_to_client
>> PING AN0101 (169.254.1.11) 65000(65028) bytes of data.
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=1 ttl=64 time=0.887 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=2 ttl=64 time=1.31 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=3 ttl=64 time=1.32 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=4 ttl=64 time=0.998 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=5 ttl=64 time=1.22 ms
>>
>> ping/from_server3_to_client
>> PING AN0101 (169.254.1.11) 65000(65028) bytes of data.
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=1 ttl=64 time=1.08 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=2 ttl=64 time=1.40 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=3 ttl=64 time=1.07 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=4 ttl=64 time=1.26 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=5 ttl=64 time=1.26 ms
>> 65008 bytes from AN0101 (169.254.1.11): icmp_seq=6 ttl=64 time=1.26 ms
>>
>> On Wed, Jun 10, 2015 at 4:45 PM, Aniruddha Laud <trojan.of.troy@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 10, 2015 at 7:00 AM, Maciej Smoleński <jezdnia@gmail.com>
>>> wrote:
>>>
>>>> Thank you for your comment.
>>>>
>>>> Unfortunately, these options will not help in my case.
>>>> In my case the BookKeeper client receives the next request only after the
>>>> previous request is confirmed.
>>>> It is also expected that there will be only a single stream of such
>>>> requests.
>>>>
>>>> I would like to understand how to achieve performance equal to the
>>>> network bandwidth.
>>>>
>>>
>>> To saturate bandwidth, you will have to have more than one outstanding
>>> request. 250 requests/second gives you 4 ms per request; with each entry
>>> 100K in size, that's not unreasonable. My suggestion would be to monitor
>>> the write latency from the client to the server.
>>>
>>> ping -s 65000 should give you a baseline for what to expect with
>>> latencies.
>>>
>>> With 100K packets, you are going to see fragmentation at both the IP and
>>> the Ethernet layer. That wasn't the case with 1K payload.
>>>
>>> How many hops does a packet need to go from one machine to another? The
>>> more hops, the higher the latency.
>>>
>>>
>>>>
>>>>
>>>> On Wed, Jun 10, 2015 at 2:27 PM, Flavio Junqueira <
>>>> fpjunqueira@yahoo.com> wrote:
>>>>
>>>>> BK currently isn't wired to stream bytes to a ledger, so writing large
>>>>> entries synchronously, as you're doing, is unlikely to get the best of
>>>>> its performance. A couple of things you could try to get higher
>>>>> performance are to write asynchronously and to have multiple clients
>>>>> writing.
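>>>>>
>>>>> A minimal sketch of the asynchronous approach (it assumes an already-open
>>>>> LedgerHandle lh and the usual imports: java.util.concurrent.Semaphore,
>>>>> org.apache.bookkeeper.client.AsyncCallback and BKException; the bound of
>>>>> 16 in-flight adds is just an illustrative number):
>>>>>
>>>>> final Semaphore inFlight = new Semaphore(16);
>>>>> byte[] entry = new byte[100 * 1024];
>>>>> for (int i = 0; i < 10000; i++) {
>>>>>     inFlight.acquire();                    // block once 16 adds are pending
>>>>>     lh.asyncAddEntry(entry, new AsyncCallback.AddCallback() {
>>>>>         public void addComplete(int rc, LedgerHandle lh, long entryId, Object ctx) {
>>>>>             inFlight.release();            // free a slot for the next add
>>>>>             if (rc != BKException.Code.OK) {
>>>>>                 System.err.println("add failed, rc=" + rc);
>>>>>             }
>>>>>         }
>>>>>     }, null);
>>>>> }
>>>>> inFlight.acquire(16);                      // drain: wait for the last adds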
>>>>>
>>>>> -Flavio
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   On Wednesday, June 10, 2015 12:08 PM, Maciej Smoleński <
>>>>> jezdnia@gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm testing BK performance when appending 100 KB entries synchronously
>>>>> from one thread (using one ledger).
>>>>> The performance I get is 250 entries/s.
>>>>>
>>>>> What performance should I expect?
>>>>>
>>>>> My setup:
>>>>>
>>>>> Ledger:
>>>>> Ensemble size: 3
>>>>> Quorum size: 2
>>>>>
>>>>> 1 client machine and 3 server machines.
>>>>>
>>>>> Network:
>>>>> Each machine uses bonding: 4 x 1000 Mbps
>>>>> Bandwidth measured manually between client and server: 400 MB/s
>>>>>
>>>>> Disk:
>>>>> I tested two configurations:
>>>>> dedicated disks with ext3 (different for zookeeper, journal, data,
>>>>> index, log)
>>>>> dedicated ramfs partitions (different for zookeeper, journal, data,
>>>>> index, log)
>>>>>
>>>>> In both configurations the performance is the same: 250 entries/s
>>>>> (25 MB/s).
>>>>> I confirmed this with measured network bandwidth:
>>>>> - on client 50 MB/s
>>>>> - on server 17 MB/s
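>>>>>
>>>>> (Those numbers line up with replication: with a write quorum of 2 the
>>>>> client sends each entry twice, 2 x 25 MB/s = 50 MB/s outbound, and each
>>>>> of the 3 bookies receives 2/3 of the entries, 2/3 x 25 MB/s ≈ 17 MB/s.)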
>>>>>
>>>>> I ran Java with a profiler enabled on the BK client and BK server but
>>>>> didn't find anything unexpected (though I don't know BookKeeper internals).
>>>>>
>>>>> I tested it with two BookKeeper versions:
>>>>> - 4.3.0
>>>>> - 4.2.2
>>>>> The results were the same with both versions.
>>>>>
>>>>> What should be changed or checked to get better performance?
>>>>>
>>>>> Kind regards,
>>>>> Maciej
>>>>>
>>>>
>>>
>>
>
