MIME-Version: 1.0
Received: by 10.112.100.101 with SMTP id ex5mr2943696lbb.20.1350176803365; Sat, 13 Oct 2012 18:06:43 -0700 (PDT)
Received: by 10.114.3.165 with HTTP; Sat, 13 Oct 2012 18:06:43 -0700 (PDT)
In-Reply-To:
References: <507831dd.4255420a.30b0.ffffbc49@mx.google.com>
Date: Sun, 14 Oct 2012 09:06:43 +0800
Message-ID:
Subject: Re: Extremely slow checkout on a large repository
From: Bai Wei
To: Andy Levy
Cc: users@subversion.apache.org
Content-Type: multipart/alternative; boundary=14dae9d2f3b8bfa26904cbfa88d4
X-Virus-Checked: Checked by ClamAV on apache.org

--14dae9d2f3b8bfa26904cbfa88d4
Content-Type: text/plain; charset=ISO-8859-1

Hi Andy,

Thank you for your prompt reply.

I found that the performance problem was caused by a single folder
containing 21406 files, with a total size of 255M. (Sorry for this bad
layout, but it is not under my control.)

I ran some more tests on this folder today and found that if we launch
svnserve with "--memory-cache-size=8192 --cache-txdeltas yes
--cache-fulltexts yes", the performance is much better.

Here is my test environment:

Server side:
svn version 1.7.7 (r1393599)
model name      : Intel(R) Xeon(R) CPU           L5410  @ 2.33GHz
/dev/sdb on /export type xfs (rw)
             total       used       free     shared    buffers     cached
Mem:      16467548   15698252     769296          0        256   13319084

Client side:
svn version 1.6.11 (r934486)
model name      : Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
tmpfs on /root/tmp type tmpfs (rw,size=4g)
             total       used       free     shared    buffers     cached
Mem:      12297388    9509928    2787460          0     663260    7344808

The svn server is running on an SSD, and I'm checking out to tmpfs.
There is enough free memory, so we are CPU-bound in these tests.

Case 1:
If we launch svnserve with the default options, it takes 23 minutes to
check out this folder:

/usr/bin/time svn co svn://10.68.xx.xx/data/task
Checked out revision 89151.
185.41user 261.34system 23:00.37elapsed 32%CPU (0avgtext+0avgdata 449760maxresident)k
0inputs+0outputs (0major+28236minor)pagefaults 0swaps

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29149 root      20   0  162m  38m 1292 R 100.0 0.2   0:24.98 svnserve -d -r /export/svn/repositories/

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21920 root      20   0  206m 7348 2908 S  3.4  0.1   0:01.46 svn

In this case the bottleneck is the server CPU.

Case 2:
With the cache options, it takes only 8 minutes.

/usr/bin/time svn co svn://10.68.xx.xx/data/task
Checked out revision 89151.
181.65user 253.39system 8:07.51elapsed 89%CPU (0avgtext+0avgdata 449744maxresident)k
0inputs+0outputs (0major+28235minor)pagefaults 0swaps

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29128 root      20   0 8312m 548m 1300 S  0.7  3.4   0:02.07 svnserve -d -r /export/svn/repositories/ --memory-cache-size=8192 --cache-txdeltas yes --cache-fulltexts yes

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21793 root      20   0  268m  59m 2908 R 89.1  0.5   3:36.61 svn

CPU usage on the server side decreased dramatically with the cache
options on, and the memory cost is acceptable in my case. This time it
is the client CPU that slows down the checkout.

I also tested checking out this folder on the server machine itself,
with svn 1.7.7, to tmpfs: it takes only 16 seconds. This significant
improvement must be attributable to the new working copy library
(WC-NG) in svn 1.7?

Thanks to the SVN dev team for your continuous efforts in optimization.

BTW, it's still painfully slow to check out this repository to a hard
disk. If disk I/O is optimized in a future version of svn, that will be
good news.

2012/10/12 Andy Levy

> On Fri, Oct 12, 2012 at 11:06 AM, Wei Bai wrote:
> > Hi, Thanks for replying.
> >
> > I'm using svnserve 1.7.5 on a Dell R610 server (Xeon E5620*2/16GB).
> >
> > The server is running CentOS 5.5; an SSD is used to increase I/O
> > performance.
> >
> > The repository is very large: 100K+ files, 100K+ revisions; the total
> > size of the working copy is about 1.5G.
> >
> > When I want to check out a new working copy on another Linux machine,
> > I find it's painfully slow; it takes about 2 hours. Can anybody tell
> > me if this speed is normal?
> >
> > I noticed that when the svn checkout command is running on the client
> > side, there is a svnserve process with 100% CPU usage on the server
> > side. Does this mean the concurrent performance of svn is very bad
> > for the checkout command?
>
> Not necessarily. In my environment, I typically hit I/O constraints
> limiting my checkout performance long before I hit CPU constraints.
>
> If you're performing a checkout and seeing 100% CPU utilization on the
> server, you're CPU-bound. With a faster processor, you may see
> improved checkout performance. With an SSD (assuming the repository is
> on the SSD), you're probably not I/O bound.
>
> How's your memory utilization while checking out? Is it possible that
> you're memory constrained and swapping out to disk?
>
> Because of how Subversion stores revisions, it must look at past
> revisions to construct the revision you're requesting. Depending upon
> a number of factors, this may become CPU and/or memory bound. It's a
> trade-off of performance vs. storage efficiency, and there is no one
> optimal setting for everyone's repository. The developers have
> selected a value which works well enough for most people.
>
> > And this svnserve process will not disappear immediately if I kill
> > the client side svn process; it will run for a long time with 100%
> > CPU usage. Might this be a problem?
>
> It depends on how you're killing the client. If the server doesn't
> realize that the client has terminated, it'll keep trying to perform
> the checkout.
>
> > Could anyone give some advice on how to optimize the performance of
> > svn on a large repository?
>
> First you need to determine your limiting factor. In this case, it
> looks like CPU (or memory, once you look into that).
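
For reference, the /usr/bin/time figures quoted for Case 1 and Case 2 can be cross-checked with a few lines of Python. This is only a sketch: the helper names parse_elapsed and cpu_percent are made up for illustration, and the truncation to a whole percent is an assumption chosen to match the figures printed above.

```python
# Hypothetical helpers for checking the /usr/bin/time figures from Case 1 and Case 2.

def parse_elapsed(text):
    """Convert an elapsed time such as '23:00.37' (m:ss.ss) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_percent(user, system, elapsed):
    """CPU share as (user + system) / elapsed, truncated to a whole percent."""
    return int(100 * (user + system) / elapsed)


case1 = parse_elapsed("23:00.37")  # svnserve with default options
case2 = parse_elapsed("8:07.51")   # svnserve with the cache options

speedup = case1 / case2

print("case 1: %.2f s, client CPU %d%%" % (case1, cpu_percent(185.41, 261.34, case1)))
print("case 2: %.2f s, client CPU %d%%" % (case2, cpu_percent(181.65, 253.39, case2)))
print("speedup: %.2fx" % speedup)
```

The user and system totals barely change between the two runs while the elapsed time drops by a factor of roughly 2.8, which is consistent with the observation that the server, not the client, was the limiting factor in Case 1.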
--14dae9d2f3b8bfa26904cbfa88d4--