perl-modperl mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cees Hek <c...@sitesuite.net>
Subject [OT] Re: Fast DB access
Date Thu, 19 Apr 2001 20:15:20 GMT

On Thu, 19 Apr 2001, Murali V wrote:

> Hi,
> 
> If you read the code more deeply, you'll find that the timeit is only
> wrapped around select and not around insert.
> We've written the insert code so that in the first round you can populate
> the database.
> You comment out the insert code after the first round and run the benchmark
> several times. This would only do select and time select.
> 

Hi Murali,

OK, to start off, I was not specifically aiming my rant at you, I was
replying to someone who had modified your code and was now comparing MySQL
and PostgreSQL, and he was implying that the timings were for inserts and
selects.  I took this at face value, and didn't check the code close
enough which I really should have done in the first place.

> Connecting this error to an axiom that "Benchmarks are useless" is bad
> indeed. Shouldn't we be ironing out errors and runing benchmarks which are
> good.

Perhaps I should have said published benchmarks.  In your case, you are
using benchmarks for exactly what they are intended for...  Creating a
system that closely resembles your application and putting it through it's
paces.  What I find dangerous about publishing benchmarks, is that they
are almost always heavily swayed to a specific application, and most of
the time they show what the user wants them to show.

In your original message, you clain to have a bias against Postgres, and
your benchmark shows that bias.  I however am a happy user of postgres,
and am therefore biased towards it.  I modified your benchmark script
slightly, and I got the following results (I have include a diff of my
changes at the bottom):

postgres
 0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU) 
postgres
 0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)

Whereas if I run it with your version I get the following:

postgres 
27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
postgres
27 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)

So what does that tell you about the benchmark?  that the postgres part of
this benchmark is useless...  It may have given you the answer that you
wanted, but it is misleading to anyone else out there.

This is why there are always flame wars about benchmarking databases (by
the way I think this whole thread has been very civilized and i hope is
stays that way).  Invariably the benchmark has missed some critical idea
or optimization which drastically skew the results.

> Your recommendation is to pick a DB best suited to your app. But How ??
> a) Either by hiring a guru who has seen all kinds of apps with different DBs
> who can give you the answer with which we can run
> b) Run a benchmark on critical programs which represent you app across
> databases and find what performs best.
> I've read too much literature on DB features. All DBs have all features
> (except MySQL which does not have commit !!!!)
> You can't make a thing out of DB literature.

What I would recommend is exactly what you have done in this case.  Get
access to any and all the systems that you feel may do the job for you ,
and try them out.  Browse the web for other users experiences, but don't
use other peoples benchmarks, because the odds are good that they are
wrong...  Create your own, or modify an existing one, and scrutinize
exactly what it is doing.  And if you want to share your results with
anyone else, tell them what you choose in the end, and why.  Tell them you
choose database x because it did this and this for you.  Don't say
database y is a piece of crap, so we went with database x.

But whatever you do, don't choose your database based on other peoples
benchmarks........ (that is all I'm trying to say, and I guess I didn't
say it clearly enough)

When I first read your message, I tucked it away somewhere, so I could
reference it again in the future, because I was interested in the MLDBM
work that you had done, and I thank you for that.  But it also made me
think that maybe I shouldn't be using Postgres, because your results were
so poor (only for a second or too though :).  But I'll bet that a lot of
people who have never used postgres before are now less likely to download
it and try it out for themself, because a benchmark swayed them away from
it.  That sounds like a good closer, so I'll stop it there :-)


Cees


Here is the diff of my changes and a quick comment on why your way kills
the performance of postgres

***************
*** 124,131 ****
                $i_ip = int(rand(20));

                @row_ary = $dbh->selectrow_array("select crr from benchmark where
!                               rtrim(pub) = 'pub$i_pub' and rtrim(size) = 'size$i_size' and
!                               rtrim(type) = 'type$i_type' and rtrim(ip) = 'ip$i_ip'");
        }
  };

--- 124,131 ----
                $i_ip = int(rand(20));

                @row_ary = $dbh->selectrow_array("select crr from benchmark where
!                               pub = 'pub$i_pub' and size = 'size$i_size' and
!                               type = 'type$i_type' and ip = 'ip$i_ip'");
        }
  };

All I did was remove the 'rtrim' called from the query.  I'm assuming you
did this because Postgres pads all char fields with spaces (varchar
doesn't do this).  What your query ends up doing, is running rtrim on
every field in the column and then comparing it against your provided
value (indeces are not used since you are changing the value of the
primary key).  So since your table has 24000 rows and you call rtrim on 4
columns in the select, it has a worst case of 24,000 x 4 calls to rtrim
(of course postgres will short circuit the conditions if the first call
fails), but in the best case it still has to do 24,000 calls to rtrim.  
I'm amazed that postgres can actually do this this fast :).

In Postgres 7.0 you don't have to worry about the space padding issue, so
I was able to remove the rtrims from the query.  However since you are
using 6.5.x, what you should have done, is either make your columns
varchar instead of char, or space pad your values in the query, so that
you are checking pub = 'pub1 ' instead of rtrim(pub) = 'pub1'.  i

However, since you were looking at solutions for a new application, you
have no excuse for using a database that is several years old, when
several new stable revisions are available (for free)...

Give it a try and see what happens...





Mime
View raw message