db-derby-user mailing list archives

From Nathan Boy <nathan....@gmail.com>
Subject Re: Improving data insert performance
Date Thu, 14 Jan 2010 20:23:31 GMT
Hi Bryan,
Thanks for replying.  I'll try to answer some of your questions point by point:

- have you tried using in-memory databases rather than test duration?
I have done some testing with in-memory databases, but I get virtually
identical results.  This leads me to believe that my bottleneck is
probably not disk i/o.
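(For anyone following along, a minimal sketch of the in-memory setup I am describing; the database name is illustrative, and the class prints "ok" and exits cleanly if the Derby jar isn't on the classpath:)

```java
// Sketch: opening an in-memory Derby database via the "memory:" subprotocol.
// The on-disk equivalent would be "jdbc:derby:mydb;create=true".
import java.sql.Connection;
import java.sql.DriverManager;

public class InMemoryDemo {
    public static void main(String[] args) throws Exception {
        try {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        } catch (ClassNotFoundException e) {
            System.out.println("ok"); // Derby not on classpath; skip the demo
            return;
        }
        // The database lives entirely on the heap, so disk i/o is taken
        // out of the picture for timing comparisons.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:derby:memory:mydb;create=true")) {
            // ... run the same insert workload as against the on-disk db ...
        }
        System.out.println("ok");
    }
}
```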

- have you experimented with using a larger page size to minimize
  per-page overhead?
Yes.  Right now I am using a page size of 32768, which appears to
offer a modest improvement over 4096.  Sizes larger than that don't
seem to have much of an effect.

- have you experimented with giving a larger-than-default page cache?
Yes.  Right now I am using a page cache size of 2000.  Again, sizes
larger than that don't seem to have much of an effect.
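(Concretely, those two settings live in derby.properties, or can be passed as JVM system properties; the values below are the ones I'm describing:)

```properties
# derby.properties (or -Dderby.storage.pageSize=... on the command line)
# Note: pageSize only affects tables/indexes created after it is set.
derby.storage.pageSize=32768
derby.storage.pageCacheSize=2000
```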

- is your system CPU-bound? is there any disk io occurring at all?
Yes, it appears to be CPU-bound.  I'm not sure how much disk i/o is
really occurring, but we are definitely using all available CPU time.
Unfortunately, all of that CPU time is on a single core.

- how much garbage are you generating, what is the GC profile like?
Not too much.  Garbage collections take up about 8 seconds over the
course of 5 minutes of work.  The heap size is less than the maximum
allocated, so I am not thrashing. (I have had issues where index
generation uses an amazing amount of memory, but that is a story for
another day.)

- what rev of the JDK are you running, and on what operating system?
Sun JDK 6u16, on Ubuntu 9.10.  I am using Derby 10.5.3.0.

- is your system multi-core? Can you throw threads/cores at this problem?
Yes.  I have been doing my work on a 1.86 GHz Core 2 and, more
recently, on a 2.8 GHz i7.  I haven't had much luck throwing more
threads/cores at the problem, however.  I have tried to run some tests
where, instead of just prepping a file into a single table, I run
several threads that each prep the file into a different table.
Unfortunately, I don't actually get more throughput this way, and
while my threads are clearly running on different cores, I still only
end up using a single core's worth of CPU.  I assume this means that
somewhere under the hood everything is being serialized.
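(A rough reconstruction of that experiment, in case it helps anyone reproduce it; table names, thread count, row counts, and batch size are illustrative, and the class prints "ok" and exits cleanly if the Derby jar isn't on the classpath:)

```java
// Sketch: several threads, each batch-inserting into its OWN table of a
// shared in-memory Derby database.
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class ParallelPrepDemo {
    static void prep(String table) {
        try (Connection c = DriverManager.getConnection("jdbc:derby:memory:demo")) {
            c.setAutoCommit(false);
            try (Statement s = c.createStatement()) {
                s.execute("CREATE TABLE " + table + " (ID INT)");
            }
            try (PreparedStatement ps = c.prepareStatement(
                    "INSERT INTO " + table + " VALUES (?)")) {
                for (int i = 0; i < 5000; i++) {
                    ps.setInt(1, i);
                    ps.addBatch();
                    if (i % 1000 == 999) ps.executeBatch();
                }
                ps.executeBatch();
            }
            c.commit();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        } catch (ClassNotFoundException e) {
            System.out.println("ok"); // Derby not on classpath; skip the demo
            return;
        }
        DriverManager.getConnection("jdbc:derby:memory:demo;create=true").close();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            String table = "PREP" + t;          // one table per thread
            Thread th = new Thread(() -> prep(table));
            th.start();
            threads.add(th);
        }
        for (Thread th : threads) th.join();
        System.out.println("ok");
    }
}
```

Even with a layout like this, I see the serialization behavior described above.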

- can you get any sort of profiling data about where the hotspots
  are during your run?
Well, YourKit has helped me a bit in this regard.  When I am executing
prepared statements in batches, around 90% of the program time is
spent inside the EmbedStatement.executeBatch method.  Unfortunately I
don't know exactly what it is doing in there.  Similarly, I can see
that when I am inserting everything using a table function ("INSERT
INTO MYTABLE (...) SELECT S.* FROM TABLE (MYFUNC ()) S"), about 90%
of the program time is spent inside the method EmbedStatement.execute,
of which only 5% is spent getting information from the parser.
Again, I don't know where the rest goes.
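(For concreteness, the batching pattern I am profiling looks roughly like this; the table name, column types, batch size, and row count are illustrative rather than from my actual test, and the class prints "done" and exits cleanly if the Derby jar isn't on the classpath:)

```java
// Sketch: PreparedStatement inserts executed in batches against an
// in-memory Derby database, committing once per batch rather than per row.
import java.sql.*;

public class BatchInsertDemo {
    public static void main(String[] args) throws Exception {
        try {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        } catch (ClassNotFoundException e) {
            System.out.println("done"); // Derby not on classpath; skip the demo
            return;
        }
        try (Connection conn = DriverManager.getConnection(
                "jdbc:derby:memory:demo;create=true")) {
            conn.setAutoCommit(false);
            try (Statement s = conn.createStatement()) {
                s.execute("CREATE TABLE MYTABLE (ID INT, NAME VARCHAR(64))");
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO MYTABLE (ID, NAME) VALUES (?, ?)")) {
                int batchSize = 1000;
                for (int i = 0; i < 10_000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row-" + i);
                    ps.addBatch();
                    if ((i + 1) % batchSize == 0) {
                        ps.executeBatch();   // this is where ~90% of time goes
                        conn.commit();
                    }
                }
                ps.executeBatch();           // flush any trailing partial batch
                conn.commit();
            }
        }
        System.out.println("done");
    }
}
```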

Anyway, if you or anyone else has advice on what else I can do to
improve my performance, I'd love to hear it.  Thanks again for your
time.

Cheers,
Nathan

On Wed, Jan 13, 2010 at 12:30 PM, Bryan Pendleton
<bpendleton@amberpoint.com> wrote:
>> using prepared statements executed in batches, and this gives me
>> between 10 and 20 row inserts per millisecond on average.  I have
>
> Wow! You're already going very fast, and you've already put a lot
> of good effort into this, so I congratulate you on how far you've
> gone already.
>
> A couple of thoughts occurred to me as I read your post:
>
>  - have you tried using in-memory databases rather than test duration?
>  - have you experimented with using a larger page size to minimize
>   per-page overhead?
>  - have you experimented with giving a larger-than-default page cache?
>
> Also, have you done any system-level analysis of your application
> during the insert:
>  - is your system CPU-bound? is there any disk io occurring at all?
>  - how much garbage are you generating, what is the GC profile like?
>  - what rev of the JDK are you running, and on what operating system?
>  - is your system multi-core? Can you throw threads/cores at this problem?
>  - can you get any sort of profiling data about where the hotspots
>   are during your run?
>
> Good luck with your performance work, it sounds extremely interesting!
>
> thanks,
>
> bryan
>
>
