jackrabbit-oak-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Large-scale read benchmarks
Date Tue, 12 Mar 2013 17:26:02 GMT
Hi,

To give us a better picture of the large-scale read performance of Oak
with various backends, I wrote a simple ReadMany benchmark class in
oak-run.

The benchmark creates a content structure that consists of a million
documents organised in a 1000x1000 node hierarchy. Each document can be
either an empty node (with only jcr:primaryType set), an nt:file with
10kB of random data, or a set of 10 child nodes each containing a 1kB
string of random text. Once this content structure is initialised
(which currently takes quite a while), the benchmark measures the time
it takes to read a thousand such documents, either selected uniformly
at random across all documents or traversed linearly from one randomly
selected subtree of 1000 documents. I'm also thinking of adding a read
pattern where the reads are distributed according to a power law, and
the benchmark can be scaled out from one to N million documents.
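
To make the structure and access patterns more concrete, here's a
rough sketch of what the pieces could look like in plain JCR code. The
class and path names are illustrative rather than the actual oak-run
code, and the power-law sampler is just one possible way to implement
that pattern:

    import java.util.Random;

    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    public class ReadManySketch {

        private static final int SIZE = 1000; // 1000x1000 = a million documents

        private final Random random = new Random();

        // Creates the 1000x1000 structure, here with the simplest document
        // type: empty nodes carrying only jcr:primaryType.
        void createContent(Session session) throws RepositoryException {
            Node root = session.getRootNode().addNode("ReadMany", "nt:unstructured");
            for (int i = 0; i < SIZE; i++) {
                Node parent = root.addNode("node" + i, "nt:unstructured");
                for (int j = 0; j < SIZE; j++) {
                    parent.addNode("node" + j, "nt:unstructured");
                }
                session.save(); // one save per subtree of 1000 documents
            }
        }

        // Uniform reads: a thousand documents selected randomly across
        // the whole structure.
        void uniformRead(Session session) throws RepositoryException {
            for (int i = 0; i < 1000; i++) {
                session.getNode("/ReadMany/node" + random.nextInt(SIZE)
                        + "/node" + random.nextInt(SIZE));
            }
        }

        // Linear reads: traverse one randomly selected subtree of
        // 1000 documents.
        void linearRead(Session session) throws RepositoryException {
            Node parent = session.getNode("/ReadMany/node" + random.nextInt(SIZE));
            for (NodeIterator it = parent.getNodes(); it.hasNext();) {
                it.nextNode();
            }
        }

        // One way to get power-law distributed reads: if U is uniform on
        // [0,1), then SIZE^U has density proportional to 1/x on [1, SIZE),
        // so low indices become the "hot" documents.
        int powerLawIndex() {
            return (int) Math.exp(random.nextDouble() * Math.log(SIZE)) - 1;
        }
    }

In the power-law case, powerLawIndex() would replace the
random.nextInt(SIZE) calls so that reads concentrate on a small set of
hot documents.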

Here are initial numbers from these benchmarks (times in
milliseconds), executed against the SegmentMK on an m1.medium EC2
instance with a local MongoDB backend:

    Apache Jackrabbit Oak 0.7-SNAPSHOT
    # UniformReadEmpty               min     10%     50%     90%     max       N
    Oak-Segment                       90      96     103     175    2079     413
    # LinearReadEmpty               min     10%     50%     90%     max       N
    Oak-Segment                        4       4       4      12    1752    5904

For now I ran the test only with empty nodes as the documents and with
the segment cache size set to 100MB. The results look pretty good,
since interestingly the *entire* repository with those million nodes
fits within that cache (a million nodes in 100MB works out to an
average of under 100 bytes per empty node); the high max access times
are probably due to initial cache misses. The Amazon monitoring tools
show CPU as the bottleneck of the test.

I'll also be running the benchmark with different document types and
against current Jackrabbit, other Oak backends and smaller segment
cache sizes to give us a better picture of relative performance.
Unfortunately, my first attempts ran into a NullPointerException from
KernelNodeState with MongoMK and an OutOfMemoryError with Jackrabbit.

BR,

Jukka Zitting
