jackrabbit-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: [jr3] Sequential Node Ids
Date Thu, 23 Dec 2010 14:57:15 GMT
Hi,

I agree this is not a real world test. I will try to create one. It's true that with many CRUD
operations applied randomly over the whole repository the difference is much smaller, but
I doubt that this would be a real world use case. I would expect some kind of clustered access
(create a few nodes at a time, so that nodes created within the same transaction are also
often accessed at a similar time later on).

I made another test with 10 million nodes and Apache Derby. In the diagrams below, higher
is better. This time I measured writing, reading, and additionally purging the buffer cache
(using "purge" on Mac OS) and then reading again. The lines show the cumulative number of
operations per second (I know, I should have used the local number of operations per second,
my fault). The weird "L" in the first diagram means there was some kind of pause after about
3.6 million rows, which I can't explain (maybe my computer froze for a few seconds, or maybe
it's a Derby issue). It doesn't mean the database got slower from then on, only that there
was a pause at that point.

When appending nodes, using random node ids (as we do now) is a clear disadvantage (when
using a regular database to store the data). The larger the repository, the worse it gets.
With sequential node ids, performance stays the same. This alone would be reason enough to
use sequential node ids.

Reading from the repository is also slower with random node ids, much slower in fact if the
buffer cache is empty (after a restart).

I suggest that we support sequential node ids in Jackrabbit 2.x (support both random node
ids and sequential ids), and use sequential ids by default in future versions. We need to
support clustering in some way. A simple solution is to combine a factor with a cluster id by
default (nodeId = nextNodeIdForThisClusterNode++ * clusterNodeCount + clusterNodeId), or to
use a "next node id" sequence generator (when a cluster node needs new ids, it increases the
globally stored sequence generator value by 100 or so and uses the reserved range).
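
A minimal sketch of the factor-plus-cluster-id scheme (just an illustration; the class and
method names are made up, this is not existing Jackrabbit code):

import java.util.concurrent.atomic.AtomicLong;

public class ClusteredNodeIdGenerator {

    private final int clusterNodeId;     // 0 .. clusterNodeCount - 1
    private final int clusterNodeCount;
    private final AtomicLong next = new AtomicLong();

    public ClusteredNodeIdGenerator(int clusterNodeId, int clusterNodeCount) {
        this.clusterNodeId = clusterNodeId;
        this.clusterNodeCount = clusterNodeCount;
    }

    public long nextNodeId() {
        // nodeId = nextNodeIdForThisClusterNode++ * clusterNodeCount + clusterNodeId
        // ids from different cluster nodes interleave but never collide
        return next.getAndIncrement() * clusterNodeCount + clusterNodeId;
    }
}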

For Jackrabbit 3.x we might want to provide more features, for example let the persistence
layer and the session decide which node ids to use (if any). And we might want to allow
changing the node ids later on (defragmentation) to improve performance, or use fewer node
ids (group nodes together).

Regards,
Thomas



[attached image: operations-per-second diagrams for write, read, and read after purging the buffer cache]





On 12/23/10 3:00 PM, "Michael Dürig" <mduerig@apache.org> wrote:


Hi Thomas,

I think these are interesting graphs, and they show that we do not
exploit locality sufficiently when persisting nodes. However, your
results are based on the underlying assumption that sequential node ids
lead to better locality than random node ids do. In general this will
depend on the access pattern and the underlying persistence store: while
for the append-only case sequential node ids might exhibit good locality,
for large repositories with many random CRUD operations this approach
might eventually degenerate to behavior similar to random node ids.

I suggest abstracting node ids from storage ids: the node id is an
identifier for retrieving nodes through JCR. A storage id is an identifier
used by the persistence store to retrieve a certain piece of data (which
could contain a node, part of a node, or multiple nodes). A mapping from
node ids to storage ids can then be used to tune locality:
i.e. sequential vs. random, optimal block size, etc.
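
One possible shape of such an abstraction (a sketch only; the interface and method names
are hypothetical):

import org.apache.jackrabbit.core.id.NodeId;

/**
 * Sketch of a mapping layer between JCR node ids and storage ids:
 * the persistence store can choose storage ids that give good locality
 * (sequential, block aligned, ...) independently of how node ids are generated.
 */
public interface StorageIdMapping {

    /** Resolve the storage id under which the node's data is persisted. */
    long getStorageId(NodeId nodeId);

    /** Allocate a storage id for a newly created node. */
    long allocateStorageId(NodeId nodeId);
}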

Michael

On 22.12.10 9:51, Thomas Mueller wrote:
Currently, Jackrabbit uses random node ids (randomly generated UUIDs).
For Jackrabbit 3, and possibly Jackrabbit 2.x, I propose to use
sequential node ids.

The main advantages of random node ids (the current default) are that they are easy to
implement and that merging data from multiple repositories is simple.

The disadvantages of random node ids are performance and space. By
performance, I don't mean that randomly generated UUIDs are slower to
generate (they are). I mean the performance of reading and writing data on disk.
The main performance problem is reading and writing (indexing) random
keys, which is slow because the data of adjacent (nearby) nodes isn't adjacent
in the index. This slows down index access (read and write) a lot,
especially if the data doesn't fit in the buffer cache (file system cache).

I wrote a simple test case for Jackrabbit trunk. I also patched
Jackrabbit to allow sequential UUIDs. Creating 1 million nodes is about
80% faster when using sequential UUIDs instead of random UUIDs. I don't
have the data yet, but I'm sure the difference is much, much bigger when
reading from larger repositories once the data doesn't fit in RAM (10
million nodes or so). I will expand the test case to support reading.
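
To illustrate (this is just a sketch, not the actual patch): a "sequential" UUID could be
built from a counter in the most significant bits plus a fixed random suffix, so ids created
one after another sort next to each other in the index:

import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

public class SequentialUuidFactory {

    // fixed random suffix, chosen once per repository / JVM
    private static final long SUFFIX = UUID.randomUUID().getLeastSignificantBits();
    private static final AtomicLong COUNTER = new AtomicLong();

    public static UUID next() {
        // the counter in the most significant bits makes consecutive ids
        // adjacent in a b-tree index
        return new UUID(COUNTER.incrementAndGet(), SUFFIX);
    }
}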

The following diagram shows the operations per second (bigger is better)
after inserting x nodes (0 to 1 million). With sequential UUIDs,
performance improves and then stays the same (which is expected). With
random UUIDs, which are currently the default, performance first goes up and
then down. The resulting database (I used H2, for performance) was about
145 MB for sequential UUIDs and 169 MB for random UUIDs (the reason for
this difference is that the b-tree fill factor is better with sequential
UUIDs). All the data still fits in the buffer cache of the file system
without problem.



Something else I noticed when running the test case is that the
Jackrabbit internals are relatively slow. When profiling the test case,
I saw that database access is not an important factor. Creating the
nodes through the JCR API took 285 / 526 seconds (sequential / random
UUIDs), while re-creating the database from a SQL script took 55 seconds
(this includes reading the 200 MB SQL script file and parsing the SQL
statements, one insert statement per row; this is for the random UUIDs).
I would expect Jackrabbit to be faster than 55 seconds, because it
doesn't have to read the SQL script from disk and parse statements.

The test case creates 1 million nodes, saves after each 10'000 nodes,
and uses a fan-out of 20 child nodes per node. I disabled the search
index in the repository configuration. Source code:


import java.io.File;

import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.commons.io.FileUtils;
import org.apache.jackrabbit.core.TransientRepository;
import org.apache.jackrabbit.core.id.NodeId;

public class TestLoop {

    long start;

    // Profiler prof; // optional profiling helper (class not included in this mail)

    public static void main(String... args) throws Exception {
        new TestLoop().run();
    }

    void startProf() {
        // if (prof != null) {
        //     prof.stopCollecting();
        //     System.out.println(prof.getTop(3));
        // }
        // prof = new Profiler();
        // prof.interval = 1;
        // prof.depth = 32;
        // prof.startCollecting();
    }

    void run() throws Exception {
        System.setProperty("jackrabbit.sequentialUUIDs", "false");
        startProf();
        FileUtils.deleteDirectory(new File("repository"));
        FileUtils.copyFile(new File("repository copy.xml"),
                new File("repository.xml"));
        TransientRepository rep = new TransientRepository();
        Session s = rep.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        int nodes = 1000000;
        int nodesPerFolder = 20;
        int saveEach = 10000;
        System.out.println(new NodeId());
        int levels = (int) Math.ceil(Math.log(nodes)
                / Math.log(nodesPerFolder));
        start = System.currentTimeMillis();
        startProf();
        createNodes(s.getRootNode(), 0, nodes,
                saveEach, levels, nodesPerFolder);
        System.out.println("took: " + (System.currentTimeMillis() - start));
        s.logout();
    }

    private int createNodes(Node parent, int x, int nodes,
            int saveEach, int depth, int nodesPerFolder)
            throws Exception {
        for (int i = 0; x < nodes && i < nodesPerFolder; i++) {
            if (depth > 1) {
                Node n = parent.addNode("f" + x, "nt:unstructured");
                x = createNodes(n, x, nodes,
                        saveEach, depth - 1, nodesPerFolder);
            } else {
                parent.addNode("n" + i, "nt:unstructured");
                x++;
                if ((x % saveEach) == 0) {
                    parent.getSession().save();
                    long t = System.currentTimeMillis() - start;
                    startProf();
                    System.out.println((x * 1000L / t) + " op/s at " + x);
                }
            }
        }
        return x;
    }
}







