I did some more profiling today. Here are some thoughts about what I have
found, and places where we can improve the code.
While processing a search, we compare keys with what we have in the
trees over and over. If the BTree has N levels and M elements per page,
finding an entry takes N*log2(M) comparisons. In my test, for 200 000
searches, we do around 4 x 5 comparisons per search, so around
4 million calls to the compare() method. In fact, way more than
4 million:
44 881 820 calls    99384.1 (time)    5.9 (% of global CPU)
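The arithmetic above can be spelled out. This is just a back-of-the-envelope cost model using the figures quoted in this mail (4 levels, ~5 comparisons per page, 200 000 searches); the class and method names are illustrative only:

```java
// Rough cost model for BTree lookups in the search test described above.
public class ComparisonEstimate {

    // One lookup descends N levels, doing ~log2(M) comparisons per page.
    static long comparisonsPerLookup(int levels, int log2PageSize) {
        return (long) levels * log2PageSize;
    }

    public static void main(String[] args) {
        long perLookup = comparisonsPerLookup(4, 5);  // 4 levels, pages of ~32 elements
        long total = perLookup * 200_000L;            // 200 000 searches
        System.out.println(perLookup + " comparisons/search, " + total + " in total");
    }
}
```

With those assumptions, this gives 20 comparisons per search and 4 million in total, which is the order of magnitude observed (the profiler shows even more, since compare() is also called outside the lookup path).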
So the comparator is called plenty of times. Let's have a look at where
it gets called:
o DefaultSearchEngine.computeResult() calls db.getEntryId( baseDn )
This checks that the baseDn exists. getEntryId() uses the RdnIndex to do
that. The problem is that this index can only handle RDNs, not DNs, so
depending on how deep the baseDn is (i.e., how many RDNs it has), we
will have to do many searches in this index.
In any case, we will do many comparisons to find the right Dn.
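The RDN-by-RDN walk described above can be sketched as follows. This is purely illustrative pseudo-structure, not the actual ApacheDS classes: RdnIndex, lookup() and the "ROOT" sentinel are hypothetical names standing in for the real partition API.

```java
import java.util.List;

// Why getEntryId() costs several index searches: the RDN index resolves
// one RDN (scoped by its parent's id) at a time, so a DN with k RDNs
// needs k successive BTree lookups, each paying N*log2(M) comparisons.
public class DnResolver {

    interface RdnIndex {
        // returns the entry id for (parentId, rdn), or null if absent
        String lookup(String parentId, String rdn);
    }

    static String getEntryId(RdnIndex index, List<String> rdns) {
        String currentId = "ROOT";  // start from the context entry

        // DNs resolve right-to-left: dc=com, then dc=example, then ou=people...
        for (int i = rdns.size() - 1; i >= 0; i--) {
            currentId = index.lookup(currentId, rdns.get(i));
            if (currentId == null) {
                return null;  // the baseDn does not exist
            }
        }
        return currentId;
    }
}
```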
o DefaultOptimizer.annotate() calls the idx.count() method
Here, we are trying to know how many candidates we will get using one
index. If the AttributeType is single-valued, we will get either 1 (if
the index has the ExprNode value) or 0 (if the index does not have the
ExprNode value). If the AttributeType is multi-valued, we will get the
number of values associated with the given key. This is for an
equality scan; for a <= or >=, the count is computed differently.
Again, we have many comparisons to do here (and this is multiplied by
the number of indexes we have in the filter).
At some point, the cost of doing all those comparisons largely exceeds
the cost of fetching useless candidates from the MasterTable (we are
spending eight times more time fetching data from the indexes than
grabbing the entries from the MasterTable).
How can we improve this?
o The first step is trying to retrieve the entryUUID associated with the
baseDn. This is likely to be done many times, so we could benefit from
using a <Dn, entryUUID> cache to avoid those costly operations (keep in
mind that we not only do comparisons, we also potentially fetch pages
from disk when they are not present in memory). An MRU cache would save
us a lot of time here.
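A minimal sketch of such a cache, assuming a plain recency-ordered map with capped capacity is enough. DnUuidCache is a hypothetical name, not an existing ApacheDS class; the real thing would presumably key on a normalized Dn and handle invalidation on rename/delete:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative <Dn, entryUUID> cache: an access-ordered LinkedHashMap
// that evicts the least recently used entry once capacity is reached.
public class DnUuidCache {
    private final Map<String, String> cache;

    public DnUuidCache(final int capacity) {
        // accessOrder = true: iteration order follows recency of access,
        // so the eldest entry is the least recently used one
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    public synchronized String get(String normalizedDn) {
        return cache.get(normalizedDn);
    }

    public synchronized void put(String normalizedDn, String entryUuid) {
        cache.put(normalizedDn, entryUuid);
    }
}
```

A cache hit replaces a whole RDN-by-RDN index walk (and its disk reads) with one in-memory map lookup, which is where the saving comes from.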
o At this point, we compare normalized values with normalized values.
That means we don't have to normalize the values again. Sadly, this is
exactly what we do:
    PrepareString.insignifiantSpacesString(String, boolean) line: 4803
    PrepareString.normalize(String, PrepareString$StringType) line: 244
    DeepTrimToLowerNormalizer.normalize(String) line: 103
    CachingNormalizer.normalize(String) line: 124
    String) line: 76
    Object) line: 36
    SerializableComparator<E>.compare(E, E) line: 86
    Node<K,V>(AbstractPage<K,V>).compare(K, K) line: 254
    Node<K,V>(AbstractPage<K,V>).findPos(K) line: 189
    Node<K,V>.getValues(K) line: 858
    BTree<K,V>.getValues(K) line: 962
    MavibotTable<K,V>.count(K) line: 526
    MavibotIndex<K,V>.count(K) line: 260
    DefaultOptimizer<E>.getEqualityScan(SimpleNode<V>) line: 304
    DefaultOptimizer<E>.annotate(ExprNode) line: 148
    DefaultOptimizer<E>.getConjunctionScan(BranchNode) line: 245
    DefaultOptimizer<E>.annotate(ExprNode) line: 185
    SearchOperationContext) line: 220
As we can see, we are using the AT comparator (here, for 'cn'), which
normalizes the value even though it has already been normalized by the
NormalizerInterceptor (both the filter value and the stored value).
There is no need to normalize either the key we are looking for or the
stored key. For the record, PrepareString.insignifiantSpacesString() is
extremely voracious:
12 046 499    303 480    18%
The first number is the number of times the method is called, the second
the time it takes to execute, and the third the percentage of global CPU
it takes.
Now, it's a bit hard to get rid of this computation: the comparator is
associated with the AttributeType, and we don't have one comparator that
skips normalization for the server and another that normalizes the
values for the client (keep in mind that the SchemaManager is used on
both the client and the server). So how do we distinguish which case we
are in? That's the key: if we are on the server, we should not
normalize, and if we are on the client, we must normalize...
At the moment, I have no idea how to do that. The only certainty is that
it's extremely important to avoid this extra computation.
2) The annotate() and build() methods
These two methods are called by the computeResult() method, and they are
approximately as expensive as each other. They mainly do the same thing:
- find the number of candidates
- construct a set of entryUUIDs based on the smallest index.
In the test I'm running, the filter looks like (cn=entryNNN), where NNN
is a random number. Basically, we spend as much time finding the number
of candidates (1) in the annotate() method as fetching it in the
build() method.
How can we improve this?
It's not simple. One might think that an equality filter will return
only a few values, or even 1 or 0. This is true for the kind of filter
I'm using in this test, but it's not true for a filter like
(ObjectClass=person), which can select thousands of candidates.
One possible way would be to get rid of the count() method and fetch the
candidates immediately instead, up to some limit (100?). The fetched
candidates would be stored with the annotated node and could be used
immediately by the build() method, instead of being fetched a second
time.
For a single-valued AttributeType, it's even simpler: either the index
contains the value or not, so the count will be zero or one. In this
case, it's obvious that we must fetch the entryUUID immediately and
store it with the ExprNode, as this index will be selected as the
smallest one.
For a multi-valued AttributeType, as the values are stored in a
sub-btree, we can decide to fetch the set of candidates if this
sub-btree contains fewer than some number of candidates, or otherwise
just store the number of candidates.
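The "fetch instead of count" idea can be sketched like this. AnnotatedNode, the FETCH_LIMIT value and the iterator-based signature are all hypothetical stand-ins for the real ExprNode annotation and index cursor:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// During annotation, pull up to FETCH_LIMIT candidates and keep them on
// the node so build() can reuse them instead of scanning the index again.
public class PrefetchingAnnotator {
    static final int FETCH_LIMIT = 100;

    static class AnnotatedNode {
        List<String> prefetched;  // candidate entryUUIDs, if few enough
        long count;               // otherwise, a lower bound on the count
    }

    static AnnotatedNode annotate(Iterator<String> candidates) {
        AnnotatedNode node = new AnnotatedNode();
        List<String> fetched = new ArrayList<>();

        // fetch at most FETCH_LIMIT + 1 candidates, to detect overflow
        while (candidates.hasNext() && fetched.size() <= FETCH_LIMIT) {
            fetched.add(candidates.next());
        }

        if (fetched.size() <= FETCH_LIMIT) {
            node.prefetched = fetched;      // build() uses these directly
            node.count = fetched.size();
        } else {
            node.prefetched = null;         // too many: count-only fallback
            node.count = FETCH_LIMIT + 1;   // "at least this many"
        }
        return node;
    }
}
```

For the single-valued case described above, the iterator yields zero or one element, so the prefetched list is always kept and the second index scan disappears entirely.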
Implementing those two improvements is not a big deal, and the gain will
be significant.
I'm not done with the analysis of the profiling session, but if we can
work on the two items I have mentioned in this mail, we will already get
a great deal of improvement. Once that's done, we can run another
profiling session to find which parts need to be worked on next.