directory-dev mailing list archives

From Emmanuel Lécharny
Subject String vs Byte[] for values in the server : some new ideas
Date Thu, 22 Aug 2013 09:26:14 GMT
Hi guys,

I have been thinking for years about using byte[] inside the server for
values. I have tried more than once to get rid of the String, with no
success so far : we are too dependent on Strings to get rid of them
(for instance, the PrepareString method works on String, not on byte[],
and the same goes for the various comparators, normalizers and syntax
checkers).

Bottom line, we have to keep the values as Strings.

But is this true for every value ?

In fact, we always store the received attribute's values in two
different formats :
- a normalized String (if it's a human-readable, or HR, attribute),
which is the result of the normalization process
- a UP (user-provided) String, which is the value as it has been
provided by the user, and which is left untouched.
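
To make the two forms concrete, here is a minimal sketch of a value that
keeps both. The class name DualValue and the normalize() rule (trim,
collapse whitespace, lower-case) are hypothetical stand-ins for
illustration only; they are not the server's real classes or its real
normalization.

```java
import java.util.Locale;

// Hypothetical illustration: not the server's actual value class.
final class DualValue {
    final String upValue;   // user-provided value, left untouched
    final String normValue; // normalized form, used for comparisons

    DualValue(String upValue) {
        this.upValue = upValue;
        this.normValue = normalize(upValue);
    }

    // Stand-in normalization (an assumption): trim, collapse runs of
    // whitespace into a single space, then lower-case.
    static String normalize(String value) {
        return value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        DualValue v = new DualValue("  A   Test Description ");
        System.out.println("[" + v.upValue + "]");
        System.out.println("[" + v.normValue + "]");
    }
}
```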

Now, consider an add operation, followed by a search operation, from the
point of view of a specific attribute (say, the 'description' AT)

User add :
description:String ---> API ---> conversion to UTF-8 ---> Server

Server AddHandler :
description:byte[] ---> decoder ---> conversion to String ---> creation
of the normValue ---> storage on disk ---> conversion of upValue and
NormValue to byte[]

User search :
send searchRequest
wait for response

Server SearchHandler :
fetch the entry => deserialize the Up and Norm value of the description
AT (ie, byte[] to String conversion)
entry processing through the interceptors
write the SearchResultEntry ---> conversion of the description AT
UpValue to byte[] (we don't care about the normValue at this point)

User search (on receiving the response) :
convert the description Up value to String

As we can see, in both operations we are doing too much work : there is
no need to convert the UpValue to a String, as we will do a byte[] ->
String -> byte[] conversion of this UP value in the search. For the Add,
it's slightly better (or less bad) : we can avoid a String --> byte[]
conversion when storing the value.
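
The round trip described above can be sketched as follows. This is only
an illustration under assumptions : the class and method names
(UpValueRoundTrip, viaString, direct) are hypothetical, and the real
handlers do far more than this. It only shows that, for a valid UTF-8
value, decoding to a String and re-encoding gives back the same bytes we
started with, so the intermediate String buys nothing for the UP value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: not the server's actual handler code.
final class UpValueRoundTrip {
    // Flow described in the message: the stored bytes are decoded to a
    // String, then encoded again when the response is written.
    static byte[] viaString(byte[] stored) {
        String upValue = new String(stored, StandardCharsets.UTF_8);
        return upValue.getBytes(StandardCharsets.UTF_8);
    }

    // Proposed flow: the user-provided bytes are handed over as they are.
    static byte[] direct(byte[] stored) {
        return stored;
    }

    public static void main(String[] args) {
        byte[] stored = "Une description accentu\u00e9e".getBytes(StandardCharsets.UTF_8);
        // Both paths produce the same bytes; only the first one pays for
        // a decode and an encode.
        System.out.println(Arrays.equals(viaString(stored), direct(stored)));
    }
}
```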

Making the UpValue a byte[] will save us a lot of wasted CPU, and
probably a bit of space on disk, as a String requires 2 bytes per char
to be serialized.
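
As a rough check of the space argument, the sketch below compares a
serialization at 2 bytes per char with the UTF-8 bytes of the same
value. It assumes the String is written char by char (here with
DataOutputStream.writeChars) ; whether the server's serializer really
does that is taken from the statement above, not verified. The class
name SerializedSize is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: not the server's actual serializer.
final class SerializedSize {
    // Size when every char is written as 2 bytes (the assumption).
    static int twoBytesPerChar(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeChars(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Size of the value as UTF-8 bytes, which is how it arrives on the wire.
    static int utf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String value = "A description";
        System.out.println("2 bytes per char : " + twoBytesPerChar(value));
        System.out.println("UTF-8 bytes      : " + utf8(value));
    }
}
```

For mostly-ASCII values the UTF-8 form is about half the size ; for
values made of characters that need 3 bytes in UTF-8, the UTF-8 form is
the larger one, so the gain depends on the data.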

This is something we have to work on before 2.0, because the underlying
database will be impacted : the UpValue will no longer be serialized as
a String but as a byte[].

Thoughts ?

Emmanuel Lécharny
