directory-dev mailing list archives

From Kiran Ayyagari <kayyag...@apache.org>
Subject Re: Bulk load profiling
Date Sun, 22 Jun 2014 17:07:59 GMT
On Sun, Jun 22, 2014 at 8:56 PM, Emmanuel Lécharny <elecharny@gmail.com>
wrote:

> Hi Kiran,
>
> I did a bit of profiling today, and was able to improve performance by 7%.
> The method I sped up is PrepareString. I created a specific method
> that does not create a new char[] when we are dealing with ASCII chars
> only. The gain is huge.
>
great, can you commit it?
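For illustration, here is a minimal Java sketch of an ASCII fast path of the kind described above. It is not the actual ApacheDS PrepareString code. The class and method names are hypothetical, and slowPath() is a simplified stand-in (NFKC, lower-casing, space collapsing) rather than the full LDAP string preparation. The idea is to scan the input once with charAt(). When the input is printable ASCII that already needs no change, the method returns the same String without allocating a char[] or a new String.

```java
import java.text.Normalizer;
import java.util.Locale;

final class AsciiFastPathSketch {

    /** True if s is printable ASCII, already lower case, with no leading, trailing or doubled spaces. */
    static boolean isAlreadyNormalizedAscii(String s) {
        char prev = ' '; // pretend the string follows a space, so a leading space is rejected
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c >= 0x7F) return false;   // control or non-ASCII: use the full path
            if (c >= 'A' && c <= 'Z') return false;    // needs case folding
            if (c == ' ' && prev == ' ') return false; // leading or repeated space
            prev = c;
        }
        return prev != ' ' || s.isEmpty();             // a trailing space needs trimming
    }

    /** Simplified, allocating stand-in for the general preparation path (hypothetical). */
    static String slowPath(String s) {
        String n = Normalizer.normalize(s, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
        return n.trim().replaceAll(" +", " ");
    }

    static String prepare(String s) {
        // Fast path: no char[] copy and no new String; the input is returned as-is.
        if (isAlreadyNormalizedAscii(s)) {
            return s;
        }
        return slowPath(s);
    }
}
```

For example, prepare("cn=alice") returns the very same String instance. prepare("CN=Alice") goes through the slow path and returns "cn=alice". Every input the fast path accepts is one the slow path would leave unchanged, so the two paths agree.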

>
> Otherwise, as expected, most of the time is spent deserializing
> entries read from the MasterTable.
>
ok

> At this point, I think we should consider what we can do to avoid
> such a cost. Most of the time, we will have enough memory to load all the
> elements that will be stored into an index. I'm wondering whether it would
> be better to parse the LDIF once, gather what we can in memory (without
> keeping whole entries in memory), build the indexes directly,
> and then process the master table.
>
hmm, at some point we still end up keeping the full entry
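A rough sketch of the single-pass index build suggested above. It assumes each parsed LDIF entry is handed over as a map from lower-cased attribute name to already-normalized values, and that entry IDs are assigned sequentially. The class and its methods are hypothetical. Only the indexed values and the entry IDs are kept. A TreeMap keeps the keys sorted, so each index could then be written out in one sequential pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class InMemoryIndexBuilderSketch {

    // attribute name -> (normalized value -> IDs of the entries holding that value)
    private final Map<String, TreeMap<String, List<Long>>> indexes = new HashMap<>();
    private long nextId = 1;

    InMemoryIndexBuilderSketch(Set<String> indexedAttributes) {
        for (String attr : indexedAttributes) {
            indexes.put(attr, new TreeMap<>());
        }
    }

    /** Called once per parsed entry; keeps the indexed values and the ID, not the entry. */
    long add(Map<String, List<String>> entry) {
        long id = nextId++;
        for (Map.Entry<String, List<String>> attr : entry.entrySet()) {
            TreeMap<String, List<Long>> index = indexes.get(attr.getKey());
            if (index == null) {
                continue; // attribute is not indexed
            }
            for (String value : attr.getValue()) {
                index.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
            }
        }
        return id;
    }

    /** Keys iterate in sorted order, ready for a single sequential write. */
    TreeMap<String, List<Long>> index(String attribute) {
        return indexes.get(attribute);
    }
}
```

As noted in the reply, this only covers the index side. The master table still needs the full entries, so those would have to be written either in a second pass or as each entry is parsed.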

> It's not easy, because we can't know how many elements we can store in
>
yeah

> memory, and when we reach the memory limit, then we have to do something
> which is completely different. If we decide to deal with the memory
> limitation from the beginning, we will pay the price and it will be
> expensive. OTOH, most of the time we won't have to care about the memory
>
yep

> for two reasons:
> - either we only have to deal with a limited number of entries in the LDIF file
> - or we have enough memory to handle the whole file (on my computer, I
> can give the JVM 14 GB, enough to process 5M entries if each one of
> them is 1 KB in size)
>
> I'm now thinking that it would be better to have two possible algorithms:
> - an in-memory one, which does not care about what could happen when we
> reach the end of the memory
> - a 'smarter' one, which takes control when we get an OOM
>
+1
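The fast/full split could be sketched in Java as follows. It catches OutOfMemoryError in the same way the DN parser falls back to the full parser on a special case. BulkLoader and loadWithFallback are hypothetical names, not existing ApacheDS APIs.

```java
final class LoaderFallbackSketch {

    /** Hypothetical loader abstraction: loads an LDIF file, returns the number of entries loaded. */
    interface BulkLoader {
        long load(String ldifPath);
    }

    static long loadWithFallback(String ldifPath, BulkLoader inMemory, BulkLoader memoryBounded) {
        try {
            // Fast path: assumes the whole working set fits in the heap.
            return inMemory.load(ldifPath);
        } catch (OutOfMemoryError oom) {
            // The in-memory attempt is abandoned; its structures are now unreachable
            // and can be garbage collected before the memory-bounded loader starts.
            return memoryBounded.load(ldifPath);
        }
    }
}
```

Catching OutOfMemoryError is only reasonable here because the in-memory attempt is thrown away as a whole. The sketch also assumes that the fast loader writes nothing to disk before it finishes, and that no other thread is left half-updated when the error is raised.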

> This can be done the same way we do with the DN parser: we have a fast
> parser, which throws an exception if it sees a special case, and a full
> parser. Same here, but we catch the OOM instead.
>
> Of course, we can probably try to 'predict' which one to use when we
> start the bulk load, to avoid wasting time on the in-memory process.
> Or we can let the user decide.
>
> Wdyt?
>
yep, been thinking about the earlier ideas as well, but for now just moved
the bulkloader to its own module
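For the 'predict which one to use' idea, a minimal heuristic could compare the LDIF file size with the JVM's maximum heap from Runtime.getRuntime().maxMemory(). The file size is scaled by an assumed heap-overhead factor, and the user can still force a choice. The factor of 2.5 is a placeholder to be calibrated, not a measured value. Emmanuel's figures (14 GB of heap for 5M entries of 1 KB, about 5 GB of LDIF) imply an overhead below roughly 2.8x. All names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LoaderChoiceSketch {

    enum Strategy { IN_MEMORY, MEMORY_BOUNDED }

    // Assumed heap bytes needed per byte of LDIF; a placeholder to calibrate by measurement.
    static final double HEAP_PER_LDIF_BYTE = 2.5;

    /** userChoice overrides the prediction when non-null (the "let the user decide" option). */
    static Strategy choose(long ldifBytes, long maxHeapBytes, Strategy userChoice) {
        if (userChoice != null) {
            return userChoice;
        }
        double estimatedHeap = ldifBytes * HEAP_PER_LDIF_BYTE;
        return estimatedHeap < maxHeapBytes ? Strategy.IN_MEMORY : Strategy.MEMORY_BOUNDED;
    }

    /** Convenience overload using the real file size and the JVM's maximum heap (-Xmx). */
    static Strategy choose(Path ldif, Strategy userChoice) throws IOException {
        return choose(Files.size(ldif), Runtime.getRuntime().maxMemory(), userChoice);
    }
}
```

With the numbers from this thread, choose(5_000_000_000L, 14_000_000_000L, null) gives IN_MEMORY: the estimate is 12.5 GB against a 14 GB heap.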

-- 
Kiran Ayyagari
http://keydap.com
