jackrabbit-oak-dev mailing list archives

From Lukas Eder <mar09...@adobe.com>
Subject Re: Large flat commit problems
Date Mon, 29 Apr 2013 07:17:53 GMT
Hi,

On 4/26/13 2:15 PM, "Jukka Zitting" <jukka.zitting@gmail.com> wrote:

>Hi,
>
>On Wed, Feb 27, 2013 at 12:24 PM, Jukka Zitting <jukka.zitting@gmail.com>
>wrote:
>>     Added 167000 pages in 467 seconds (2.80ms/page)
>>     Imported 167404 pages in 1799 seconds (10.75ms/page)
>
>Here's an update on the latest status with the Wikipedia import benchmark:
>
>    $ java -Xmx1500m -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
>          benchmark --wikipedia=simplewiki-20130414-pages-articles.xml \
>          --cache=200  WikipediaImport Oak-Segment
>    [...]
>    Added 171000 pages in 166 seconds (0.97ms/page)
>    Imported 171382 pages in 355 seconds (2.07ms/page)
>    [...]
>    Traversed 171382 pages in 27 seconds (0.16ms/page)
>
>Pretty good progress here.

Those are impressive numbers. Do comparisons with Jackrabbit exist?
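
As a quick sanity check, the per-page figures and the change since the
February run can be recomputed from the raw totals quoted above. A minimal
sketch (the class and method names are made up for illustration and are not
part of oak-run):

```java
import java.util.Locale;

public class PageTimings {
    /** Average milliseconds per page for a run of the given length. */
    public static double msPerPage(long pages, long seconds) {
        return seconds * 1000.0 / pages;
    }

    /** Formats a value with two decimals, as the benchmark output does. */
    public static String fmt(double ms) {
        return String.format(Locale.ROOT, "%.2f", ms);
    }

    public static void main(String[] args) {
        // Totals copied from the February and April runs quoted in this thread.
        double febImport = msPerPage(167404, 1799);
        double aprImport = msPerPage(171382, 355);
        double aprTraverse = msPerPage(171382, 27);

        System.out.println("Feb import:   " + fmt(febImport) + " ms/page");
        System.out.println("Apr import:   " + fmt(aprImport) + " ms/page");
        System.out.println("Apr traverse: " + fmt(aprTraverse) + " ms/page");
        // Ratio of the two import rates, i.e. how much faster April is per page.
        System.out.println("Import speedup: " + fmt(febImport / aprImport) + "x");
    }
}
```

By this arithmetic the import is roughly five times faster per page than it
was in February.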

Cheers
Lukas

>> There are still a few problems, most notably the fact that the index
>> update hook operates directly on the plain MemoryNodeBuilder used by
>> the current SegmentMK, so it won't benefit from the automatic purging
>> of large change-sets and thus ends up requiring lots of memory during
>> the massive final save() call. Something like a SegmentNodeBuilder
>> with internal purge logic similar to what we already prototyped in
>> KernelNodeState should solve that issue.
>
>This is still an issue, see the -Xmx1500m I used for the import.
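
To illustrate the purge idea discussed above: a builder can cap its memory
use by flushing pending changes to the backing store whenever they reach a
limit, instead of holding everything until the final save(). The following
is only a rough sketch under simplifying assumptions. PurgingBuilder,
purgeLimit and the Map standing in for storage are hypothetical names, not
Oak APIs, and a real implementation would also have to keep purged changes
invisible to other sessions until the commit succeeds.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of purge-on-threshold; not an Oak class. */
public class PurgingBuilder {
    // Unsaved changes currently held in memory.
    private final Map<String, String> pending = new HashMap<>();
    // Stand-in for the persistent store (assumption for this sketch).
    private final Map<String, String> store;
    private final int purgeLimit;
    private int purgeCount = 0;

    public PurgingBuilder(Map<String, String> store, int purgeLimit) {
        this.store = store;
        this.purgeLimit = purgeLimit;
    }

    /** Records a change and purges once the in-memory set reaches the limit. */
    public void setProperty(String path, String value) {
        pending.put(path, value);
        if (pending.size() >= purgeLimit) {
            flush();
            purgeCount++;
        }
    }

    /** Reads see pending changes first, then already purged or stored ones. */
    public String getProperty(String path) {
        String value = pending.get(path);
        return value != null ? value : store.get(path);
    }

    /** Final save only has to write whatever is left in memory. */
    public void save() {
        flush();
    }

    public int pendingSize() {
        return pending.size();
    }

    public int purgeCount() {
        return purgeCount;
    }

    // Moves all pending changes to the store and frees the in-memory map.
    private void flush() {
        store.putAll(pending);
        pending.clear();
    }
}
```

With a limit of N, at most N changes are held in memory at any time, so the
final save() stays small no matter how large the overall change-set is.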
>
>> The other big issue is the large amount of time spent processing the
>> commit hooks. The one hook approach I outlined earlier should help us
>> there.
>
>The work we've done here with the Editor mechanism is clearly paying
>off, as the commit hooks are now taking some 53% of the import time,
>down from 74% two months ago, even though we've been adding more
>functionality there.
>
>BR,
>
>Jukka Zitting

