jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
Date Tue, 11 Dec 2018 08:24:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716524#comment-16716524
] 

Thomas Mueller commented on OAK-7947:
-------------------------------------

> The changes in ... getIndexDefinition ... not from stored index definition

Yes, I know, this is a bug in the patch. I will fix that.

> the patch you had attached seems quite risky to me

Yes. I didn't plan to apply the patch, it's just the starting point. There are bugs, todos,
and some parts are probably not needed.

Next, I will try to find out which parts are not needed.

> let index open happen as it happens today but copy required files right away (synchronously)
and schedule rest of the files for later.

I'm afraid I would need some help with this. I tried disabling copy-on-read, but then the files
are opened from the datastore, which has some additional problems: files are opened multiple
times. So I came to the conclusion that it's best not to open the files until they are really
needed, either to run queries or to do detailed cost estimation (if the index might be used). So
there are 3 stages (AFAIK):

* Stage 1: just the index definition is needed to see if the properties are indexed.
* Stage 2: numDocs are needed to do cost estimation.
* Stage 3: index is used for a query.
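
A minimal sketch of how the three stages could map onto lazy opening. All names here
(LazyIndex, indexesProperty, the IntSupplier that stands in for opening the index files) are
hypothetical and are not Oak APIs; the point is only that stage 1 is answered from the index
definition, while the expensive open happens at most once, on the first stage 2 or stage 3 call.

```java
import java.util.Set;
import java.util.function.IntSupplier;

// Hypothetical sketch, not Oak code: defers the expensive "open index files" step.
final class LazyIndex {
    // Stage 1 data: property names taken from the index definition only.
    private final Set<String> indexedProperties;
    // Stand-in for opening (and possibly downloading) the index files;
    // assumed to return the number of documents in the index.
    private final IntSupplier openFilesAndCountDocs;

    private boolean opened;
    private int numDocs;

    LazyIndex(Set<String> indexedProperties, IntSupplier openFilesAndCountDocs) {
        this.indexedProperties = indexedProperties;
        this.openFilesAndCountDocs = openFilesAndCountDocs;
    }

    // Stage 1: answered from the index definition, never touches the files.
    boolean indexesProperty(String name) {
        return indexedProperties.contains(name);
    }

    // Stage 2 (cost estimation) and stage 3 (query) both need the opened index.
    synchronized int numDocs() {
        if (!opened) {
            numDocs = openFilesAndCountDocs.getAsInt();
            opened = true;
        }
        return numDocs;
    }

    synchronized boolean isOpened() {
        return opened;
    }
}
```

With this shape, a query whose properties are not covered by the index never triggers the
file open at all.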

Obviously, for stage 3, the index files are needed. For stage 1, right now the index files
are opened. I think it's sufficient to delay opening the files there, and just use the index
definition. For stage 2, I think (not sure yet) that this is actually rare enough that it's
OK to open all index files. If it turns out this is _not_ that rare, then we can store the
numDocs in the index definition from time to time (in theory we could do that for every index
update), together with the time of the numDocs update. When the numDocs are needed, they are
either read from the index definition (let's say if they are younger than 1 hour or so), or
else the index files are opened.
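
The numDocs fallback could look roughly like the sketch below. The class and method names
(NumDocsCache, recordOnUpdate) are made up for illustration, the injected clock and the
LongSupplier standing in for "open the index files and count" are assumptions, and the
1 hour limit is just the example value mentioned above. In a real implementation the stored
value and timestamp would live in the index definition rather than in fields.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch, not Oak code: serve numDocs from a stored value while it is fresh.
final class NumDocsCache {
    // "younger than 1 hour or so"; the exact limit is an assumption.
    static final long MAX_AGE_MILLIS = 60L * 60 * 1000;

    // Stand-in for opening the index files and reading numDocs (expensive).
    private final LongSupplier openIndexAndCount;
    // Injected clock (milliseconds), so the behaviour can be tested.
    private final LongSupplier clock;

    // Stand-ins for the values that would be stored in the index definition.
    private boolean stored;
    private long storedNumDocs;
    private long storedAtMillis;

    NumDocsCache(LongSupplier openIndexAndCount, LongSupplier clock) {
        this.openIndexAndCount = openIndexAndCount;
        this.clock = clock;
    }

    // Called from time to time (or on every index update): remember numDocs and when.
    synchronized void recordOnUpdate(long numDocs) {
        storedNumDocs = numDocs;
        storedAtMillis = clock.getAsLong();
        stored = true;
    }

    // Used for cost estimation: fresh stored value if available, else open the files.
    synchronized long numDocs() {
        if (stored && clock.getAsLong() - storedAtMillis < MAX_AGE_MILLIS) {
            return storedNumDocs;
        }
        return openIndexAndCount.getAsLong();
    }
}
```

The stored value may lag behind the real numDocs by up to the maximum age, which should be
acceptable if it is only used for cost estimation.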



> Lazy loading of Lucene index files startup
> ------------------------------------------
>
>                 Key: OAK-7947
>                 URL: https://issues.apache.org/jira/browse/OAK-7947
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>         Attachments: OAK-7947.patch
>
>
> Right now, all Lucene index binaries are loaded on startup (I think when the first query
is run, to do cost calculation). This is a performance problem if the index files are large,
and need to be downloaded from the data store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
