lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap
Date Fri, 18 Jan 2019 18:51:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746576#comment-16746576 ]

Adrien Grand commented on LUCENE-8635:
--------------------------------------

bq. The PK lookup doesn't concern me much since such queries would usually already be fast
and overall a tiny fraction of a search platform in typical usage.

For the record, Lucene also performs implicit PK lookups when indexing with updateDocument.
So this might have an impact on indexing speed as well.

bq. I think net/net we are already relying on OS to do the right thing here.  As things stand
today, the OS could also swap out the heap pages that hold the FST's byte[] depending on its
swappiness

Most deployments I am aware of tune swappiness to avoid this situation. :)

Don't get me wrong, I'm very much in favor of this change. I agree it's a bit unlikely that
the terms index gets paged out, but you can still end up with a cold FS cache, e.g. when the
host restarts.

Furthermore, the NIO and Simple FS directories use buffering. I'm wondering how bad things
would be if every seek needed to reload the buffer. You mentioned bringing back pack()
with 2) only; maybe reordering nodes would still be useful, so that we could increase the
likelihood that two connected nodes of the FST end up in the same buffer (or maybe the current
way of building FSTs is already good from that perspective?). Even if that made the FST a bit
larger, it would probably still be a good trade-off now that we are considering keeping the
FST on disk.
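
To make the buffering concern concrete, here is a rough sketch in plain Java. It is not Lucene code: the class name, the method and the offsets are made up for illustration, and it assumes a simplified reader that holds one block-aligned buffer and refills it whenever a read lands outside that block. It only counts refills for two hypothetical traversals of the same length.

```java
/** Hypothetical model, not a Lucene class: counts buffer refills for a series of reads. */
public class BufferedSeekSketch {

    /**
     * Simulates a reader that holds a single block-aligned buffer of bufferSize bytes.
     * A read at an offset outside the currently buffered block costs one refill.
     */
    public static int countRefills(long[] offsets, int bufferSize) {
        long bufferedBlock = -1; // block currently held; -1 means nothing is buffered yet
        int refills = 0;
        for (long offset : offsets) {
            long block = offset / bufferSize;
            if (block != bufferedBlock) {
                bufferedBlock = block; // the old buffer content is dropped
                refills++;
            }
        }
        return refills;
    }

    public static void main(String[] args) {
        int bufferSize = 1024; // assumed buffer size, chosen only for the example

        // Traversal where connected nodes were laid out close together.
        long[] clustered = {0, 40, 90, 200, 700, 1000};
        // Same number of hops, but every hop lands in a different block.
        long[] scattered = {0, 5000, 100, 9000, 300, 20000};

        System.out.println("clustered refills: " + countRefills(clustered, bufferSize)); // 1
        System.out.println("scattered refills: " + countRefills(scattered, bufferSize)); // 6
    }
}
```

In this toy model the clustered layout needs a single refill while the scattered one refills on every hop, which is the effect that node reordering would try to avoid.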

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: offheap.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all terms into heap memory when the index is opened. This causes
> frequent JVM OOM issues if the term size gets big. A better approach is to lazily load the
> FST using mmap, which ensures that only the required terms get loaded into memory.
>  
> Lucene can expose an API for providing the list of fields whose terms should be loaded
> offheap. I'm planning to take the following approach:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass the list of offheap fields to Lucene during index open (ALL can be a special
>    keyword for loading all fields offheap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the OffHeap constructor based on
>    the fstOffHeap field
>  
> I created a patch (that loads all fields offheap), did some benchmarks using es_rally,
> and the results look good.
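
As a rough illustration of the lazy-loading idea in the description above, the following sketch uses only the JDK's FileChannel.map. It is not the attached patch and not Lucene's API (Lucene has its own MMapDirectory for this); the class name, the helper method and the temporary file are assumptions made for the example. The point is that mapping a file does not copy it onto the Java heap, and the OS pages in only the regions that are actually read.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch, not the attached patch: reads a file through a read-only mmap. */
public class MmapLazyReadSketch {

    /** Maps the whole file read-only; the OS loads pages on first access. */
    public static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an on-disk FST: 1 MiB of zeros with one marker byte.
        Path file = Files.createTempFile("fst-sketch", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[1 << 20];
        data[123_456] = 42;
        Files.write(file, data);

        MappedByteBuffer mapped = mapReadOnly(file);
        // Absolute read: only the page containing this offset needs to be resident.
        System.out.println(mapped.get(123_456)); // prints 42
    }
}
```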



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

