lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-7396) Speed up flush of 1-dimension points
Date Tue, 26 Jul 2016 17:54:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-7396:
---------------------------------
    Attachment: LUCENE-7396.patch

Here is a patch that takes a different approach. At flush, the codec is passed a special
implementation of a PointsReader that allows points to be reordered, so that each codec can
sort points in whatever order it needs. Compared to the previous patch, the benefit is that
this is no longer specific to one codec and that it also works in the multi-dimensional case.
I got the following flush times (as reported by the IndexWriter log) with a 1GB buffer:

||Benchmark||Flush time on master (ms)||Flush time with patch (ms)||
|IndexAndSearchOpenStreetMaps1D (1 dim)|31089|18954 (-39.0%)|
|IndexAndSearchOpenStreetMaps (2 dims)|123461|85235 (-31.0%)|

This looks encouraging, especially given that it also uses less memory than the current approach.
However, the patch is a bit disappointing in that the tree-writing code has two completely
different implementations, depending on whether the input can be reordered or not. I'll look
into whether I can clean this up a bit.
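
To make the reordering idea above concrete, here is a minimal Java sketch. It is not the code
from the attached patch and does not use Lucene's APIs: the names ReorderablePoints,
BufferedPoints and ReorderSketch are hypothetical, each point is simplified to a single long
plus a doc ID, and the sort is a plain recursive quicksort chosen for brevity. The point it
illustrates is only that the consumer sorts through value/docId/swap calls, while the buffer
moves integers in an indirection array instead of copying point data.

```java
/** Hypothetical view over buffered points that a consumer may reorder in place. */
interface ReorderablePoints {
    int size();
    long value(int i);       // value of the i-th point (one dimension, simplified to a long)
    int docId(int i);        // document that the i-th point belongs to
    void swap(int i, int j); // exchange the points at positions i and j
}

/** Buffered points addressed through an indirection array, so a swap only moves two ints. */
final class BufferedPoints implements ReorderablePoints {
    private final long[] values;
    private final int[] docIds;
    private final int[] ords; // ords[i] = slot in values/docIds currently at position i

    BufferedPoints(long[] values, int[] docIds) {
        this.values = values;
        this.docIds = docIds;
        this.ords = new int[values.length];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = i;
        }
    }

    public int size() { return ords.length; }
    public long value(int i) { return values[ords[i]]; }
    public int docId(int i) { return docIds[ords[i]]; }
    public void swap(int i, int j) { int t = ords[i]; ords[i] = ords[j]; ords[j] = t; }
}

final class ReorderSketch {
    /** Orders two positions by value, then by doc ID to break ties. */
    static int compare(ReorderablePoints p, int i, int j) {
        int cmp = Long.compare(p.value(i), p.value(j));
        return cmp != 0 ? cmp : Integer.compare(p.docId(i), p.docId(j));
    }

    /** Consumer side: sorts positions [from, to) using only the view's methods. */
    static void sort(ReorderablePoints p, int from, int to) {
        if (to - from < 2) {
            return;
        }
        int pivot = to - 1; // last element is the pivot; it stays put until the final swap
        int store = from;
        for (int i = from; i < pivot; i++) {
            if (compare(p, i, pivot) < 0) {
                p.swap(i, store);
                store++;
            }
        }
        p.swap(store, pivot);
        sort(p, from, store);
        sort(p, store + 1, to);
    }

    public static void main(String[] args) {
        BufferedPoints points = new BufferedPoints(new long[] {42, 7, 19, 7}, new int[] {0, 1, 2, 3});
        sort(points, 0, points.size());
        for (int i = 0; i < points.size(); i++) {
            System.out.println("value=" + points.value(i) + " doc=" + points.docId(i));
        }
    }
}
```

Once the view is sorted, the consumer can iterate positions 0..size-1 and write points out in
order. A real implementation would compare byte-encoded values and use a sorter with a better
worst case than this quicksort.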

> Speed up flush of 1-dimension points
> ------------------------------------
>
>                 Key: LUCENE-7396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7396.patch, LUCENE-7396.patch
>
>
> 1D points already have an optimized merge implementation which works when points come
> in order. So maybe we could make IndexWriter's PointValuesWriter sort before feeding the
> PointsFormat and somehow propagate the information to the PointsFormat?
> The benefit is that flushing could directly stream points to disk with little memory
> usage.




