Date: Tue, 26 Jul 2016 17:54:20 +0000 (UTC)
From: "Adrien Grand (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Updated] (LUCENE-7396) Speed up flush of 1-dimension points

[ https://issues.apache.org/jira/browse/LUCENE-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-7396:
---------------------------------
    Attachment: LUCENE-7396.patch

Here is a patch that uses a different approach.
On flush, a special implementation of PointsReader that allows points to be reordered is passed to the codec, so that each codec can sort points in whatever order it needs. Compared to the previous patch, this approach is no longer specific to a single codec, and it also works in the multi-dimensional case.

With a 1GB buffer, I got the following flush times (as reported by the IndexWriter log):

||Benchmark||master flush time (ms)||patch flush time (ms)||
|IndexAndSearchOpenStreetMaps1D (1 dim)|31089|18954 ({color:green}-39.0%{color})|
|IndexAndSearchOpenStreetMaps (2 dims)|123461|85235 ({color:green}-31.0%{color})|

This looks encouraging, especially since the patch also uses less memory than the current approach. However, the patch is a bit disappointing in that it has two completely different implementations of tree writing, depending on whether or not the input can be reordered. I'll look into whether I can clean this up a bit.

> Speed up flush of 1-dimension points
> ------------------------------------
>
>                 Key: LUCENE-7396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7396.patch, LUCENE-7396.patch
>
>
> 1D points already have an optimized merge implementation which works when points come in order. So maybe we could make IndexWriter's PointValuesWriter sort before feeding the PointsFormat and somehow propagate the information to the PointsFormat?
> The benefit is that flushing could directly stream points to disk with little memory usage.
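As a rough illustration of the idea (this is not the code in LUCENE-7396.patch and not Lucene's actual API), the Java sketch below models the flush-time buffer as a hypothetical `ReorderablePoints` interface that exposes only random access and `swap`. A codec can then sort the buffered 1D points in place by (value, docID) and stream them into leaf blocks in a single pass, which is the sorted-input path the issue description refers to. All names here (`ReorderablePoints`, `BufferedPoints`, `heapSort`, `writeSortedLeaves`, `maxPointsPerLeaf`) are hypothetical, and values are assumed to be `long`s rather than packed bytes.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Lucene's API): a codec sorting reorderable 1D points at flush time. */
public class ReorderablePointsSketch {

    /** Hypothetical view of the flush buffer: random access plus in-place swaps, nothing else. */
    interface ReorderablePoints {
        int size();
        long value(int i); // assumption: 1D values encoded as longs instead of packed bytes
        int docID(int i);
        void swap(int i, int j);
    }

    /** Stand-in for the points an IndexWriter buffers in RAM before a flush. */
    static final class BufferedPoints implements ReorderablePoints {
        private final long[] values;
        private final int[] docIDs;

        BufferedPoints(long[] values, int[] docIDs) {
            this.values = values.clone();
            this.docIDs = docIDs.clone();
        }

        @Override public int size() { return values.length; }
        @Override public long value(int i) { return values[i]; }
        @Override public int docID(int i) { return docIDs[i]; }

        @Override public void swap(int i, int j) {
            long v = values[i]; values[i] = values[j]; values[j] = v;
            int d = docIDs[i]; docIDs[i] = docIDs[j]; docIDs[j] = d;
        }
    }

    /** Orders points by value, breaking ties by doc ID. */
    static int compare(ReorderablePoints p, int i, int j) {
        int c = Long.compare(p.value(i), p.value(j));
        return c != 0 ? c : Integer.compare(p.docID(i), p.docID(j));
    }

    /** In-place heapsort that only uses compare() and swap(), so no copy of the buffer is made. */
    static void heapSort(ReorderablePoints p) {
        int n = p.size();
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(p, i, n);
        for (int end = n - 1; end > 0; end--) {
            p.swap(0, end);      // move the current maximum to its final slot
            siftDown(p, 0, end); // restore the max-heap on the remaining prefix
        }
    }

    private static void siftDown(ReorderablePoints p, int root, int n) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= n) return;
            if (child + 1 < n && compare(p, child + 1, child) > 0) child++;
            if (compare(p, root, child) >= 0) return;
            p.swap(root, child);
            root = child;
        }
    }

    /**
     * Single pass over sorted points, cutting them into leaf blocks of at most maxPointsPerLeaf.
     * Returns {min, max} per leaf; a real writer would append each point to disk in the same loop.
     */
    static long[][] writeSortedLeaves(ReorderablePoints p, int maxPointsPerLeaf) {
        if (maxPointsPerLeaf <= 0) throw new IllegalArgumentException("maxPointsPerLeaf must be > 0");
        int n = p.size();
        long[][] leaves = new long[(n + maxPointsPerLeaf - 1) / maxPointsPerLeaf][];
        for (int i = 0; i < n; i++) {
            if (i > 0 && compare(p, i - 1, i) > 0) {
                throw new IllegalStateException("points must be sorted before streaming");
            }
            boolean lastInLeaf = (i + 1) % maxPointsPerLeaf == 0 || i == n - 1;
            if (lastInLeaf) {
                int start = i - (i % maxPointsPerLeaf);
                leaves[i / maxPointsPerLeaf] = new long[] {p.value(start), p.value(i)};
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        BufferedPoints points = new BufferedPoints(
            new long[] {42, 7, 19, 7, 3, 88},
            new int[] {0, 1, 2, 3, 4, 5});
        heapSort(points);
        System.out.println(Arrays.deepToString(writeSortedLeaves(points, 4))); // [[3, 19], [42, 88]]
    }
}
```

Heapsort is used here only because it sorts in place with nothing but compare and swap, so the buffer never has to be copied; a real codec could pick a different in-place sort. In the multi-dimensional case the same swap-only access would presumably be used to partition points by the split dimension at each level of the tree rather than sorting once by a single value.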