Date: Tue, 6 Dec 2016 15:54:58 +0000 (UTC)
From: "Uwe Schindler (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Subject: [jira] [Commented] (LUCENE-7583) Can we improve OutputStreamIndexOutput's byte buffering when writing each BKD leaf block?

[ https://issues.apache.org/jira/browse/LUCENE-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725878#comment-15725878 ]

Uwe Schindler commented on LUCENE-7583:
---------------------------------------

I think ByteArrayDataOutput is always a good choice for creating "small" blobs of structured data. You have full control of the buffer, and there are almost no checks and no multi-buffer handling involved. It just writes to a byte array that you can reuse later or write to the IndexOutput as a single block.

> Can we improve OutputStreamIndexOutput's byte buffering when writing each BKD leaf block?
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7583
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7583-hardcode-writeVInt.patch, LUCENE-7583.patch
>
>
> When BKD writes its leaf blocks, it's essentially a lot of tiny writes (vint, int, short, etc.), and I've seen deep thread stacks through our IndexOutput impl ({{OutputStreamIndexOutput}}) when pulling hot threads while BKD is writing.
> So I tried a small change, to have BKDWriter do its own buffering, by first writing each leaf block into a {{RAMOutputStream}}, and then dumping that (in 1 KB byte[] chunks) to the actual IndexOutput.
> This gives a non-trivial reduction (~6%) in the total BKD writing + merging time on the 20M NYC taxis nightly benchmark (2 runs each):
>
> Trunk, sparse:
>   - total: 64.691 sec
>   - total: 64.702 sec
> Patch, sparse:
>   - total: 60.820 sec
>   - total: 60.965 sec
> Trunk, dense:
>   - total: 62.730 sec
>   - total: 62.383 sec
> Patch, dense:
>   - total: 58.805 sec
>   - total: 58.742 sec
>
> The results seem to be consistent and reproducible. I'm using Java 1.8.0_101 on a fast SSD on Ubuntu 16.04.
> It's sort of weird and annoying that this helps so much, because {{OutputStreamIndexOutput}} already uses Java's {{BufferedOutputStream}} (default 8 KB buffer) to buffer writes.
> [~thetaphi] suggested that maybe HotSpot is failing to inline/optimize {{writeByte}}, or that the call stack just has too many layers.
> We could commit this patch (it's trivial), but it'd be nice to understand and fix why buffering writes is somehow costly, so that any other Lucene codec components that write lots of little things can be improved too.
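
The buffering idea discussed in this thread (encode each leaf block into a plain byte array first, then hand the whole block to the underlying output in one call) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed patch: LeafBlockBuffer and LeafBlockDemo are hypothetical names, the class is a JDK-only stand-in for the role ByteArrayDataOutput or RAMOutputStream would play, it grows its array on demand for convenience, and the big-endian layout for int and short is an arbitrary choice made for the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical scratch buffer: many tiny writes go into a byte[], one bulk write goes out. */
final class LeafBlockBuffer {
    private byte[] bytes;
    private int pos;

    LeafBlockBuffer(int initialCapacity) {
        bytes = new byte[Math.max(1, initialCapacity)];
    }

    /** Grow the backing array if fewer than {@code extra} bytes remain. */
    private void ensure(int extra) {
        if (pos + extra > bytes.length) {
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, pos + extra));
        }
    }

    /** Big-endian 16-bit value (byte order is an assumption of this sketch). */
    void writeShort(short v) {
        ensure(2);
        bytes[pos++] = (byte) (v >> 8);
        bytes[pos++] = (byte) v;
    }

    /** Big-endian 32-bit value (byte order is an assumption of this sketch). */
    void writeInt(int v) {
        ensure(4);
        bytes[pos++] = (byte) (v >> 24);
        bytes[pos++] = (byte) (v >> 16);
        bytes[pos++] = (byte) (v >> 8);
        bytes[pos++] = (byte) v;
    }

    /** Variable-length int: 7 payload bits per byte, high bit set on all but the last byte. */
    void writeVInt(int v) {
        ensure(5); // a 32-bit value needs at most 5 bytes
        while ((v & ~0x7F) != 0) {
            bytes[pos++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        bytes[pos++] = (byte) v;
    }

    /** Number of bytes buffered so far. */
    int length() {
        return pos;
    }

    /** Hand the whole block to the sink in a single call, then reuse the array. */
    void flushTo(OutputStream out) throws IOException {
        out.write(bytes, 0, pos);
        pos = 0;
    }
}

final class LeafBlockDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the real output
        LeafBlockBuffer block = new LeafBlockBuffer(1024);
        for (int docId = 0; docId < 512; docId++) {
            block.writeVInt(docId); // tiny writes touch only the local array
        }
        block.flushTo(sink); // one bulk write per leaf block
        System.out.println("bytes written: " + sink.size());
    }
}
```

The point of the design is that the hot loop touches only a local array and an index, so none of the tiny writes has to pass through the layered stream classes; the layered output sees a single bulk write per block. Whether that accounts for the ~6% measured above is exactly the open question in the issue.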