Message-ID: <119122372.5551265059218921.JavaMail.jira@brutus.apache.org>
Date: Mon, 1 Feb 2010 21:20:18 +0000 (UTC)
From: "Chris Harris (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Subject: [jira] Issue Comment Edited: (LUCENE-2232) Use VShort to encode positions
In-Reply-To: <1579791409.9811264333217812.JavaMail.jira@brutus.apache.org>
Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Content-Type: text/plain; charset=utf-8

[
https://issues.apache.org/jira/browse/LUCENE-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828210#action_12828210 ]

Chris Harris edited comment on LUCENE-2232 at 2/1/10 9:19 PM:
--------------------------------------------------------------

I have a little bit of sampling profiling data from YourKit that may be relevant. (Paul encouraged me to post anyway.) Note that the queries submitted were not limited to those requiring PRX data, although some of them (30%? 40%?) did require it. This data is _without_ applying this LUCENE-2232 patch. YourKit was set to time java.io.RandomAccessFile.readBytes and .read with wall clock time.

1. I replayed about 1000 queries taken from our user query logs on a test system that uses rotating drives, without first submitting any battery of warmup queries.

{code}
SegmentTermPositions.readDeltaPosition()
  IndexInput.readVInt()  <----------
{code}

I looked at the time spent in the marked call to IndexInput.readVInt(). 93% of the time in this readVInt() was spent in I/O, leaving a maximum of 7% that could theoretically be wasted on the CPU decoding VInts.

2. I profiled one of our live Solr servers that uses SSD drives, after the system had warmed up a bit. Here is the resulting profiling data, with times relative to SegmentTermPositions.readDeltaPosition():

{code}
SegmentTermPositions.readDeltaPosition() - 100%
  IndexInput.readVInt - 100%
    BufferedIndexInput.readByte - 69%
      BufferedIndexInput.refill - 69%
        SimpleFSDirectory$SimpleFSIndexInput.readInternal - 69%
          java.io.RandomAccessFile.read - 55%
          java.io.RandomAccessFile.seek - 14%
{code}

Here we have a healthier 31% of the time that could potentially be sped up by this patch. It partly depends on how much the patch would increase I/O, though. (I guess the hope is that it wouldn't increase I/O by too crazy an amount if your documents are above a certain size.)
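To make the CPU-versus-I/O trade-off above concrete, here is a minimal sketch of the two decoding styles being compared. The VInt methods follow the usual Lucene layout (7 data bits per byte, low-order group first, high bit as a continuation flag). The two-byte variant is only an assumption made for illustration: the class name VIntSketch and the methods writeVShortLike and readVShortLike are hypothetical, and the layout is not taken from the LUCENE-2232 patch itself.

```java
import java.io.ByteArrayOutputStream;

final class VIntSketch {
    // VInt: 7 data bits per byte, low-order group first; the high bit is set
    // on every byte except the last one.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decodes one VInt from the start of buf: one read and one branch per byte.
    static int readVInt(byte[] buf) {
        int pos = 0;
        byte b = buf[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Hypothetical two-byte-unit variant (an assumption, not the patch format):
    // 15 data bits per big-endian 16-bit unit, high bit of the unit as the
    // continuation flag.
    static byte[] writeVShortLike(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FFF) != 0) {
            int unit = (value & 0x7FFF) | 0x8000;
            out.write(unit >>> 8); // write(int) stores only the low 8 bits
            out.write(unit);
            value >>>= 15;
        }
        out.write(value >>> 8);
        out.write(value);
        return out.toByteArray();
    }

    // Decodes one value from the start of buf: one branch per 16-bit unit.
    static int readVShortLike(byte[] buf) {
        int pos = 0;
        int unit = ((buf[pos++] & 0xFF) << 8) | (buf[pos++] & 0xFF);
        int value = unit & 0x7FFF;
        for (int shift = 15; (unit & 0x8000) != 0; shift += 15) {
            unit = ((buf[pos++] & 0xFF) << 8) | (buf[pos++] & 0xFF);
            value |= (unit & 0x7FFF) << shift;
        }
        return value;
    }
}
```

Under this assumed layout, a delta below 128 grows from one byte to two, a delta between 128 and 16383 stays at two bytes, and any delta below 32768 is decoded with a single continuation check. That is the shape of the trade-off in question: fewer decoding steps in exchange for a larger proximity file, and therefore possibly more I/O.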
*UPDATE*: For context on run #2,

* 1,864,186ms total spent under solr.search.SolrIndexSearcher.search
* 1,633,612ms total spent under lucene.search.IndexSearcher.search
* 571,254ms total spent under lucene.index.SegmentTermPositions.readDeltaPosition
** Of this, about 18,500ms were from SegmentMerger.appendPostings, rather than from searches/highlighting
* 1,330,565ms total spent under IndexInput.readVInt().

(These are all "time with children", rather than "own time".)

> Use VShort to encode positions
> ------------------------------
>
>                 Key: LUCENE-2232
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2232
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Paul Elschot
>         Attachments: LUCENE-2232-nonbackwards.patch, LUCENE-2232-nonbackwards.patch
>
>
> Improve decoding speed for typical case of two bytes for a delta position at the cost of increasing the size of the proximity file.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org