lucene-dev mailing list archives

From "Chris Harris (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2232) Use VShort to encode positions
Date Mon, 01 Feb 2010 18:32:18 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828210#action_12828210 ]

Chris Harris commented on LUCENE-2232:
--------------------------------------

I have a little sampling-profiler data from YourKit that may be relevant. (Paul encouraged
me to post it anyway.) Note that the queries submitted were not limited to those requiring PRX
data, although some of them (30%? 40%?) did. This data was collected _without_ the LUCENE-2232
patch applied. YourKit was set to time java.io.RandomAccessFile.readBytes and .read using
wall-clock time.

1. I replayed about 1000 queries taken from our user query logs on a test system that uses
rotating drives, without first submitting any battery of warmup queries.

{code}
 SegmentTermPositions.readDeltaPosition()
     IndexInput.readVInt() <----------
{code}

I looked at the time spent in the marked call to IndexInput.readVInt(). 93% of the time in
this readVInt() was spent in I/O, leaving at most 7% spent on the CPU decoding VInts, which
is the most this patch could theoretically save here.
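To make the CPU side concrete, here is a minimal sketch of the VInt decoding loop (7 data bits per byte, low-order group first, high bit meaning "another byte follows"), modeled on IndexInput.readVInt() in Lucene of this era. It reads from a plain byte array rather than an IndexInput, and the class name VIntSketch is hypothetical.

{code}
// Minimal sketch of Lucene-style VInt decoding over a byte array.
// VIntSketch is a hypothetical name, not part of Lucene's API.
public class VIntSketch {
    private final byte[] buf;
    private int pos = 0;

    public VIntSketch(byte[] buf) {
        this.buf = buf;
    }

    /** Decodes one VInt starting at the current position. */
    public int readVInt() {
        byte b = buf[pos++];
        int value = b & 0x7F;              // low 7 data bits
        // While the high bit is set, the next byte carries 7 more bits.
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        // 300 = 0x12C: low 7 bits 0x2C plus continuation bit -> 0xAC, then 0x02.
        VIntSketch in = new VIntSketch(new byte[] {(byte) 0xAC, 0x02, 0x05});
        System.out.println(in.readVInt()); // prints 300
        System.out.println(in.readVInt()); // prints 5
    }
}
{code}

Each extra byte costs a branch plus a mask, shift and OR; that per-byte work is what the 7% figure above bounds.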

2. I profiled one of our live Solr servers that uses SSD drives, after the system had warmed
up a bit. Here is the resulting profiling data, with times relative to SegmentTermPositions.readDeltaPosition():

{code}
SegmentTermPositions.readDeltaPosition() - 100%
  IndexInput.readVInt - 100%
    BufferedIndexInput.readByte - 69%
      BufferedIndexInput.refill - 69%
        SimpleFSDirectory$SimpleFSIndexInput.readInternal - 69%
          java.io.RandomAccessFile.read - 55%
          java.io.RandomAccessFile.seek - 14%
{code}

Here a healthier 31% of the time (the part of readVInt() not spent in readByte()) could
potentially be sped up by this patch. It partly depends on how much the patch would increase
I/O, though. (I guess the hope is that it wouldn't increase I/O by too crazy an amount if your
documents are above a certain size, presumably because more of their position deltas already
need two bytes as VInts.)
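To get a feel for that I/O trade-off, the sketch below counts encoded bytes per position delta under VInt and under an assumed VShort layout (15 data bits per 2-byte unit, high bit as continuation). That layout is a guess for illustration only; the attached patch may encode VShorts differently, and DeltaSizeSketch, vIntBytes and vShortBytes are hypothetical names.

{code}
// Hypothetical size comparison: VInt (7 data bits per byte) versus an
// ASSUMED VShort format (15 data bits per 2-byte unit, high bit = continuation).
// The real LUCENE-2232 patch may lay out VShorts differently.
public class DeltaSizeSketch {
    /** Bytes a Lucene-style VInt needs for a non-negative value. */
    public static int vIntBytes(int v) {
        int n = 1;
        while ((v & ~0x7F) != 0) {   // more than 7 bits left
            v >>>= 7;
            n++;
        }
        return n;
    }

    /** Bytes the assumed VShort format needs: 2 bytes per 15-bit group. */
    public static int vShortBytes(int v) {
        int n = 2;
        while ((v & ~0x7FFF) != 0) { // more than 15 bits left
            v >>>= 15;
            n += 2;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] deltas = {1, 5, 100, 127, 128, 1000, 16383, 16384, 40000};
        for (int d : deltas) {
            System.out.printf("delta %6d: VInt %d bytes, VShort %d bytes%n",
                    d, vIntBytes(d), vShortBytes(d));
        }
    }
}
{code}

Under this assumption, deltas below 128 cost one extra byte, deltas from 128 to 16383 take two bytes either way, and deltas from 16384 to 32767 take one byte less as VShorts, so the extra I/O shrinks as typical gaps between positions grow.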

> Use VShort to encode positions
> ------------------------------
>
>                 Key: LUCENE-2232
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2232
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Paul Elschot
>         Attachments: LUCENE-2232-nonbackwards.patch, LUCENE-2232-nonbackwards.patch
>
>
> Improve decoding speed for the typical case of two bytes per delta position, at the cost
> of increasing the size of the proximity file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

