Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 4 Oct 2011 23:57:35 +0000 (UTC)
From: "jiraposter@reviews.apache.org (Commented) (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: 
 <830411974.9904.1317772655355.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <935088894.4873.1316747487232.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-4465) Lazy-seek optimization for
 StoreFile scanners
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120594#comment-13120594 ]

jiraposter@reviews.apache.org commented on HBASE-4465:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2180/#review2332
-----------------------------------------------------------


src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
<https://reviews.apache.org/r/2180/#comment5392>

    Should be "lazily-sought"


src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
<https://reviews.apache.org/r/2180/#comment5393>

    Should be 'real-sought' and 'lazily-sought'


src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
<https://reviews.apache.org/r/2180/#comment5394>

    Should be 'is sought'


src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
<https://reviews.apache.org/r/2180/#comment5395>

    Should realSeekDone be set before returning?


- Ted


On 2011-10-04 22:10:40, Mikhail Bautin wrote:
bq.
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2180/
bq.  -----------------------------------------------------------
bq.
bq.  (Updated 2011-10-04 22:10:40)
bq.
bq.
bq.  Review request for hbase.
bq.
bq.
bq.  Summary
bq.  -------
bq.
bq.  Previously, if we had several StoreFiles for a column family in a region, we would seek in each of them and only then merge the results, even though the row/column we are looking for might only be in the most recent (and the smallest) file. Now we prioritize our reads from those files so that we check the most recent file first. This is done by doing a "lazy seek" which pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile), which is earlier in the KV order than anything that might actually occur in the file. So if we don't find the result in earlier files, that fake KV will bubble up to the top of the KV heap and a real seek will be done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).
bq.
bq.  This is joint work with Liyin Tang -- huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.
bq.
bq.
bq.  This addresses bug HBASE-4465.
bq.      https://issues.apache.org/jira/browse/HBASE-4465
bq.
bq.
bq.  Diffs
bq.  -----
bq.
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java 94ddce7
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/ColumnCount.java 1be0280
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java b8d33e8
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java fbcd276
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 035f765
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java dad278a
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java abb5931
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 31bfea7
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 64a6e3e
bq.    src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 8ad5aab
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 9d2b2a7
bq.
bq.  Diff: https://reviews.apache.org/r/2180/diff
bq.
bq.
bq.  Testing
bq.  -------
bq.
bq.  Running unit tests -- please do not commit yet.
bq.
bq.
bq.  Thanks,
bq.
bq.  Mikhail
bq.
bq.


> Lazy-seek optimization for StoreFile scanners
> ---------------------------------------------
>
>                 Key: HBASE-4465
>                 URL: https://issues.apache.org/jira/browse/HBASE-4465
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>              Labels: optimization, seek
>             Fix For: 0.89.20100924, 0.94.0
>
>
> Previously, if we had several StoreFiles for a column family in a region, we would seek in each of them and only then merge the results, even though the row/column we are looking for might only be in the most recent (and the smallest) file. Now we prioritize our reads from those files so that we check the most recent file first. This is done by doing a "lazy seek" which pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile), which is earlier in the KV order than anything that might actually occur in the file. So if we don't find the result in earlier files, that fake KV will bubble up to the top of the KV heap and a real seek will be done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.
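
For readers following the thread, the lazy-seek idea in the description above can be illustrated with a small standalone sketch. It is not the code under review and uses no HBase classes: LazySeekSketch, KV, FileScanner, lazySeek, enforceSeek and seekAndPeek are hypothetical names made up for this example, a "file" is just an in-memory sorted list, and the key order is assumed to be row ascending, column ascending, timestamp descending. Tie-breaking between identical keys from different files is ignored here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model of a lazy seek; not the HBase implementation.
final class LazySeekSketch {
    /** Simplified key with no value payload. */
    static final class KV {
        final String row, col;
        final long ts;
        KV(String row, String col, long ts) { this.row = row; this.col = col; this.ts = ts; }
    }

    /** Assumed KV order: row, then column, then newest timestamp first. */
    static int compare(KV a, KV b) {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.col.compareTo(b.col);
        if (c == 0) c = Long.compare(b.ts, a.ts);
        return c;
    }

    /** Scanner over one sorted in-memory "file". */
    static final class FileScanner {
        final List<KV> kvs;          // must already be sorted by compare()
        final long maxTs;            // highest timestamp stored in this file
        KV current;                  // real or fake KV; null when exhausted
        boolean realSeekDone = true;
        int realSeeks = 0;           // counts the expensive seeks

        FileScanner(List<KV> sortedKvs) {
            kvs = sortedKvs;
            long m = Long.MIN_VALUE;
            for (KV kv : kvs) m = Math.max(m, kv.ts);
            maxTs = m;
        }

        /** Pretend the next KV is (row, col, maxTs) without touching the data. */
        void lazySeek(String row, String col) {
            current = new KV(row, col, maxTs);
            realSeekDone = false;
        }

        /** Do the real seek: first stored KV at or after the fake one. */
        void enforceSeek() {
            KV target = current;
            current = null;
            for (KV kv : kvs) {
                if (compare(kv, target) >= 0) { current = kv; break; }
            }
            realSeekDone = true;
            realSeeks++;
        }
    }

    /** Lazily seek every scanner, then resolve only the fakes that reach the top. */
    static KV seekAndPeek(List<FileScanner> scanners, String row, String col) {
        PriorityQueue<FileScanner> heap =
            new PriorityQueue<>((a, b) -> compare(a.current, b.current));
        for (FileScanner s : scanners) { s.lazySeek(row, col); heap.add(s); }
        while (!heap.isEmpty()) {
            FileScanner top = heap.poll();
            if (top.realSeekDone) return top.current;   // a real KV won
            top.enforceSeek();                          // a fake KV bubbled up
            if (top.current != null) heap.add(top);
        }
        return null;
    }

    public static void main(String[] args) {
        FileScanner newer = new FileScanner(Arrays.asList(new KV("r1", "c1", 200L)));
        FileScanner older = new FileScanner(Arrays.asList(
            new KV("r1", "c1", 100L), new KV("r2", "c1", 90L)));
        KV hit = seekAndPeek(Arrays.asList(newer, older), "r1", "c1");
        System.out.println("ts=" + hit.ts + " newerSeeks=" + newer.realSeeks
            + " olderSeeks=" + older.realSeeks);
    }
}
```

In this toy setup the older file is never really sought when the newer file already holds the requested row/column, which is the saving the description is after. Whether the actual patch resolves the fake KVs this way is for the diff to say.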

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira