hbase-issues mailing list archives

From "Pranav Khaitan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1517) Implement inexpensive seek operations in HFile
Date Wed, 07 Jul 2010 22:25:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886120#action_12886120 ]

Pranav Khaitan commented on HBASE-1517:
---------------------------------------

Yes, I think you are right that the API looks cleaner. Adding reseek() to
KeyValueScanner would require us to add it to the seven classes that
implement KeyValueScanner. I am fine with that if you all think it is the
better way.

StoreScanner already contains a reseek() method. I believe it does a
different job from the reseek() we are discussing now, so we will probably
have to rename it.

I have another, related question. StoreScanner currently contains both a
seek() method and a reseek() method. (I couldn't find seek() being used
anywhere, but we have to implement it because it is inherited.) The
implementation of StoreScanner.seek() looks considerably different from
StoreScanner.reseek() and from the seeking done in the constructors. In
reseek() and in the constructors, we create a List<KeyValueScanner>, call
seek on each of these scanners, and then add them to a heap. In seek(),
however, we create a List<KeyValueScanner>, add all of the scanners to a
heap, and then call seek on the heap. Is this difference intentional?
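
To make the two orderings concrete, here is a minimal sketch in Java. It does
not use the real HBase classes: SketchScanner and SeekOrderSketch are
hypothetical stand-ins, keys are plain ints, and "seek on the heap" is only my
assumed reading of what that step does (pull out every scanner that is behind
the target, seek it, and put it back).

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical, simplified scanner over sorted int keys; not HBase's KeyValueScanner. */
class SketchScanner {
    private final int[] keys; // sorted ascending
    private int pos = 0;

    SketchScanner(int... sortedKeys) { this.keys = sortedKeys; }

    /** Current key, or null when the scanner is exhausted. */
    Integer peek() { return pos < keys.length ? keys[pos] : null; }

    /** Position on the first key >= target; returns false if there is none. */
    boolean seek(int target) {
        pos = 0;
        while (pos < keys.length && keys[pos] < target) pos++;
        return pos < keys.length;
    }
}

class SeekOrderSketch {
    /** Min-heap ordered by each scanner's current key (only non-exhausted scanners are added). */
    private static PriorityQueue<SketchScanner> newHeap() {
        return new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));
    }

    /** Ordering used in reseek() and the constructors: seek each scanner, then heap them. */
    static PriorityQueue<SketchScanner> seekThenHeap(List<SketchScanner> scanners, int target) {
        PriorityQueue<SketchScanner> heap = newHeap();
        for (SketchScanner s : scanners) {
            if (s.seek(target)) heap.add(s); // drop scanners with nothing at or after target
        }
        return heap;
    }

    /** Ordering used in seek(): heap the scanners first, then seek through the heap. */
    static PriorityQueue<SketchScanner> heapThenSeek(List<SketchScanner> scanners, int target) {
        PriorityQueue<SketchScanner> heap = newHeap();
        for (SketchScanner s : scanners) {
            if (s.peek() != null) heap.add(s);
        }
        // Assumed behaviour: only scanners whose current key is behind the target get sought.
        while (!heap.isEmpty() && heap.peek().peek() < target) {
            SketchScanner top = heap.poll();
            if (top.seek(target)) heap.add(top);
        }
        return heap;
    }
}
```

In this sketch both orderings end with the same scanner on top of the heap.
The difference is in the work done: the first seeks every scanner
unconditionally, while the second pays for heap operations up front and only
seeks scanners that are behind the target.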

> Implement inexpensive seek operations in HFile
> ----------------------------------------------
>
>                 Key: HBASE-1517
>                 URL: https://issues.apache.org/jira/browse/HBASE-1517
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>
> When we early-out of a row because of columns, versions, filters, etc... we
> seek to the end of that row one key at a time.  We should do the seek at the
> HFile level in cases where we would end up skipping blocks in the process.
> This will be very common in cases with relatively large rows and regex row
> filters.
> If calls that end up doing nothing are constant time, we could also call
> this to seek to the next column (or even a specific column in the
> ExplicitTracker case).
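
The block-skipping idea in the description above can be illustrated with a
small Java sketch. It is not HFile code: BlockSeekSketch, its string keys, and
its in-memory "blocks" are all hypothetical, and for simplicity both methods
start from the beginning of the file rather than from a current position. It
only shows why consulting a block index touches fewer blocks than stepping
key by key.

```java
import java.util.Arrays;

/** Hypothetical model of a file split into sorted blocks with a first-key index. */
class BlockSeekSketch {
    final String[][] blocks;   // each block holds sorted keys; blocks are in key order
    final String[] firstKeys;  // block index: the first key of each block
    int blocksRead = 0;        // how many blocks have been "loaded" so far

    BlockSeekSketch(String[][] blocks) {
        this.blocks = blocks;
        this.firstKeys = new String[blocks.length];
        for (int i = 0; i < blocks.length; i++) firstKeys[i] = blocks[i][0];
    }

    /** Step forward one key at a time until a key >= target is found. */
    String scanTo(String target) {
        for (String[] block : blocks) {
            blocksRead++;
            for (String key : block) {
                if (key.compareTo(target) >= 0) return key;
            }
        }
        return null;
    }

    /** Use the block index to start at the last block whose first key is <= target. */
    String seekTo(String target) {
        int i = Arrays.binarySearch(firstKeys, target);
        // On a miss, binarySearch returns -(insertionPoint) - 1; step back one block.
        int b = i >= 0 ? i : Math.max(0, -i - 2);
        for (; b < blocks.length; b++) {
            blocksRead++;
            for (String key : blocks[b]) {
                if (key.compareTo(target) >= 0) return key;
            }
        }
        return null;
    }
}
```

With a large row spread over many blocks, scanTo loads every block of that row
on the way to the next one, while seekTo loads only the block (or, at a
boundary, the two blocks) around the target.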

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

