hbase-issues mailing list archives

From "Dave Latham (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HBASE-1652) Scanners for sparse column not stopped by StopRowFilter
Date Tue, 22 Jun 2010 22:17:54 GMT

     [ https://issues.apache.org/jira/browse/HBASE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Latham resolved HBASE-1652.
--------------------------------

    Resolution: Won't Fix

Scans have stop rows as of HBase 0.20, so the StopRowFilter is no longer needed.

> Scanners for sparse column not stopped by StopRowFilter
> -------------------------------------------------------
>
>                 Key: HBASE-1652
>                 URL: https://issues.apache.org/jira/browse/HBASE-1652
>             Project: HBase
>          Issue Type: Bug
>          Components: filters, regionserver
>    Affects Versions: 0.19.3
>            Reporter: Dave Latham
>
> Scanning a sparse column over a narrow range of rows can take far longer than expected
because the check for the end of the range is not performed on new rows unless there is a
column match, so it may end up scanning an entire region or table.
> Background:
> I have a table with 1 billion+ rows, and one cell in each row, generally small (10-1000
bytes).  The columns are all in a single family and fairly sparse.  For one query, I run scans
on it to scan usually a narrow range of the table for the first 30 cells in a certain column.
I know that all the rows that contain that column lie within a certain range.  I use HTable.getScanner(byte[][]
columns, byte[] startRow, RowFilterInterface filter) passing it the particular column I'm
looking for, a startRow, and a filter set containing a StopRowFilter wrapped in a WhileMatchRowFilter
to enforce the end of the range.  Sometimes the query is very fast (< 1 sec), but if the
table doesn't contain 30 rows with that column, it can be very slow, a minute or two.  I expected
that since the range was small, for example, just 120 rows, the query wouldn't take long to
scan the rows.
> After some pondering and perusing of the source code, I think I understand what is going
on.  It looks like the Scanner is scanning the rest of the table to find rows containing the
column without allowing the StopRowFilter to stop the scan at the end of the range.  I think
I can work around this by not specifying the column I want in the getScanner() method and
instead putting an additional filter in the filter set to filter out other columns.
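
The behavior described above can be sketched with a small stand-alone model. This is not HBase code: the class SparseScanSketch, its methods, and the sample table are hypothetical, and a TreeMap stands in for the sorted rows of a region. The first method consults the stop row only when a row contains the requested column, as the report suspects; the second checks the stop row on every row, which is roughly what a scan with a built-in stop row would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the reported behavior; none of these names are HBase APIs.
final class SparseScanSketch {

    /** Rows returned by a simulated scan, plus how many rows it had to examine. */
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int examined = 0;
    }

    /** Stop row is consulted only for rows that contain the wanted column. */
    static Result scanCheckingStopOnMatchOnly(NavigableMap<String, Set<String>> table,
            String column, String startRow, String stopRow, int limit) {
        Result r = new Result();
        for (Map.Entry<String, Set<String>> e : table.tailMap(startRow, true).entrySet()) {
            r.examined++;
            if (!e.getValue().contains(column)) {
                continue; // no column match, so the end-of-range check never runs
            }
            if (e.getKey().compareTo(stopRow) >= 0) {
                break;
            }
            r.rows.add(e.getKey());
            if (r.rows.size() >= limit) {
                break;
            }
        }
        return r;
    }

    /** Stop row (exclusive) is consulted for every row, before anything else. */
    static Result scanCheckingStopOnEveryRow(NavigableMap<String, Set<String>> table,
            String column, String startRow, String stopRow, int limit) {
        Result r = new Result();
        for (Map.Entry<String, Set<String>> e : table.tailMap(startRow, true).entrySet()) {
            if (e.getKey().compareTo(stopRow) >= 0) {
                break; // end of range reached, regardless of column content
            }
            r.examined++;
            if (e.getValue().contains(column)) {
                r.rows.add(e.getKey());
                if (r.rows.size() >= limit) {
                    break;
                }
            }
        }
        return r;
    }

    /** 1000 rows named row0000..row0999; only row0010 and row0020 hold the "sparse" column. */
    static NavigableMap<String, Set<String>> sampleTable() {
        NavigableMap<String, Set<String>> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            Set<String> columns = new HashSet<>(Arrays.asList("other"));
            if (i == 10 || i == 20) {
                columns.add("sparse");
            }
            table.put(String.format("row%04d", i), columns);
        }
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, Set<String>> table = sampleTable();
        Result slow = scanCheckingStopOnMatchOnly(table, "sparse", "row0000", "row0120", 30);
        Result fast = scanCheckingStopOnEveryRow(table, "sparse", "row0000", "row0120", 30);
        System.out.println("match-only stop check examined " + slow.examined + " rows");
        System.out.println("every-row stop check examined " + fast.examined + " rows");
    }
}
```

In this model both scans return the same two rows, but when fewer than 30 matches exist the first one walks every remaining row of the table while the second stops at the end of the 120-row range.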

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

