hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-3477) Filter for deprecated mapred APIs doesn't work when the table has few rows
Date Sat, 19 Jul 2014 01:07:39 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3477.
-----------------------------------

    Resolution: Cannot Reproduce

Reopen or file a new issue if this is still relevant with modern HBase versions.

> Filter for deprecated mapred APIs doesn't work when the table has few rows
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3477
>                 URL: https://issues.apache.org/jira/browse/HBASE-3477
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.90.0
>         Environment: Linux (Debian), master 1, slaves 2
>            Reporter: Yifeng Jiang
>
> It seems that filters are not invoked when the table contains only a few rows.
> I added some logging to org.apache.hadoop.hbase.filter.PrefixFilter, and I have a MyInputFormat that extends hbase.mapred.TableInputFormat, one of the deprecated mapred APIs.
> The logging added to PrefixFilter:
> {noformat} 
>   public boolean filterRowKey(byte[] buffer, int offset, int length) {
>     log.info("TODO: filterRowKey invoked");
>     if (buffer == null || this.prefix == null) {
>       log.info("TODO: #1 of filter");
>       return true;
>     }
>     if (length < prefix.length) {
>    ...
>   }
> {noformat} 
> This is the code in my InputFormat's configure method.
> {noformat} 
> byte[] prefix = Bytes.toBytes("001");
> Filter filter = new PrefixFilter(prefix);
> setRowFilter(filter);
> {noformat} 
> And the job setup code.
> {noformat} 
> job.setInputFormat(MyInputFormat.class);
> FileInputFormat.addInputPaths(job, "my_table_in_hbase");
> job.set(TableInputFormat.COLUMN_LIST, "data:");
> {noformat} 
> When I put a lot of rows (> 500,000) in the table, the filter works well, but when I put only a few rows (< 100) in the table, the filter does not seem to be invoked, and the logging in the filter produces no output either.
> This is the log output when the table contains a lot of data:
> {noformat} 
> 2011-01-25 16:43:59,568 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: default constructor
> 2011-01-25 16:44:01,728 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: filterRowKey invoked
> 2011-01-25 16:44:01,728 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: #3 of filter
> 2011-01-25 16:44:01,728 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: filterAllRemaining invoked
> 2011-01-25 16:44:01,729 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: filterAllRemaining invoked
> 2011-01-25 16:44:01,729 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: filterAllRemaining invoked
> {noformat} 
> {noformat} 
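
For anyone trying to reproduce this on a small table, it may help to check the prefix-matching logic on its own, separately from the question of whether the scanner invokes the filter at all. The following is a minimal stand-alone sketch, not the HBase PrefixFilter source: the class name PrefixCheckSketch and the extra prefix parameter are hypothetical, and the behavior after the elided part of the quoted snippet (filter out rows that are shorter than the prefix or that do not start with it) is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; NOT the HBase PrefixFilter implementation.
class PrefixCheckSketch {

    // Returns true when the row should be filtered out (skipped), following
    // the convention visible in the quoted filterRowKey snippet.
    static boolean filterRowKey(byte[] buffer, int offset, int length, byte[] prefix) {
        if (buffer == null || prefix == null) {
            return true;
        }
        // Assumption: a row key shorter than the prefix cannot match.
        if (length < prefix.length) {
            return true;
        }
        // Assumption: compare the first prefix.length bytes of the row key.
        for (int i = 0; i < prefix.length; i++) {
            if (buffer[offset + i] != prefix[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] prefix = "001".getBytes(StandardCharsets.UTF_8);
        for (String row : new String[] {"001-a", "002-b", "00"}) {
            byte[] key = row.getBytes(StandardCharsets.UTF_8);
            System.out.println(row + " filtered=" + filterRowKey(key, 0, key.length, prefix));
        }
    }
}
```

If a check like this behaves as expected while the job still returns unfiltered rows for a small table, that points at the filter never being applied to the scan rather than at the matching logic.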



--
This message was sent by Atlassian JIRA
(v6.2#6252)
