hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: Scan with SingleColumnValueFilter giving wrong output
Date Fri, 04 Apr 2014 09:49:52 GMT
Hi Ashish

I think the behaviour is expected.  SingleColumnValueFilter has a property
called filterIfMissing.  If you do not want rows that lack the given column
in your results, set it to true; see
SingleColumnValueFilter.setFilterIfMissing.  It defaults to false, so rows
without the column pass the filter.
In your first result, row6 should not be there (strictly speaking, if you
only want rows where col1 is value1).  You can see that the cell returned
for it is col3, not col1.
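
To make the rule concrete, here is a small plain-Ruby sketch of the per-row
decision as described above.  It is only a toy model over an in-memory hash,
not HBase code: the names TABLE and scan_with_scvf are made up for
illustration, and the shell lines in the trailing comment are an untested
assumption about how the flag would be set from the HBase shell.

```ruby
# Toy model of the row-level decision; this is NOT the HBase implementation.
# TABLE mirrors the 'testTable' contents quoted below (timestamps omitted).
TABLE = {
  'row1' => { 'colFammily1:col1' => 'value1' },
  'row2' => { 'colFammily1:col1' => 'value2', 'colFammily1:col2' => 'testValue' },
  'row3' => { 'colFammily1:col1' => 'value3' },
  'row4' => { 'colFammily1:col1' => 'value4' },
  'row5' => { 'colFammily1:col1' => 'value5' },
  'row6' => { 'colFammily1:col3' => 'value1' }
}.freeze

# Keeps a row when the tested column equals the wanted value. A row that
# lacks the column is kept unless filter_if_missing is true.
def scan_with_scvf(table, column, wanted, filter_if_missing: false)
  table.select do |_row, cells|
    if cells.key?(column)
      cells[column] == wanted
    else
      !filter_if_missing
    end
  end
end

col = 'colFammily1:col1'
puts scan_with_scvf(TABLE, col, 'value1').keys.inspect                          # ["row1", "row6"]
puts scan_with_scvf(TABLE, col, 'value2').keys.inspect                          # ["row2", "row6"]
puts scan_with_scvf(TABLE, col, 'value2', filter_if_missing: true).keys.inspect # ["row2"]

# Untested assumption for the HBase shell: build the filter first, then
#   f = org.apache.hadoop.hbase.filter.SingleColumnValueFilter.new(<same four arguments>)
#   f.setFilterIfMissing(true)
#   scan 'testTable', {FILTER => f}
```

With the flag left at its default, row6 shows up for both 'value1' and
'value2', which matches the two scans in the original mail.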

In your second result, row6 falls under the same category.
As for row2: if a row has more than one cell and the matching cell has been
found, the remaining cells of that row are also included in the result,
which is why col2 appears.  I expect 0.94.11 to behave the same way (it
should).
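
A second plain-Ruby sketch, again only a toy model with made-up names
(ROWS, emitted_cells), shows the cell-level consequence: the filter decides
per row, and a kept row is emitted with every one of its cells.

```ruby
# Toy model: once a row passes the filter, all of its cells are returned.
# ROWS holds just the two rows in question from 'testTable'.
ROWS = {
  'row2' => { 'colFammily1:col1' => 'value2', 'colFammily1:col2' => 'testValue' },
  'row6' => { 'colFammily1:col3' => 'value1' }
}.freeze

# Returns [row, column, value] triples for every cell of each kept row.
def emitted_cells(rows, column, wanted, filter_if_missing)
  rows.flat_map do |row, cells|
    keep = cells.key?(column) ? cells[column] == wanted : !filter_if_missing
    keep ? cells.map { |col, val| [row, col, val] } : []
  end
end

# With filterIfMissing modelled as true, row6 is dropped, but row2 still
# comes back with both col1 and col2.
emitted_cells(ROWS, 'colFammily1:col1', 'value2', true).each do |cell|
  puts cell.join('  ')
end
```

So setting filterIfMissing removes row6 but not row2's col2 cell.  If only
the tested column is wanted back, limiting the scan's columns (for example
with the shell's COLUMNS option) would be the place to look; that part is a
suggestion and has not been tried against 0.94.11 here.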

Regards
Ram



On Fri, Apr 4, 2014 at 2:37 PM, ashish singhi <ashish.singhi@huawei.com> wrote:

> Hi all.
>
> I am using,
> HBase Version - 0.94.11
> Hadoop Version - 2.1.0
>
> I am trying to get the columns that have a specified value. For that I am
> using the org.apache.hadoop.hbase.filter.SingleColumnValueFilter class.
> But when I scan the table to find columns with column value 'value2', I
> do not think I am getting the proper output.
>
> Can someone please tell me where I am going wrong?
>
> I have a simple hbase table with records.
> hbase(main):014:0> scan 'testTable'
> ROW     COLUMN+CELL
>  row1   column=colFammily1:col1, timestamp=1396586048561, value=value1
>  row2   column=colFammily1:col1, timestamp=1396586054526, value=value2
>  row2   column=colFammily1:col2, timestamp=1396585985022, value=testValue
>  row3   column=colFammily1:col1, timestamp=1396586060989, value=value3
>  row4   column=colFammily1:col1, timestamp=1396586066037, value=value4
>  row5   column=colFammily1:col1, timestamp=1396586071842, value=value5
>  row6   column=colFammily1:col3, timestamp=1396590405939, value=value1
> 6 row(s) in 0.0320 seconds
>
> First: when I ran the scan command with column value 'value1', I got what
> looked like the proper output.
> hbase(main):023:0> scan 'testTable', {FILTER =>
> org.apache.hadoop.hbase.filter.SingleColumnValueFilter.new(org.apache.hadoop.hbase.util.Bytes.toBytes('colFammily1'),org.apache.hadoop.hbase.util.Bytes.toBytes('col1'),
> org.apache.hadoop.hbase.filter.CompareFilter::CompareOp.valueOf('EQUAL'),org.apache.hadoop.hbase.util.Bytes.toBytes('value1'))}
> ROW     COLUMN+CELL
>  row1   column=colFammily1:col1, timestamp=1396586048561, value=value1
>  row6   column=colFammily1:col3, timestamp=1396590405939, value=value1
> 2 row(s) in 0.0160 seconds
>
> Second: when I tried the same command with column value 'value2', the
> output seems to be incorrect.
> hbase(main):025:0> scan 'testTable', {FILTER =>
> org.apache.hadoop.hbase.filter.SingleColumnValueFilter.new(org.apache.hadoop.hbase.util.Bytes.toBytes('colFammily1'),org.apache.hadoop.hbase.util.Bytes.toBytes('col1'),
> org.apache.hadoop.hbase.filter.CompareFilter::CompareOp.valueOf('EQUAL'),org.apache.hadoop.hbase.util.Bytes.toBytes('value2'))}
> ROW     COLUMN+CELL
>  row2   column=colFammily1:col1, timestamp=1396586054526, value=value2
>  row2   column=colFammily1:col2, timestamp=1396585985022, value=testValue
>  row6   column=colFammily1:col3, timestamp=1396590405939, value=value1
> 2 row(s) in 0.0100 seconds
>
> I am not able to understand why I am getting row2 with col2, and row6, in
> the output when their column values are not 'value2'.
>
> Regards,
> Ashish
>
