hbase-user mailing list archives

From Peter Wolf <opus...@gmail.com>
Subject Re: Scan.addFamiliy reduces results
Date Thu, 15 Mar 2012 19:05:26 GMT
Huh!  That's what I was afraid you'd say.  I'm still confused :-(

If "it will give all rows that contain _any_ of these families", then 
why does adding a family give me *fewer* rows?

Leaving my row start/stop and filtering code constant, and just 
un-commenting an addFamily() dramatically reduces the number of results 
returned from a scan.

P
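
To make the "any of these families" rule concrete, here is a toy model in plain Java. It is not HBase client code: the class name, the in-memory table, and the row keys are all hypothetical, and it assumes only the rule as stated in this thread (a row comes back if it has at least one cell in any selected family). Under that rule, the 1000/500 example discussed in this thread comes out as 1500 rows when A, B and C are all added, and 1000 rows when only A is added.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a sparse table; NOT the HBase API.
class SparseScanSketch {

    // Row key -> (family -> cell value). A family with no cells is simply absent.
    static Map<String, Map<String, String>> buildTable() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        // 1000 rows with cells in A, B and C.
        for (int i = 0; i < 1000; i++) {
            Map<String, String> cells = new HashMap<>();
            cells.put("A", "a");
            cells.put("B", "b");
            cells.put("C", "c");
            table.put(String.format("full-%04d", i), cells);
        }
        // 500 rows with cells in B and C only.
        for (int i = 0; i < 500; i++) {
            Map<String, String> cells = new HashMap<>();
            cells.put("B", "b");
            cells.put("C", "c");
            table.put(String.format("partial-%04d", i), cells);
        }
        return table;
    }

    // Count rows that have at least one cell in any of the selected families.
    static int countRows(Map<String, Map<String, String>> table,
                         Collection<String> families) {
        int count = 0;
        for (Map<String, String> cells : table.values()) {
            for (String family : families) {
                if (cells.containsKey(family)) {
                    count++;
                    break; // one matching family is enough
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = buildTable();
        System.out.println(countRows(table, Arrays.asList("A", "B", "C"))); // 1500
        System.out.println(countRows(table, Arrays.asList("A")));           // 1000
        System.out.println(countRows(table, Arrays.asList("B")));           // 1500
    }
}
```

In this model, adding a family can only keep the row count the same or raise it, so family selection alone would not account for a drop in results.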



On 3/15/12 2:42 PM, Himanshu Vashishtha wrote:
> " Let's also say there are 1000 rows with A,B,C and 500 rows with only B and C.
>
> If I add families A, B and C and scan with no filter will I get 1500,
> 1000 or 500 results?"
>
> In this case, you will get 1000 rows. In case you add only B, you will
> get 500 rows.
>
> It's not like if you add families A, B and C, it will give you _only_
> those rows that have _all_ three families; rather it will give all
> rows that contain _any_ of these families.
>
> Hope this helps.
>
> Experts are welcome to chime in if I am missing something :)
>
> Thanks,
> Himanshu
>
>
> On Thu, Mar 15, 2012 at 11:48 AM, Peter Wolf<opus111@gmail.com>  wrote:
>> Hi Lars, still confused...
>>
>> My table *should* have values for families A, B and C.  Let's say I have a
>> bug, and some rows only have values for B and C.  Let's also say there are
>> 1000 rows with A,B,C and 500 rows with only B and C.
>>
>> If I add families A, B and C and scan with no filter will I get 1500, 1000
>> or 500 results?
>>
>> Many thanks
>> P
>>
>>
>>
>>
>> On 3/15/12 1:17 PM, lars hofhansl wrote:
>>> Hi haijia,
>>>
>>> In that case HBase will still return the data for columns in family B and
>>> C. But if you only added family A then HBase would only return "rows" for
>>> which family A has any columns.
>>>
>>> -- Lars
>>> ________________________________
>>>
>>> From: Haijia Zhou<leonster@gmail.com>
>>> To: user@hbase.apache.org; lars hofhansl<lhofhansl@yahoo.com>
>>> Sent: Thursday, March 15, 2012 10:12 AM
>>> Subject: Re: Scan.addFamiliy reduces results
>>>
>>>
>>> I have the same confusion. Say if I added three column families A, B and C
>>> to the scan, now if a row has data for column family B and C but no data for
>>> A, then it won't be returned in the next() method?
>>> What if the requirement is to get row data regardless of whether there's
>>> data for a specific column family or not?
>>>
>>>
>>> On Thu, Mar 15, 2012 at 1:04 PM, lars hofhansl<lhofhansl@yahoo.com>
>>>   wrote:
>>>
>>>> Hi Peter,
>>>> for HBase you have to keep in mind that it is a sparse columnar (or
>>>> KeyValue) store: (rowkey, columnfamily, column, TS) ->    value
>>>>
>>>> A scan only returns those KeyValues that match the scan. So when you set
>>>> families on your scan you'll only get those rows for which the scan found
>>>> any columns.
>>>>
>>>> Makes sense?
>>>>
>>>> -- Lars
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>   From: Peter Wolf<opus111@gmail.com>
>>>> To: user@hbase.apache.org
>>>> Sent: Thursday, March 15, 2012 9:52 AM
>>>> Subject: Re: Scan.addFamiliy reduces results
>>>>
>>>>
>>>> Thanks Doug,
>>>>
>>>> I had read that, and I just read it again.  But I am missing something...
>>>>
>>>> Why does adding a family reduce the number of results?  Is there an
>>>> implied filter of some form?  Does addFamily add some constraint on
>>>> which rows are returned?
>>>>
>>>> Note that all my rows *ought* to have values in all the families.
>>>>
>>>> Thanks
>>>> Peter
>>>>
>>>> On 3/15/12 12:39 PM, Doug Meil wrote:
>>>>> re:  "However, I am getting a different number of results, depending on
>>>>> which families are added"
>>>>>
>>>>> Yes.
>>>>>
>>>>> I'd suggest you read this in the RefGuide.
>>>>>
>>>>> http://hbase.apache.org/book.html#datamodel
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/15/12 12:08 PM, "Peter Wolf"<opus111@gmail.com>     wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am doing a scan on a table with multiple families.  My code looks
>>>>>> like
>>>>>> this...
>>>>>>
>>>>>>            Scan scan = new Scan(calculateStartRowKey(a), calculateEndRowKey(b));
>>>>>>
>>>>>>            scan.setCaching(10000);
>>>>>>            Filter filter = new SingleColumnValueFilter(xFamily, xColumn,
>>>>>>                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes(x));
>>>>>>            scan.setFilter(filter);
>>>>>>            scan
>>>>>>                    .addFamily(xFamily)
>>>>>>                    .addFamily(yFamily)
>>>>>>                    .addFamily(zFamily);
>>>>>>
>>>>>>            ResultScanner scanner = hTable.getScanner(scan);
>>>>>>
>>>>>>            Iterator<Result> it = scanner.iterator();
>>>>>>            int resultCount = 0;
>>>>>>            while (it.hasNext()) {
>>>>>>                Result result = it.next();
>>>>>>
>>>>>>                resultCount++;
>>>>>>            }
>>>>>>
>>>>>> However, I am getting a different number of results, depending on which
>>>>>> families are added.  For example, these give different result counts:
>>>>>>
>>>>>>            scan
>>>>>>                    //.addFamily(xFamily)
>>>>>>                    .addFamily(yFamily)
>>>>>>                    .addFamily(zFamily);
>>>>>> and
>>>>>>            scan
>>>>>>                    .addFamily(xFamily)
>>>>>>                    .addFamily(yFamily)
>>>>>>                    .addFamily(zFamily);
>>>>>>
>>>>>>
>>>>>> There is no error message, and I don't see anything in the Scan
>>>>>> documentation.  Does anyone know what is going on?
>>>>>>
>>>>>> Thanks
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>>
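
One possible explanation for the counts in the original post, offered as an assumption rather than something confirmed in this thread: the SingleColumnValueFilter Javadoc notes that the tested column should also be added to the scan, because otherwise the filter regards the column as missing, and by default a row whose tested column is missing is emitted (see setFilterIfMissing). If that applies here, commenting out addFamily(xFamily) would effectively disable the filter. The sketch below is a toy model in plain Java, not HBase client code; the class, the scan() helper, and the ten sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a value filter combined with family selection; NOT the HBase API.
class FilterFamilySketch {

    // Each row maps family name -> value of the single column in that family.
    // Rows 0..2 hold the wanted value in family "x"; rows 3..9 hold another value.
    static List<Map<String, String>> buildRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            Map<String, String> row = new HashMap<>();
            row.put("x", i < 3 ? "wanted" : "other");
            row.put("y", "y" + i);
            row.put("z", "z" + i);
            rows.add(row);
        }
        return rows;
    }

    // Count rows returned when only `families` are scanned and the value in
    // `filterFamily` must equal `wanted`. The filter sees only scanned cells.
    static int scan(List<Map<String, String>> rows, Collection<String> families,
                    String filterFamily, String wanted, boolean filterIfMissing) {
        int count = 0;
        for (Map<String, String> row : rows) {
            Map<String, String> visible = new HashMap<>();
            for (String family : families) {
                if (row.containsKey(family)) {
                    visible.put(family, row.get(family));
                }
            }
            if (visible.isEmpty()) {
                continue; // no cells in the selected families, so no row
            }
            String tested = visible.get(filterFamily);
            // Missing column: keep the row unless filterIfMissing is set.
            boolean keep = (tested == null) ? !filterIfMissing : tested.equals(wanted);
            if (keep) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = buildRows();
        // Filtered family not scanned: the filter never sees "x", every row passes.
        System.out.println(scan(rows, Arrays.asList("y", "z"), "x", "wanted", false));      // 10
        // Filtered family scanned: the filter applies, only matching rows pass.
        System.out.println(scan(rows, Arrays.asList("x", "y", "z"), "x", "wanted", false)); // 3
        // Filtered family not scanned, but missing columns are rejected.
        System.out.println(scan(rows, Arrays.asList("y", "z"), "x", "wanted", true));       // 0
    }
}
```

In this model the smaller count with all three families added is the filter doing its job, and the larger count without the x family is the filter being bypassed.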

