hbase-user mailing list archives

From "Barış Can Daylık" <baris.day...@iletken.com.tr>
Subject Re: Column Indexing - Top N Columns
Date Fri, 29 Jul 2011 07:38:26 GMT
The number of rows will be 6 million, so in the worst case the table will
be square, but on average a row won't exceed 100K columns.

If I'm not mistaken, columns are sorted by column name and not by value.
Does result.raw() return columns sorted by their values? If so, does it
sort them when I call result.raw(), or are they pre-sorted?
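
For illustration, here is a minimal sketch of picking the top N columns by value on the client side. It rests on the understanding that the KeyValues in a Result are ordered by key (column name), not by value, so the ranking has to be done after the fetch. The sketch uses only the Java standard library: the Map<String, byte[]> stands in for the qualifier/value pairs of one row, and the class name TopNColumns, its methods, and the 8-byte big-endian encoding of the counts are all assumptions made for the example, not part of the HBase API. A bounded min-heap keeps memory at O(N) even for a row with 100K or more columns.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class TopNColumns {

    /** One column of a row: its qualifier and its decoded count. */
    public static final class ColumnCount {
        public final String qualifier;
        public final long count;

        public ColumnCount(String qualifier, long count) {
            this.qualifier = qualifier;
            this.count = count;
        }
    }

    /** Encodes a count as an 8-byte big-endian long (assumed storage format). */
    public static byte[] encodeCount(long count) {
        return ByteBuffer.allocate(Long.BYTES).putLong(count).array();
    }

    /** Decodes a count written by encodeCount; throws if fewer than 8 bytes. */
    public static long decodeCount(byte[] value) {
        return ByteBuffer.wrap(value).getLong();
    }

    /**
     * Returns the n columns with the largest counts, sorted by count descending.
     * The input may be in any order (for example, sorted by qualifier).
     */
    public static List<ColumnCount> topN(Map<String, byte[]> columns, int n) {
        List<ColumnCount> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on count: the smallest of the current candidates is on top.
        PriorityQueue<ColumnCount> heap =
                new PriorityQueue<>(Comparator.comparingLong((ColumnCount c) -> c.count));
        for (Map.Entry<String, byte[]> e : columns.entrySet()) {
            long count = decodeCount(e.getValue());
            if (heap.size() < n) {
                heap.add(new ColumnCount(e.getKey(), count));
            } else if (count > heap.peek().count) {
                heap.poll(); // evict the smallest candidate
                heap.add(new ColumnCount(e.getKey(), count));
            }
        }
        result.addAll(heap);
        // The heap's iteration order is not sorted, so sort the survivors explicitly.
        result.sort((a, b) -> Long.compare(b.count, a.count));
        return result;
    }

    public static void main(String[] args) {
        // TreeMap keeps qualifiers in name order, mimicking a row sorted by column name.
        Map<String, byte[]> row = new TreeMap<>();
        row.put("apple", encodeCount(7L));
        row.put("banana", encodeCount(42L));
        row.put("cherry", encodeCount(19L));
        for (ColumnCount c : topN(row, 2)) {
            System.out.println(c.qualifier + " = " + c.count);
        }
    }
}
```

Each column is visited once and the heap never holds more than N entries, so the cost is O(C log N) for a row with C columns. The whole row still has to be transferred to the client, which is why this only makes sense when rows stay around the 100K-column range.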

On 07/28/2011 08:18 PM, Xian Woo wrote:
> As far as I know, first you create a Get instance for a specified row,
> then use the HTable.get() method to return a Result to your client from
> the cluster. Then you can get one KeyValue at a time by using
> for (KeyValue kv : result.raw()). But since result.raw() only returns the
> KeyValues in ascending order, you may need some extra operations.
> Or maybe you can directly use result.getMap() to get a sorted map of all
> the KeyValues and do some operations yourself.
> You can also use a ColumnPaginationFilter if you know exactly how many
> columns there are in the row you specify.
>
> And speaking of the number of columns, why do you need so many columns? I
> hear that HBase "hopes" the number of rows is larger than the number of
> columns. So may I ask if the number of rows is much larger than 6 million?
>
> 2011/7/29 Barış Can Daylık <baris.daylik@iletken.com.tr>
>
>> There can be at most 6 million columns, but I don't think it would exceed
>> 100K on average.  What would result.raw() produce?
>>
>>
>> On 07/28/2011 07:11 PM, Xian Woo wrote:
>>
>>> I don't know how many columns there are in your column family. If there
>>> are not too many columns, using Result.raw() may be an option.
>>>
>>> 2011/7/28 Barış Can Daylık <baris.daylik@iletken.com.tr>
>>>> Hi everyone,
>>>> I have a column family where I store counts of items under each
>>>> column, and I need to get the top N columns (items) sorted by count,
>>>> descending. I know HBase doesn't sort columns by value and does not
>>>> have an indexing option to do so. But as I searched, I found a patch
>>>> (IHBase) for this indexing job. However, I'm not able to find a way to
>>>> get only the top N columns even by using IHBase.
>>>>
>>>> Can you suggest an example usage? Or another patch or tool for this job?
>>>> Can Lucene be used in such a scenario?
>>>>
>>>> Thanks
>>>> Baris
>>>>
>>>> p.s. Column values are positive integers.
>>>>
>>>>

