hbase-user mailing list archives

From Xian Woo <infinity0...@gmail.com>
Subject Re: Column Indexing - Top N Columns
Date Thu, 28 Jul 2011 17:18:43 GMT
As far as I know, you first create a Get instance for the row you want,
then call HTable.get() to fetch a Result from the cluster. You can then
walk the cells one at a time with
for (KeyValue kv : result.raw()). But result.raw() only returns the
KeyValues in ascending key order (by column qualifier, not by value), so
you will need to sort by count yourself on the client.
Or you can call result.getMap() directly to get a sorted map of all the
KeyValues (keyed by family, qualifier and timestamp, so again not ordered
by value) and do the sorting yourself.
You can also use a ColumnPaginationFilter (it takes a limit and an offset)
if you know exactly how many columns there are in the row you specify.
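
To make the "extra operations" concrete, here is a minimal sketch of the
client-side step only. It assumes you have already decoded the qualifiers
and counts from the Result into a plain Map<String, Long>; that decoding
is not shown, and the class and method names (TopNColumns, topN) are made
up for illustration. It keeps a min-heap of at most n entries, so it never
has to sort all of the columns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of HBase.
final class TopNColumns {

    // Returns up to n entries with the largest counts, highest count first.
    // "counts" is assumed to map column qualifier -> count, already decoded
    // from the HBase Result by the caller.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> counts, int n) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        if (n <= 0 || counts.isEmpty()) {
            return out;
        }
        Comparator<Map.Entry<String, Long>> byCount =
            Map.Entry.<String, Long>comparingByValue();

        // Min-heap: the smallest of the current top-n candidates is on top.
        int capacity = Math.min(n, counts.size());
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>(capacity, byCount);

        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (heap.size() < n) {
                heap.add(e);
            } else if (e.getValue() > heap.peek().getValue()) {
                // Evict the smallest candidate in favour of a larger count.
                heap.poll();
                heap.add(e);
            }
        }

        out.addAll(heap);
        out.sort(byCount.reversed()); // descending by count
        return out;
    }
}
```

With around 100K columns this is roughly one pass over the row plus a
small heap of size n. For very wide rows it may be better to fetch the
columns in chunks and feed each chunk through the same heap, but I have
not measured that.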

And speaking of the number of columns, why do you need so many columns? I
have heard that HBase "hopes" the number of rows is larger than the number
of columns. So may I ask whether the number of rows is much larger than
6 million?

2011/7/29 Barış Can Daylık <baris.daylik@iletken.com.tr>

> There can be at most 6 million columns, but I don't think it would exceed
> 100K on average.  What would result.raw() produce?
> On 07/28/2011 07:11 PM, Xian Woo wrote:
>> I don't know how many columns there are in your column family. If there
>> are not too many columns, using Result.raw() may be an option.
>> 2011/7/28 Barış Can Daylık <baris.daylik@iletken.com.tr>
>>
>>> Hi everyone,
>>> I do have a column family where I store counts of items under each
>>> column, and I need to have top N columns (items) sorted by count
>>> descending. I know hbase doesn't sort columns by value and do not have
>>> an indexing option to do so. But as I searched I found out a patch
>>> (IHbase) for this indexing job. However I'm not able to find out a way
>>> to get only the top N columns even by using IHbase.
>>> Can you suggest an example usage? Or another patch or tool for this job?
>>> Can lucene be used in such a scenario?
>>> Thanks
>>> Baris
>>> p.s. Column values are positive integers.
