hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Waiting forever on scanner iterator
Date Tue, 20 Oct 2009 21:03:14 GMT
If you are asking for a column that is very sparse and doesn't exist in
most rows, HBase has to read through the entire region to find 100
matching rows. That could take a while. You said 'forever', but could
you quantify that?
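
As a rough illustration of why sparsity matters, here is a toy model in plain Java. It makes no HBase calls; the names (SparseScanSketch, rowsExaminedForFirstBatch, makeRows) are made up for this sketch, and it assumes the server simply walks rows in order until it has collected `caching` matching rows or runs out of rows.

```java
public class SparseScanSketch {

    /**
     * Counts how many rows must be examined before `caching` matching rows
     * have been collected (or the rows run out). This is a simplified model,
     * not HBase's actual implementation.
     */
    public static int rowsExaminedForFirstBatch(boolean[] rowHasColumn, int caching) {
        int matches = 0;
        int examined = 0;
        for (boolean hasColumn : rowHasColumn) {
            examined++;
            if (hasColumn) {
                matches++;
                if (matches == caching) {
                    break; // batch is full, the server can answer the client
                }
            }
        }
        return examined;
    }

    /** Builds a hypothetical table where every `everyNth` row has the column. */
    public static boolean[] makeRows(int total, int everyNth) {
        boolean[] rows = new boolean[total];
        for (int i = 0; i < total; i++) {
            rows[i] = (i % everyNth == 0);
        }
        return rows;
    }

    public static void main(String[] args) {
        int caching = 100;       // same value as setScannerCaching(100)
        int totalRows = 100_000; // made-up table size

        System.out.println("dense:  "
                + rowsExaminedForFirstBatch(makeRows(totalRows, 1), caching));
        System.out.println("sparse: "
                + rowsExaminedForFirstBatch(makeRows(totalRows, 1000), caching));
    }
}
```

In this model a dense column fills the first batch after 100 rows, while a column present in only one row out of 1000 needs 99001 rows examined before the client sees anything.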

On Tue, Oct 20, 2009 at 1:58 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> Scanner pre-fetching is always faster, so something must be wrong with
> your region server. Check the logs, top, etc.
>
> WRT row size, it's pretty much a matter of summing the bytes you have
> in each column (plus some overhead for the keys).
>
> You want filters; check the filter package in the javadoc.
>
> J-D
>
> On Tue, Oct 20, 2009 at 1:52 PM, Ananth T. Sarathy
> <ananth.t.sarathy@gmail.com> wrote:
>> OK, but how come when I run a similar call (with fewer rows returned,
>> 1000 vs. 25k in the previous one) it runs through the iterator very
>> quickly? (See below.)
>>
>> Also, how do I determine the row size? It's just text data, and really not
>> much.
>>
>> Finally, is there a way to query for rows that do not have a column? (i.e., all
>> rows without Files:path1)
>>
>>        HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl(
>>                "GS_Applications");
>>
>>        String[] columns = { "Files:path1" };
>>        log.info("Getting all Rows with Files");
>>        Scanner s = htdmni.getScannerForAllRows(columns);
>>        log.info("Got all Rows with Files");
>>
>>        Iterator<RowResult> iter = s.iterator();
>>        out.write("Application_Full_Name,Version,Application_installer_name,Operating System,"
>>                + " Application_platform ,Application_sub_category,md5Hash,Sha1Hash,Sha256Hash,"
>>                + "filepath,fileName,modified,size,operation\n");
>>        out.write("<BR>");
>>        while (iter.hasNext())
>>        {
>>
>> Ananth T Sarathy
>>
>>
>> On Tue, Oct 20, 2009 at 4:44 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>
>>> If you have a very slow data source (S3), then it fetches 100 rows
>>> before coming back to your client with all of them, and that can take a
>>> lot of time. Also make sure that 100 of your rows can fit in a region
>>> server's memory. How big is each row?
>>>
>>> J-D
>>>
>>> On Tue, Oct 20, 2009 at 1:32 PM, Ananth T. Sarathy
>>> <ananth.t.sarathy@gmail.com> wrote:
>>> > I am running this code where
>>> >
>>> > getScannerForAllRows(columns) just does return table.getScanner(columns);
>>> >
>>> > and the table has setScannerCaching(100);
>>> >
>>> > But it spins forever after getting the iterator. Why would that be? How can
>>> > I speed it up?
>>> >
>>> >        HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl(
>>> >                "GS_Applications");
>>> >
>>> >        String[] columns = { "Files:Name" };
>>> >        log.info("Getting all Rows with Files");
>>> >        Scanner s = htdmni.getScannerForAllRows(columns);
>>> >        log.info("Got all Rows with Files");
>>> >        log.info("Getting Iterator");
>>> >
>>> >        Iterator<RowResult> iter = s.iterator();
>>> >        log.info("Got Iterator");
>>> >
>>> >        while (iter.hasNext())
>>> >        {
>>> >            log.info("Getting next Row");
>>> >            RowResult rr = iter.next();
>>> >
>>> >
>>> > Ananth T Sarathy
>>> >
>>>
>>
>
