hbase-user mailing list archives

From Kelvin Rawls <kel...@iswcorp.com>
Subject RE: Scan startRow seems to be broke in HBase 0.20.2
Date Tue, 31 Aug 2010 16:39:40 GMT
Thanks for the reply. An example of the output and the keys is below. If I fetch 2 at a time it seems to work,
but with 4 at a time it fails. Notice the repeated IDs.

>>> Perform getIngestedIds operation on Content MBean for: 4 IDs <<<

>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399
>>> Doc Id: 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f


>>> Now call getIngestedIds operation on Content MBean for: 4 IDs with lastDoc ID = 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f<<<

>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
>>> Doc Id: 10cfbe25df8835b6f26fc696db84b32:658676d2efd9a4ee4a9e29932ec8916
>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399

btw, the keys are <hash of normalized content>:<hash of normalized url>

The test cluster is multi-use and cannot easily be upgraded just yet. I will work on setting up another
test cluster with the latest release. Also, although it is not in the code below, I am using a regular
expression filter to keep some keys from being returned. Can setStartRow and other filters conflict?

Thanks again,

Kelvin 
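Kelvin's question of whether setStartRow and other filters can conflict can be pictured with a toy in-memory model. The sketch below is plain Java, not HBase code: the class and method names are hypothetical, a TreeMap stands in for the sorted table, the rows are made up, and the regex test only mimics the MYROWFLAGTRUE filter from the original code. In this model the start row decides where in the sorted key space the scan begins and the filter then drops rows inside that range, so the two narrow the result together rather than overriding each other. That HBase 0.20.2 combines them the same way is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/**
 * Toy in-memory model (hypothetical, not HBase code) of a scan that has both a
 * start row and a value filter. TreeMap keeps keys sorted, like an HBase table.
 */
public class StartRowPlusFilter {

    /** Returns up to n keys >= startRow (inclusive) whose value passes the filter. */
    public static List<String> scan(TreeMap<String, String> table, String startRow,
                                    Predicate<String> filter, int n) {
        List<String> out = new ArrayList<>();
        // tailMap(startRow, true): every key >= startRow, in ascending order.
        // The start row only decides where the walk begins...
        for (Map.Entry<String, String> e : table.tailMap(startRow, true).entrySet()) {
            if (out.size() == n) {
                break;
            }
            // ...and the filter then drops rows inside that range.
            if (filter.test(e.getValue())) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rows: key -> value of the flag cell.
        TreeMap<String, String> table = new TreeMap<>();
        table.put("1", "MYROWFLAGTRUE");
        table.put("2", "MYROWFLAGFALSE");
        table.put("3", "MYROWFLAGTRUE");
        table.put("4", "MYROWFLAGTRUE");
        // Mimics the regex filter (assumed here: match anywhere in the value).
        Pattern flag = Pattern.compile("MYROWFLAGTRUE");
        Predicate<String> flagTrue = v -> flag.matcher(v).find();
        System.out.println(scan(table, "2", flagTrue, 10)); // prints [3, 4]
    }
}
```

Scanning from row `2` skips row `1` (it sorts before the start row) and row `2` (its flag does not match), so only `3` and `4` come back.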
________________________________________
From: jdcryans@gmail.com [jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans [jdcryans@apache.org]
Sent: Tuesday, August 31, 2010 12:22 PM
To: user@hbase.apache.org
Subject: Re: Scan startRow seems to be broke in HBase 0.20.2

It's more that the question is missing information, like an example
output of your query and a sample of your dataset. Also, you are using
0.20.2, which is 4 minor revisions old.

So I tried a simple test in the shell using HBase 0.20.2 just as a sanity check:

hbase(main):005:0> scan 't'
ROW                          COLUMN+CELL
 1                           column=f:, timestamp=1283271502185, value=val1
 2                           column=f:, timestamp=1283271507825, value=val2
 3                           column=f:, timestamp=1283271512665, value=val3
3 row(s) in 0.0300 seconds
hbase(main):006:0> scan 't', {STARTROW => '2'}
ROW                          COLUMN+CELL
 2                           column=f:, timestamp=1283271507825, value=val2
 3                           column=f:, timestamp=1283271512665, value=val3

As you can see, it works, and under the hood the shell calls exactly the same
method. Are your keys sorted the way you think they are?

J-D
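J-D's question about sort order can be checked directly against the first batch of doc IDs in Kelvin's output. HBase orders row keys lexicographically by unsigned byte value; the sketch below (plain Java with no HBase dependency, hypothetical class and method names) applies that ordering to the four keys. Sorted that way, the key beginning 0579 comes first and the key beginning 10b8 comes last, so the key passed as lastDoc is the smallest of the batch, not the last one in row order. Since a start row is inclusive, a scan that begins at the 0579 key would reach the other three keys of the first batch again, which would be consistent with the repeated IDs in the second batch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sorts row keys the way HBase orders them: lexicographically by unsigned
 * byte value. Hypothetical helper names; no HBase classes are used.
 */
public class RowKeyOrder {

    /** Compares byte arrays as unsigned bytes; a shorter prefix sorts first. */
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat each byte as 0..255
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Returns a copy of the keys sorted by their UTF-8 bytes. */
    public static List<String> sortAsHBase(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((s, t) -> compareBytes(s.getBytes(StandardCharsets.UTF_8),
                                           t.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // The first batch, in the order Kelvin's MBean printed it.
        List<String> firstBatch = Arrays.asList(
            "105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7",
            "0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae",
            "10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399",
            "0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f");
        for (String k : sortAsHBase(firstBatch)) {
            System.out.println(k);
        }
    }
}
```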

On Tue, Aug 31, 2010 at 9:06 AM, Kelvin Rawls <kelvin@iswcorp.com> wrote:
> It seems my question is not clear:
>
> Does this call:
>
> scan.setStartRow(Bytes.toBytes(lastDoc))
>
> have any effect on the rows returned for anyone else?
>
> Thanks,
>
> Kelvin
> ________________________________________
> From: Kelvin Rawls [kelvin@iswcorp.com]
> Sent: Monday, August 30, 2010 11:25 AM
> To: user@hbase.apache.org
> Subject: Scan startRow seems to be broke in HBase 0.20.2
>
> No matter what I tell it, this seems to return Row IDs from the beginning of the table.
>
> code
>
>    // `table` (e.g. an HTable) and `m_log` (a logger) are fields of the
>    // enclosing class, not shown here.
>    public List<String> getKeys(String lastDoc, int N) {
>        List<String> results = new ArrayList<String>();
>        // Begin the scan at lastDoc (inclusive) and keep only rows whose
>        // MYROW:FLAG cell matches the regular expression "MYROWFLAGTRUE".
>        Scan scan = new Scan();
>        scan.setStartRow(Bytes.toBytes(lastDoc));
>        SingleColumnValueFilter scvf = new SingleColumnValueFilter(
>                Bytes.toBytes("MYROW"), Bytes.toBytes("FLAG"),
>                CompareFilter.CompareOp.EQUAL,
>                new RegexStringComparator("MYROWFLAGTRUE"));
>        scvf.setFilterIfMissing(true); // drop rows that have no MYROW:FLAG cell
>        scan.setFilter(scvf);
>        ResultScanner scanner = null;
>        try {
>            scanner = table.getScanner(scan);
>            // next(N) returns at most N rows, in row-key order
>            for (Result rr : scanner.next(N)) {
>                results.add(Bytes.toString(rr.getRow()));
>            }
>        } catch (IOException ex) {
>            m_log.error("Error getting keys", ex);
>        } finally {
>            if (scanner != null) {
>                scanner.close(); // release the scanner even if next() throws
>            }
>        }
>        m_log.debug("Returning " + results.size() + " ids");
>        return results;
>    }
>
> Thanks for any help.
>
> Kelvin L. Rawls
>
> 410-290-6240, office
> 301-221-1308, cell
> 703 741-3120, fax
> www.iswcorp.com
>
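A common way to page with an inclusive start row is to remember the last key of each batch in row order and resume from the smallest key strictly after it, which is that key with a 0x00 byte appended. The toy model below (plain Java, hypothetical names, a TreeSet standing in for the table, keys shortened to the first four characters of the doc IDs in Kelvin's output) shows that this visits every row exactly once. For ASCII keys such as these hex strings, Java's String order matches HBase's unsigned-byte order. With the HBase client the equivalent step would presumably be something like `scan.setStartRow(Bytes.add(lastRow, new byte[] { 0 }))`, assuming `Bytes.add(byte[], byte[])` is available in your client version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/**
 * Toy model (hypothetical, not HBase code) of paging through sorted row keys
 * when the start row is inclusive.
 */
public class ExclusivePaging {

    /** Smallest key that sorts strictly after lastRow: lastRow plus a NUL char. */
    public static String nextStartRow(String lastRow) {
        return lastRow + '\0';
    }

    /** One page: up to n keys >= startRow (inclusive), in sorted order. */
    public static List<String> page(TreeSet<String> rows, String startRow, int n) {
        List<String> out = new ArrayList<>();
        for (String k : rows.tailSet(startRow, true)) {
            if (out.size() == n) {
                break;
            }
            out.add(k);
        }
        return out;
    }

    /** Walks the whole table n rows at a time; every row appears exactly once. */
    public static List<String> pageAll(TreeSet<String> rows, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("page size must be positive");
        }
        List<String> all = new ArrayList<>();
        String start = ""; // empty start row = beginning of the table
        while (true) {
            List<String> batch = page(rows, start, n);
            all.addAll(batch);
            if (batch.size() < n) {
                return all; // short page: end of the table
            }
            // Resume just after the LAST key of the batch in row order,
            // not after whichever key happened to be printed last.
            start = nextStartRow(batch.get(batch.size() - 1));
        }
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
            Arrays.asList("1057", "0e3b", "10b8", "0579", "10cf"));
        System.out.println(pageAll(rows, 2)); // prints [0579, 0e3b, 1057, 10b8, 10cf]
    }
}
```

Paging two rows at a time, the start rows are the empty string, then `0e3b` plus NUL, then `10b8` plus NUL, so no key is returned twice.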
