hbase-dev mailing list archives

From Irfan Mohammed <irfan...@gmail.com>
Subject Scan returns rows beyond the endRow when the column is specified
Date Thu, 18 Jun 2009 10:16:15 GMT
Hi,
We ran into an issue where a scan returned rows beyond the endRow. Are we doing something
incorrectly here? The test case is given below. When scan.addColumn(...) is not specified,
the scan returns { "row333" }, but with scan.addColumn(...) it returns { "row555" }.
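For reference, the behaviour we expect can be modelled with a plain sorted map. This is a minimal sketch, not HBase code: the class ExpectedScanModel and its methods are hypothetical, row keys are Strings (for these ASCII keys, String order matches HBase's unsigned byte order), and the stop row is treated as exclusive, as with Scan(startRow, stopRow).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of the expected scan semantics (not HBase code). */
class ExpectedScanModel {
    // row key -> (qualifier in the publisher_id family -> value), kept in sorted row order
    private final NavigableMap<String, Map<String, Long>> rows = new TreeMap<>();

    void put(String row, String qualifier, long value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(qualifier, value);
    }

    /** Row keys in [startRow, stopRow) that contain the qualifier; a null qualifier matches any row. */
    List<String> scan(String startRow, String stopRow, String qualifier) {
        List<String> hits = new ArrayList<>();
        // subMap with an exclusive upper bound: nothing at or after stopRow is ever visited
        for (Map.Entry<String, Map<String, Long>> e
                : rows.subMap(startRow, true, stopRow, false).entrySet()) {
            if (qualifier == null || e.getValue().containsKey(qualifier)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    /** The same five rows the test case below writes. */
    static ExpectedScanModel sampleTable() {
        ExpectedScanModel t = new ExpectedScanModel();
        t.put("row111", "Pub111", 10L);
        t.put("row222", "Pub111", 15L);
        t.put("row333", "Pub222", 20L);
        t.put("row444", "Pub222", 30L);
        t.put("row555", "Pub111", 40L);
        return t;
    }

    public static void main(String[] args) {
        ExpectedScanModel t = sampleTable();
        System.out.println(t.scan("row333", "row444", null));     // prints [row333]
        System.out.println(t.scan("row333", "row444", "Pub111")); // prints []
    }
}
```

In this model row555 can never be returned, because the range is cut at the stop row before any column check happens.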

dumpTable results :
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row111]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row111], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row111], [publisher_id:Pub111] => [10]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row222]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row222], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row222], [publisher_id:Pub111] => [15]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row333]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row333], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row333], [publisher_id:Pub222] => [20]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row444]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row444], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row444], [publisher_id:Pub222] => [30]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row555]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row555], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row555], [publisher_id:Pub111] => [40]

We are using 0.20.0 from mainline on Ubuntu 9.04.

Thanks,
Irfan

    /**
     * Test case to confirm a bug in the HBase Scan method.
     *
     * Scenario -
     * a) We have an HTable as follows:
     *
     *          |  publisher_id:Pub111  |  publisher_id:Pub222
     * -------------------------------------------------------------
     * row111   |           x           |
     * row222   |           x           |
     * row333   |                       |           x
     * row444   |                       |           x
     * row555   |           x           |
     * -------------------------------------------------------------
     * Where 'x' denotes some data
     *
     * b) We set up a Scan from "row333" to "row444" (the stop row is exclusive) and specify
     *    the column publisher_id:Pub111.
     * c) We expect to get 0 Result objects, because row333 and row444 do not have any data
     *    for Pub111.
     * d) BUG - Instead we get the Result for row555, which is unexpected: it is outside
     *    the range we specified in the Scan.
     *
     * @throws Exception just propagates the Exception
     */
    @Test
    public void testBugInHBaseScan() throws Exception
    {
        /*
         * Create the table
         */
        HTable table = createTable("test_get_range_value",
                new FilterableDimension[]{Dimension.PUBLISHER}, null);
        
        /*
         * Add rows
         */
        Put row1 = new Put(Bytes.toBytes("row111"));
        Put row2 = new Put(Bytes.toBytes("row222"));
        Put row3 = new Put(Bytes.toBytes("row333"));
        Put row4 = new Put(Bytes.toBytes("row444"));
        Put row5 = new Put(Bytes.toBytes("row555"));

        row1.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub111"), Bytes.toBytes(10L));
        row2.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub111"), Bytes.toBytes(15L));
        row3.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub222"), Bytes.toBytes(20L));
        row4.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub222"), Bytes.toBytes(30L));
        row5.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub111"), Bytes.toBytes(40L));

        List<Put> rows = new ArrayList<Put>();
        rows.add(row1);
        rows.add(row2);
        rows.add(row3);
        rows.add(row4);
        rows.add(row5);

        table.put(rows);
        
        dumpTable(table);
        
        /*
         * Per the above setup, row333 and row444 don't have any data for the column publisher_id:Pub111,
         * BUT they have data for the column publisher_id:Pub222
         * 
         * We now set up a Scan between row333 and row444 on the column publisher_id:Pub111
         * 
         * Expected behavior - No Result objects should be returned.
         */
        
        Scan scan = new Scan(Bytes.toBytes("row333"), Bytes.toBytes("row444"));
        scan.setMaxVersions(100000);
        scan.addColumn(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub111"));

        // Collect every Result the scanner returns; the test expects none.
        StringBuilder buffer = new StringBuilder();
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result result = scanner.next(); result != null; result = scanner.next()) {
                buffer.append(result).append("\n");
            }
        } finally {
            scanner.close(); // release the server-side scanner even if next() throws
        }

        assertTrue("Did not expect the scanner to return any Result, but got a Result object"
                + " for these rows - " + buffer, buffer.length() == 0);
    }
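Until the server-side behaviour is resolved, one possible client-side safeguard is to stop reading as soon as a returned row key reaches the stop row. This is a sketch under stated assumptions, not an HBase API: StopRowGuard, compareUnsigned and takeBeforeStop are hypothetical names, and row keys are plain byte[] values (in the test above they would come from Result.getRow()). The comparison is unsigned and lexicographic, the ordering HBase uses for row keys, and an empty stop row is treated as "no upper bound", following HBase's convention for an empty stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical client-side guard against rows returned past the exclusive stop row. */
final class StopRowGuard {
    private StopRowGuard() {}

    /** Unsigned lexicographic byte comparison; a proper prefix sorts before the longer key. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat bytes as 0..255, not -128..127
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Keeps row keys until one is >= stopRow; an empty stopRow means no upper bound. */
    static List<byte[]> takeBeforeStop(Iterator<byte[]> rowKeys, byte[] stopRow) {
        List<byte[]> kept = new ArrayList<>();
        while (rowKeys.hasNext()) {
            byte[] row = rowKeys.next();
            if (stopRow.length > 0 && compareUnsigned(row, stopRow) >= 0) {
                break; // rows arrive sorted, so everything after this is also out of range
            }
            kept.add(row);
        }
        return kept;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulates the keys the buggy scan could hand back for [row333, row444).
        List<byte[]> scanned = Arrays.asList(bytes("row333"), bytes("row555"));
        for (byte[] row : takeBeforeStop(scanned.iterator(), bytes("row444"))) {
            System.out.println(new String(row, StandardCharsets.UTF_8)); // prints row333 only
        }
    }
}
```

Breaking on the first out-of-range key, rather than filtering every result, relies on scanner results arriving in row-key order, which is how HBase scanners return them.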
