kudu-user mailing list archives

From Grant Henke <ghe...@cloudera.com>
Subject Re: Empy RowResultIterator with RangePartitions
Date Thu, 11 Jul 2019 18:00:40 GMT
Hi John,

If you can use the newly released Kudu 1.10.0 client, the KuduScanner
in the Java client is now iterable. Additionally, the KuduScannerIterator
automatically makes scanner keep-alive calls so that scanners do not
time out while you iterate. This means you can use a Java for-each loop
and all the details are handled for you:

for (RowResult row : scanner) {
    ...
}

If you can't use Kudu 1.10.0, it is expected that `nextRows` may return an
empty batch, so you need to keep calling it until `scanner.hasMoreRows()`
returns false. This often looks something like:

while (scanner.hasMoreRows()) {
    for (RowResult result : scanner.nextRows()) {
        ...
    }
}
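
To try the loop pattern above without a running cluster, here is a minimal
sketch. It does not use the Kudu client at all: `BatchSource`, `drain`, and
`fakeSource` are hypothetical names that only mimic the
`hasMoreRows()`/`nextRows()` shape of the scanner. The first batch is left
empty on purpose. In the quoted setup all three keys sort after
"http://bar.com/", so the first range partition would hold no rows, which
may be why the very first `nextRows()` call came back empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class DrainScannerSketch {

    /** Hypothetical stand-in for the two scanner methods used in the loop. */
    interface BatchSource<T> {
        boolean hasMoreRows();
        List<T> nextRows();
    }

    /** Collects every row, tolerating empty batches along the way. */
    static <T> List<T> drain(BatchSource<T> source) {
        List<T> rows = new ArrayList<>();
        while (source.hasMoreRows()) {
            // A batch may be empty; only hasMoreRows() says when to stop.
            rows.addAll(source.nextRows());
        }
        return rows;
    }

    /** In-memory source that hands out the given batches, one per call. */
    static <T> BatchSource<T> fakeSource(List<List<T>> batches) {
        Iterator<List<T>> it = batches.iterator();
        return new BatchSource<T>() {
            @Override public boolean hasMoreRows() { return it.hasNext(); }
            @Override public List<T> nextRows() { return it.next(); }
        };
    }

    public static void main(String[] args) {
        // Assumed layout: an empty first batch, then the three sample keys.
        List<List<String>> batches = Arrays.asList(
            Collections.<String>emptyList(),
            Arrays.asList("http://baz.com/1.jsp&q=barbaz",
                          "http://baz.com/1.jsp&q=barbaz&p=foo"),
            Arrays.asList("http://foo.com/1.html"));
        System.out.println(drain(fakeSource(batches)).size()); // prints 3
    }
}
```

Reading only the first batch of this fake source would yield zero rows,
while draining it until `hasMoreRows()` returns false yields all three.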


Please keep us updated on your project and progress. It looks very
interesting!

Thanks,
Grant

On Thu, Jul 11, 2019 at 11:00 AM John Mora <jhnmora000@gmail.com> wrote:

> Hi all.
>
> I am John Mora, a GSoC student that is working with the Apache Gora
> Community in order to implement a Kudu DataStore for Gora.
>
> Currently, I am having some issues with KuduScanner, so please could you
> give some ideas of what I am doing wrong.
>
> I am using kudu-client for java [1] and testing my code with
> KuduTestHarness [2].
>
> My code looks like this.
>
> List<ColumnSchema> columns = new ArrayList<>();
> columns.add(new ColumnSchema.ColumnSchemaBuilder("pkurl",
> Type.STRING).key(true).build());
> columns.add(new ColumnSchema.ColumnSchemaBuilder("content",
> Type.BINARY).nullable(true).build());
> columns.add(new ColumnSchema.ColumnSchemaBuilder("parsedContent",
> Type.STRING).nullable(true).build());
>
> List<String> keys = new ArrayList<>();
> keys.add("pkurl");
>
> Schema sch = new Schema(columns);
> CreateTableOptions cto = new CreateTableOptions();
> cto.setRangePartitionColumns(keys);
>
> PartialRow lowerPar1 = sch.newPartialRow();
> PartialRow upperPar1 = sch.newPartialRow();
>
> upperPar1.addString("pkurl", "http://bar.com/");
> cto.addRangePartition(lowerPar1, upperPar1);
>
> PartialRow lowerPar2 = sch.newPartialRow();
> PartialRow upperPar2 = sch.newPartialRow();
>
> lowerPar2.addString("pkurl", "http://bar.com/");
> cto.addRangePartition(lowerPar2, upperPar2);
>
>
> table = client.createTable(kuduMapping.getTableName(), sch, cto);
>
> // Insert some data using table.newInsert();
> // {pkurl:"http://foo.com/1.html", content:[...], parsedContent:[..]}
> // {pkurl:"http://baz.com/1.jsp&q=barbaz", content:[...], parsedContent:[..]}
> // {pkurl:"http://baz.com/1.jsp&q=barbaz&p=foo", content:[...], parsedContent:[..]}
>
> //Scanner
> KuduScanner.KuduScannerBuilder scannerBuilder =
> client.newScannerBuilder(table);
> List<String> dbFields = new ArrayList<>();
> dbFields.add("pkurl");
> dbFields.add("content");
> dbFields.add("parsedContent");
> scannerBuilder.setProjectedColumnNames(dbFields);
> KuduScanner build = scannerBuilder.build();
> RowResultIterator resultIt = build.nextRows();
> //Actual: RowResultIterator is Empty
> //Expected: RowResultIterator has 3 entries.
>
> I tested the same code with cto.addHashPartitions(keys, 2); instead of
> addRangePartition.
> And it works fine.
>
> Why do I get an empty result when using addRangePartition?
>
>
>
> Cheers,
> John
>
> [1] https://kudu.apache.org/docs/developing.html#_maven_artifacts
> [2]
> https://kudu.apache.org/docs/developing.html#_jvm_based_integration_testing
>


-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
