kudu-user mailing list archives

From Grant Henke <ghe...@cloudera.com>
Subject Re: Empy RowResultIterator with RangePartitions
Date Thu, 11 Jul 2019 18:19:23 GMT
I created a jira to improve the Javadoc for the `nextRows` API here:
https://issues.apache.org/jira/browse/KUDU-2891

If you are interested in contributing, it would be a super simple
contribution.
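
For anyone reading this thread later, here is a rough, self-contained sketch of
why a single `nextRows` call can come back empty while a full drain loop still
finds every row. It does not use the Kudu client at all: `FakeScanner` and
`sampleBatches` are hypothetical stand-ins that only mimic the
`hasMoreRows()`/`nextRows()` shape discussed in this thread. The empty first
batch is an assumption on my part: all three keys in the original example sort
after "http://bar.com/", so the first range partition presumably holds no rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class DrainScannerSketch {

    /** Hypothetical stand-in for a scanner: hands out one batch per call. */
    static final class FakeScanner {
        private final Iterator<List<String>> batches;

        FakeScanner(List<List<String>> batches) {
            this.batches = batches.iterator();
        }

        boolean hasMoreRows() {
            return batches.hasNext();
        }

        List<String> nextRows() {
            return batches.next();
        }
    }

    /** Assumed batch layout: an empty batch first, then the three rows. */
    static List<List<String>> sampleBatches() {
        return Arrays.asList(
            Collections.<String>emptyList(),
            Arrays.asList(
                "http://baz.com/1.jsp&q=barbaz",
                "http://baz.com/1.jsp&q=barbaz&p=foo",
                "http://foo.com/1.html"));
    }

    /** Keeps calling nextRows() until hasMoreRows() returns false. */
    static List<String> drain(FakeScanner scanner) {
        List<String> rows = new ArrayList<>();
        while (scanner.hasMoreRows()) {
            for (String row : scanner.nextRows()) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // A single call only sees the first (empty) batch.
        FakeScanner once = new FakeScanner(sampleBatches());
        System.out.println("first call: " + once.nextRows().size() + " rows");

        // Draining the scanner collects rows from every batch.
        FakeScanner all = new FakeScanner(sampleBatches());
        System.out.println("drained: " + drain(all).size() + " rows");
    }
}
```

With the real client the loop body would iterate over `RowResult` objects
rather than strings, but the control flow should be the same.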

On Thu, Jul 11, 2019 at 1:00 PM Grant Henke <ghenke@cloudera.com> wrote:

> Hi John,
>
> If you can leverage the newly released Kudu 1.10.0 client, the KuduScanner
> in the Java client is now iterable. Additionally, the KuduScannerIterator
> will automatically make scanner keep-alive calls to ensure scanners do not
> time out while iterating. This means you can use a Java for-each loop
> and all the details are handled:
>
> for (RowResult row : scanner) {
>     ...
> }
>
> If you can't use Kudu 1.10.0, it is expected that `nextRows` could return
> an empty result, and you need to keep calling it until
> `scanner.hasMoreRows()` returns false. This often looks something like:
>
> while (scanner.hasMoreRows()) {
>     for (RowResult result : scanner.nextRows()) {
>         ...
>     }
> }
>
>
> Please keep us updated on your project and progress; it looks very
> interesting!
>
> Thanks,
> Grant
>
> On Thu, Jul 11, 2019 at 11:00 AM John Mora <jhnmora000@gmail.com> wrote:
>
>> Hi all.
>>
>> I am John Mora, a GSoC student working with the Apache Gora
>> community to implement a Kudu DataStore for Gora.
>>
>> Currently, I am having some issues with KuduScanner, so could you please
>> give me some ideas about what I am doing wrong?
>>
>> I am using kudu-client for Java [1] and testing my code with
>> KuduTestHarness [2].
>>
>> My code looks like this.
>>
>> List<ColumnSchema> columns = new ArrayList<>();
>> columns.add(new ColumnSchema.ColumnSchemaBuilder("pkurl",
>> Type.STRING).key(true).build());
>> columns.add(new ColumnSchema.ColumnSchemaBuilder("content",
>> Type.BINARY).nullable(true).build());
>> columns.add(new ColumnSchema.ColumnSchemaBuilder("parsedContent",
>> Type.STRING).nullable(true).build());
>>
>> List<String> keys = new ArrayList<>();
>> keys.add("pkurl");
>>
>> Schema sch = new Schema(columns);
>> CreateTableOptions cto = new CreateTableOptions();
>> cto.setRangePartitionColumns(keys);
>>
>> PartialRow lowerPar1 = sch.newPartialRow();
>> PartialRow upperPar1 = sch.newPartialRow();
>>
>> upperPar1.addString("pkurl", "http://bar.com/");
>> cto.addRangePartition(lowerPar1, upperPar1);
>>
>> PartialRow lowerPar2 = sch.newPartialRow();
>> PartialRow upperPar2 = sch.newPartialRow();
>>
>> lowerPar2.addString("pkurl", "http://bar.com/");
>> cto.addRangePartition(lowerPar2, upperPar2);
>>
>>
>> table = client.createTable(kuduMapping.getTableName(), sch, cto);
>>
>> // Insert some data using table.newInsert();
>> // {pkurl:"http://foo.com/1.html", content:[...], parsedContent:[..]}
>> // {pkurl:"http://baz.com/1.jsp&q=barbaz", content:[...], parsedContent:[..]}
>> // {pkurl:"http://baz.com/1.jsp&q=barbaz&p=foo", content:[...], parsedContent:[..]}
>>
>> //Scanner
>> KuduScanner.KuduScannerBuilder scannerBuilder =
>> client.newScannerBuilder(table);
>> List<String> dbFields = new ArrayList<>();
>> dbFields.add("pkurl");
>> dbFields.add("content");
>> dbFields.add("parsedContent");
>> scannerBuilder.setProjectedColumnNames(dbFields);
>> KuduScanner build = scannerBuilder.build();
>> RowResultIterator resultIt = build.nextRows();
>> //Actual: RowResultIterator is Empty
>> //Expected: RowResultIterator has 3 entries.
>>
>> I tested the same code with cto.addHashPartitions(keys, 2); instead of
>> addRangePartition, and it works fine.
>>
>> Why do I get an empty result when using addRangePartition?
>>
>>
>>
>> Cheers,
>> John
>>
>> [1] https://kudu.apache.org/docs/developing.html#_maven_artifacts
>> [2]
>> https://kudu.apache.org/docs/developing.html#_jvm_based_integration_testing
>>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
