flink-issues mailing list archives

From "Ufuk Celebi (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (FLINK-2188) Reading from big HBase Tables
Date Fri, 12 Jun 2015 09:32:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-2188.
       Resolution: Not A Problem
    Fix Version/s: 0.9

Misconfiguration in the user code.

> Reading from big HBase Tables
> -----------------------------
>                 Key: FLINK-2188
>                 URL: https://issues.apache.org/jira/browse/FLINK-2188
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Hilmi Yildirim
>            Priority: Critical
>             Fix For: 0.9
>         Attachments: flinkTest.zip
> I detected a bug when reading from a big HBase table.
> I used a cluster of 13 machines with 13 processing slots per machine, which gives
> a total of 169 processing slots. Our cluster uses cdh5.4.1 and the HBase
> version is 1.0.0-cdh5.4.1. There is an HBase table with nearly 100 million rows. I used Spark
> and Hive to count the number of rows, and both results are identical (nearly 100 million).
> Then I used Flink to count the number of rows. For that I added the hbase-client 1.0.0-cdh5.4.1
> Java API as a dependency in Maven and excluded the other hbase-client dependencies. The result
> of the job is nearly 102 million, 2 million rows more than the result of Spark and Hive. Moreover,
> I ran the Flink job multiple times and sometimes the result fluctuates by ±5.

This message was sent by Atlassian JIRA
