flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Reading from HBase problem
Date Mon, 08 Jun 2015 21:29:57 GMT
Hi Hilmi,

I see two possible reasons:

1) The data source / InputFormat is not properly working, so not all HBase
records are read/forwarded, or
2) The aggregation / count is buggy

Robert's suggestion uses an alternative mechanism to do the count. In
fact, you can count with groupBy(0).sum(1) and with accumulators at the same
time. If both counts are the same, this indicates that the aggregation is
correct and hints that the HBase format is faulty. If they differ, the
aggregation is the more likely suspect.
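
To make the cross-check concrete, here is a rough sketch in plain Java. It is not Flink code: a stream with a dummy key "a" stands in for groupBy(0).sum(1), and a LongAdder stands in for a Flink accumulator. The class and method names (CountCrossCheck, countBothWays) are made up for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CountCrossCheck {

    /** Holds the two independently computed counts for the same input. */
    public static final class Counts {
        public final long groupedSum;   // stand-in for groupBy(0).sum(1)
        public final long sideCounter;  // stand-in for an accumulator

        Counts(long groupedSum, long sideCounter) {
            this.groupedSum = groupedSum;
            this.sideCounter = sideCounter;
        }

        /** True if both counting mechanisms report the same number. */
        public boolean agree() {
            return groupedSum == sideCounter;
        }
    }

    /** Counts the rows twice: via a keyed sum and via a side counter. */
    public static Counts countBothWays(List<String> rows) {
        LongAdder sideCounter = new LongAdder();

        // Every row is tagged with the dummy key "a" and contributes 1L,
        // which mirrors the ("a", 1L) tuples from the original job.
        Map<String, Long> sums = rows.stream()
                .peek(row -> sideCounter.increment())
                .collect(Collectors.groupingBy(
                        row -> "a",
                        Collectors.summingLong(row -> 1L)));

        long groupedSum = sums.getOrDefault("a", 0L);
        return new Counts(groupedSum, sideCounter.sum());
    }

    public static void main(String[] args) {
        // Hypothetical input; in the real job these would be HBase rows.
        List<String> rows = IntStream.range(0, 1000)
                .mapToObj(i -> "row-" + i)
                .collect(Collectors.toList());

        Counts counts = countBothWays(rows);
        System.out.println("grouped sum:  " + counts.groupedSum);
        System.out.println("side counter: " + counts.sideCounter);
        System.out.println("agree:        " + counts.agree());
    }
}
```

If the two numbers agree but still differ from the known row count, the records were most likely lost before the counting step, which points at the input format rather than the aggregation.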

In any case, it would be very good to know your findings. Please keep us
updated.

One more hint: if you want to do a full aggregate, you don't have to use a
"dummy" key like "a". Instead, you can work with Tuple1<Long> and call
sum(0) directly, without the groupBy().
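
The same point as a small plain-Java sketch (again not the Flink API, and the names FullAggregate, sumWithDummyKey and sumDirect are invented for this example): grouping everything under one constant key and summing gives the same number as summing the values directly, so the key adds nothing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FullAggregate {

    /** Sums the values after grouping them all under the dummy key "a". */
    public static long sumWithDummyKey(List<Long> values) {
        Map<String, Long> byKey = values.stream()
                .collect(Collectors.groupingBy(
                        v -> "a",
                        Collectors.summingLong(Long::longValue)));
        return byKey.getOrDefault("a", 0L);
    }

    /** Sums the values directly, with no key and no grouping step. */
    public static long sumDirect(List<Long> values) {
        return values.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // One 1L per row, like the second tuple field in the original job.
        List<Long> ones = Arrays.asList(1L, 1L, 1L, 1L);
        System.out.println("with dummy key: " + sumWithDummyKey(ones));
        System.out.println("direct:         " + sumDirect(ones));
    }
}
```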

Best, Fabian

2015-06-08 17:36 GMT+02:00 Robert Metzger <rmetzger@apache.org>:

> Hi Hilmi,
>
> if you just want to count the number of elements, you can also use
> accumulators, as described here [1].
> They are much more lightweight.
>
> So you need to make your flatMap function a RichFlatMapFunction, then call
> getRuntimeContext() to register the accumulator.
> Use a long accumulator to count the elements.
>
> If the accumulator gives consistent results (the exact element
> count), then there is a severe bug in Flink. But I suspect that the
> accumulator will give you the same result as the aggregation (off by +-5).
>
> Best,
> Robert
>
>
> [1]: http://slideshare.net/robertmetzger1/apache-flink-hands-on
>
> On Mon, Jun 8, 2015 at 3:04 PM, Hilmi Yildirim
> <hilmi.yildirim@neofonie.de> wrote:
>
>> Hi,
>> I implemented a simple Flink batch job which reads from an HBase cluster
>> of 13 machines with nearly 100 million rows. The HBase version is
>> 1.0.0-cdh5.4.1, so I imported hbase-client 1.0.0-cdh5.4.1.
>> I implemented a flatMap which creates a tuple ("a", 1L) for each row.
>> Then, I use groupBy(0).sum(1).writeAsText. The result should be the number
>> of rows. But the result is not correct. I ran the job multiple times and
>> the result fluctuates by +-5. I also ran the job on a smaller table with
>> 100,000 rows and the result is correct.
>>
>> Does anyone know the reason for that?
>>
>> Best Regards,
>> Hilmi
>>
>> --
>> --
>> Hilmi Yildirim
>> Software Developer R&D
>>
>> http://www.neofonie.de
>>
>> Visit the Neo Tech Blog for users:
>> http://blog.neofonie.de/
>>
>> Follow us:
>> https://plus.google.com/+neofonie
>> http://www.linkedin.com/company/neofonie-gmbh
>> https://www.xing.com/companies/neofoniegmbh
>>
>> Neofonie GmbH | Robert-Koch-Platz 4 | 10115 Berlin
>> Commercial register Berlin-Charlottenburg: HRB 67460
>> Managing director: Thomas Kitlitschko
>>
>>
>
