flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: HBase 0.7.0 addon
Date Sat, 01 Nov 2014 19:17:27 GMT
Hi Flavio!

Here are a few comments:

 - Concerning the count operator: I think we can hack this in a very simple
way. Would be good to spend a few thought cycles on keeping the API
consistent, though. Flink does not pull data back to the client as eagerly
as Spark, but leaves it in the cluster more. That has paid off in various
situations. Let me draft a proposal for how to include such operations in
the next few days. I think we can have this very soon.

 - Concerning the Region Splitting: Can you elaborate a little bit on that
and give a few more details about the problem? In general, the input
splitting in Flink happens when the job is started and the splits are
dynamically assigned to the sources as the job runs. You can customize all
that behavior by overriding the two methods "createInputSplits" and
"getInputSplitAssigner" in the input format.

 - Concerning the pull request: There are sometimes build stalls on Travis
that no one has encountered outside Travis so far. Not exactly sure what
causes them, but if that happens for one build and the others work, I would
consider the pull request passed.
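
As a sketch of the count work-around discussed in this thread (append a 1 to
every element and sum the ones), here is a plain-Java model with no Flink
dependency. It assumes the input is already divided into partitions, as it
would be across parallel tasks: each partition is counted locally, and only
the small per-partition totals are combined. The class and method names are
hypothetical; in a Flink job the same steps would roughly be a map to a
Tuple1<Long> followed by a sum aggregation.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model (no Flink dependency) of counting by "appending a 1 and
// summing it up". PartitionedCount, partialCount and count are hypothetical
// names for illustration only.
final class PartitionedCount {

    // "Map + local sum": every element of the partition contributes 1L.
    static <T> long partialCount(List<T> partition) {
        return partition.stream().mapToLong(element -> 1L).sum();
    }

    // "Global sum": add up the small per-partition totals.
    static <T> long count(List<List<T>> partitions) {
        long total = 0L;
        for (List<T> partition : partitions) {
            total += partialCount(partition);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three partitions, as if the data were spread over three tasks.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("to", "be", "or"),
                Arrays.asList("not", "to"),
                Arrays.asList("be"));
        System.out.println(count(partitions)); // prints 6
    }
}
```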

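To make the region-splitting point concrete, here is a plain-Java model (no
Flink or HBase dependency) of the behavior described above: splits are
computed once when the job starts, then handed out on demand as source tasks
ask for work. KeyRangeSplit, createInputSplits and QueueSplitAssigner are
hypothetical names; in a real job this logic would live in the input format's
"createInputSplits" and "getInputSplitAssigner" overrides, whose exact
signatures are not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Plain-Java model of input-split creation and dynamic assignment.
// All names are hypothetical; this is not the Flink or HBase API.
final class SplitAssignmentSketch {

    // A half-open row-key range [startKey, endKey). An empty string means
    // "unbounded", mirroring HBase's empty start/end row convention.
    static final class KeyRangeSplit {
        final int id;
        final String startKey;
        final String endKey;

        KeyRangeSplit(int id, String startKey, String endKey) {
            this.id = id;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "split " + id + " [" + startKey + ", " + endKey + ")";
        }
    }

    // Runs once, when the job starts: one split per region, derived from the
    // region start keys known at that moment. Regions that split afterwards
    // are not reflected in these ranges.
    static List<KeyRangeSplit> createInputSplits(List<String> regionStartKeys) {
        List<KeyRangeSplit> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new KeyRangeSplit(i, start, end));
        }
        return splits;
    }

    // Hands out the precomputed splits lazily, one per request, so a task
    // that finishes early simply asks for another split.
    static final class QueueSplitAssigner {
        private final Queue<KeyRangeSplit> pending;

        QueueSplitAssigner(List<KeyRangeSplit> splits) {
            this.pending = new ArrayDeque<>(splits);
        }

        // Returns the next unassigned split, or null when none are left.
        synchronized KeyRangeSplit nextSplit() {
            return pending.poll();
        }
    }

    public static void main(String[] args) {
        List<KeyRangeSplit> splits = createInputSplits(Arrays.asList("", "g", "p"));
        QueueSplitAssigner assigner = new QueueSplitAssigner(splits);

        // Two simulated source tasks take turns requesting work.
        int task = 0;
        KeyRangeSplit next;
        while ((next = assigner.nextSplit()) != null) {
            System.out.println("task " + task + " reads " + next);
            task = (task + 1) % 2;
        }
    }
}
```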

Greetings,
Stephan



On Sat, Nov 1, 2014 at 2:03 PM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> My pull request seems to build correctly now, except in one case
> (PROFILE="-Dhadoop.profile=2 -Dhadoop.version=2.2.0") where Travis stops
> the job during the tests, saying:
>
> No output has been received in the last 10 minutes, this potentially
> indicates a stalled build or something wrong with the build itself. The
> build has been terminated
>
> Can someone help me finalize this PR? I also removed some classes that I
> think are now obsolete (i.e. GenericTableOutputFormat, HBaseUtil
> and HBaseDataSink).
>
>
> On Fri, Oct 31, 2014 at 5:04 PM, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
>> The current implementation of the HBase splitting policy cannot deal with
>> regions splitting during job execution.
>> Do you think it is possible to overcome this issue?
>>
>> On Fri, Oct 31, 2014 at 2:22 PM, Flavio Pompermaier <pompermaier@okkam.it>
>> wrote:
>>
>>> Is this feature far from being released?
>>>
>>> On Fri, Oct 31, 2014 at 1:51 PM, Kostas Tzoumas <ktzoumas@apache.org>
>>> wrote:
>>>
>>>> I was wrong. This feature is actually coming up and tracked here:
>>>> https://issues.apache.org/jira/browse/FLINK-758
>>>>
>>>> On Fri, Oct 31, 2014 at 1:14 PM, Flavio Pompermaier <
>>>> pompermaier@okkam.it> wrote:
>>>>
>>>>> For this I don't have time; we're working on upgrading HBase to the 0.98
>>>>> APIs (and it's already working :))
>>>>> However, we should discuss how to properly manage the version of
>>>>> HBase and its Hadoop dependencies.
>>>>>
>>>>> Best,
>>>>> Flavio
>>>>>
>>>>> On Fri, Oct 31, 2014 at 11:32 AM, Kostas Tzoumas <ktzoumas@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Agreed 100%.
>>>>>>
>>>>>> I created a JIRA for this:
>>>>>> https://issues.apache.org/jira/browse/FLINK-1200
>>>>>>
>>>>>> Flavio, would you like to give it a go? Otherwise I will assign it to
>>>>>> myself.
>>>>>>
>>>>>> On Fri, Oct 31, 2014 at 10:12 AM, Flavio Pompermaier <
>>>>>> pompermaier@okkam.it> wrote:
>>>>>>
>>>>>>> I think that a count operator is very useful for people wanting to
>>>>>>> run a HelloWorld with Flink;
>>>>>>> it's always the first test I do (and with Spark that is very easy...)
>>>>>>>
>>>>>>> Best,
>>>>>>> Flavio
>>>>>>>
>>>>>>> On Fri, Oct 31, 2014 at 9:57 AM, Fabian Hueske <fhueske@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Flavio,
>>>>>>>>
>>>>>>>> right now, there is no dedicated count operator in the API.
>>>>>>>> You can work around this by appending a 1 to each element and
>>>>>>>> summing it up (see the WordCount example [1]).
>>>>>>>> This is also what a dedicated count operator would do internally.
>>>>>>>>
>>>>>>>> It would be awesome to get some contributions for the HBase addon
>>>>>>>> :-)
>>>>>>>>
>>>>>>>> Best, Fabian
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/examples.html
>>>>>>>>
>>>>>>>> 2014-10-31 9:46 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>
>>>>>>>> :
>>>>>>>>
>>>>>>>>> We are trying to connect to HBase 0.98, so we'll probably
>>>>>>>>> contribute to the HBase addon :)
>>>>>>>>> Is there a count API for DataSet? What is the fastest way to run a
>>>>>>>>> count on a DataSet?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Flavio
>>>>>>>>>
>>>>>>>>> On Fri, Oct 31, 2014 at 6:19 AM, Robert Metzger <
>>>>>>>>> rmetzger@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Okay, I've deployed the missing artifacts to Maven Central. It
>>>>>>>>>> will take some hours until they are synchronized.
>>>>>>>>>> The example in the "flink-hbase" module still uses the old
>>>>>>>>>> Java API.
>>>>>>>>>> But you should be able to use the HBase input format like this:
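>>>>>>>>>>         import org.apache.flink.api.java.DataSet;
>>>>>>>>>>         import org.apache.flink.api.java.ExecutionEnvironment;
>>>>>>>>>>         import org.apache.flink.types.Record;
>>>>>>>>>>
>>>>>>>>>>         // MyTableInputFormat stands for your own HBase table input format
>>>>>>>>>>         ExecutionEnvironment ee = ExecutionEnvironment.getExecutionEnvironment();
>>>>>>>>>>         DataSet<Record> t = ee.createInput(new MyTableInputFormat());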
>>>>>>>>>>
>>>>>>>>>> I think the Flink HBase module is not very well tested, so it's
>>>>>>>>>> likely that you'll find issues while using it.
>>>>>>>>>>
>>>>>>>>>> The only documentation on logging we have is this one:
>>>>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/internal_logging.html
>>>>>>>>>>
>>>>>>>>>> Are you only seeing the log messages from Flink, or no messages at
>>>>>>>>>> all?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 30, 2014 at 4:10 PM, Flavio Pompermaier <pompermaier@okkam.it>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ok, thanks! I was trying to run a map-reduce Flink job using an
>>>>>>>>>>> HBase dataset, but I wasn't able to make it run locally. The one in
>>>>>>>>>>> the addons just specifies a plan but does not say how to test it.
>>>>>>>>>>> Moreover, I tried to put a log4j.properties in the classpath to
>>>>>>>>>>> debug what's going on, but I can't see any debug info. Do you have
>>>>>>>>>>> any hook/guide?
>>>>>>>>>>> On Oct 30, 2014 11:58 PM, "Robert Metzger" <rmetzger@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> No, there is no reason for that. It actually seems like
>>>>>>>>>>>> something went wrong while releasing Flink 0.7.0. I'll deploy the
>>>>>>>>>>>> missing artifacts.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 30, 2014 at 9:26 AM, Flavio Pompermaier <pompermaier@okkam.it>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi to all,
>>>>>>>>>>>>> is there a reason why the 0.7.0 HBase addon is not deployed
>>>>>>>>>>>>> on Maven Central?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>> Flavio
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>
