flink-dev mailing list archives

From Fabian Hueske <fhue...@apache.org>
Subject Re: HBase 0.98 addon for Flink 0.8
Date Mon, 03 Nov 2014 10:51:06 GMT
Hi Flavio,

let me try to answer your last question on the user's list (to the best of
my HBase knowledge):
"I just wanted to know if and how region splitting is handled. Can you
explain to me in detail how Flink and HBase work? What is not fully clear to
me is when computation is done by the region servers and when data starts to
flow to a Flink worker (which in my test job is only my PC), and how to better
understand the important logged info to tell whether my job is performing well."

HBase partitions its tables into so-called "regions" of keys and stores the
regions distributed across the cluster in HDFS. I think an HBase region can
roughly be thought of as an HDFS block. To read an HBase table efficiently,
regions should be read locally, i.e., an InputFormat should primarily read
regions that are stored on the same machine it is running on.
Flink's InputSplits partition the HBase input by regions and add
information about the storage location of each region. During execution,
input splits are assigned to InputFormats such that they can do local reads.
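The locality-aware split assignment described above can be sketched roughly as
follows. Note this is an illustration only: all class and method names here are
hypothetical stand-ins, not the actual Flink or HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch (not the real Flink/HBase API): one input split per
 *  HBase region, carrying the hostnames that store the region, so a worker
 *  can be handed a local split whenever one is available. */
public class RegionLocalitySketch {

    /** One split per region; hostnames list where the region's data lives. */
    static class RegionInputSplit {
        final int splitNumber;
        final String[] hostnames;

        RegionInputSplit(int splitNumber, String[] hostnames) {
            this.splitNumber = splitNumber;
            this.hostnames = hostnames;
        }
    }

    /** Greedy assignment: a worker requesting work gets a local split if any
     *  unassigned split lists the worker's host, otherwise a remote one. */
    static RegionInputSplit nextSplit(String workerHost, List<RegionInputSplit> unassigned) {
        for (int i = 0; i < unassigned.size(); i++) {
            if (Arrays.asList(unassigned.get(i).hostnames).contains(workerHost)) {
                return unassigned.remove(i);  // local read
            }
        }
        return unassigned.isEmpty() ? null : unassigned.remove(0);  // remote read
    }

    public static void main(String[] args) {
        List<RegionInputSplit> splits = new ArrayList<>();
        splits.add(new RegionInputSplit(0, new String[]{"node1"}));
        splits.add(new RegionInputSplit(1, new String[]{"node2"}));
        // A worker on node2 is handed the split whose region is stored on node2.
        System.out.println("node2 reads split " + nextSplit("node2", splits).splitNumber);
    }
}
```

With splits spread over the region servers' hosts, each worker mostly ends up
reading regions that HDFS stores on its own machine, which is what makes the
scan efficient.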

Best, Fabian

2014-11-03 11:13 GMT+01:00 Stephan Ewen <sewen@apache.org>:

> Hi!
>
> The way of passing parameters through the configuration is very old (the
> original HBase format dates back to that time). I would simply make the
> HBase format take those parameters through the constructor.
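A minimal sketch of that suggestion, with purely illustrative class and field
names (not the actual connector code): the caller supplies the values directly
instead of the format looking them up from configuration keys.

```java
/** Illustrative sketch of passing parameters via the constructor instead of a
 *  Configuration object; class and field names are hypothetical. */
public class ConstructorParamsSketch {

    static class HBaseFormatSketch {
        private final String tableName;  // previously a config lookup
        private final String jobId;      // previously a "pact.job.id"-style config entry

        HBaseFormatSketch(String tableName, String jobId) {
            this.tableName = tableName;
            this.jobId = jobId;
        }

        String describe() {
            return "table=" + tableName + ", job=" + jobId;
        }
    }

    public static void main(String[] args) {
        // Everything is supplied up front; no config keys for the user to remember.
        HBaseFormatSketch format = new HBaseFormatSketch("myTable", "job-42");
        System.out.println(format.describe());
    }
}
```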
>
> Greetings,
> Stephan
>
>
> On Mon, Nov 3, 2014 at 10:59 AM, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
> > The problem is that I also removed the GenericTableOutputFormat because
> > there is an incompatibility between hadoop1 and hadoop2 for the classes
> > TaskAttemptContext and TaskAttemptContextImpl..
> > then it would be nice if the user didn't have to worry about passing the
> > pact.hbase.jtkey and pact.job.id parameters..
> > I think it is probably a good idea to remove hadoop1 compatibility, keep
> > the HBase addon enabled only for hadoop2 (as before), and decide how to
> > manage those 2 parameters..
> >
> > On Mon, Nov 3, 2014 at 10:19 AM, Stephan Ewen <sewen@apache.org> wrote:
> >
> > > It is fine to remove it, in my opinion.
> > >
> > > On Mon, Nov 3, 2014 at 10:11 AM, Flavio Pompermaier <pompermaier@okkam.it>
> > > wrote:
> > >
> > > > That is one class I removed because it was using the deprecated API
> > > > GenericDataSink..I can restore it, but it would be a good idea to
> > > > remove those warnings (also because, from what I understood, the
> > > > Record APIs are going to be removed).
> > > >
> > > > On Mon, Nov 3, 2014 at 9:51 AM, Fabian Hueske <fhueske@apache.org>
> > > wrote:
> > > >
> > > > > I'm not familiar with the HBase connector code, but are you maybe
> > > > > looking for the GenericTableOutputFormat?
> > > > >
> > > > > 2014-11-03 9:44 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
> > > > >
> > > > > > I was trying to modify the example setting hbaseDs.output(new
> > > > > > HBaseOutputFormat()); but I can't see any HBaseOutputFormat
> > > > > > class..maybe we shall use another class?
> > > > > >
> > > > > > On Mon, Nov 3, 2014 at 9:39 AM, Flavio Pompermaier <pompermaier@okkam.it>
> > > > > > wrote:
> > > > > >
> > > > > > > Maybe that's something I could add to the HBase example and that
> > > > > > > could be better documented in the Wiki.
> > > > > > >
> > > > > > > Since we're talking about the wiki..I was looking at the Java API (
> > > > > > > http://flink.incubator.apache.org/docs/0.6-incubating/java_api_guide.html)
> > > > > > > and the link to the KMeans example is not working (where it says
> > > > > > > "For a complete example program, have a look at KMeans Algorithm").
> > > > > > >
> > > > > > > Best,
> > > > > > > Flavio
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 3, 2014 at 9:12 AM, Flavio Pompermaier <pompermaier@okkam.it>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Ah ok, perfect! That was the reason why I removed it :)
> > > > > > >>
> > > > > > >> On Mon, Nov 3, 2014 at 9:10 AM, Stephan Ewen <sewen@apache.org>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> You do not really need an HBase data sink. You can call
> > > > > > >>> "DataSet.output(new HBaseOutputFormat())".
> > > > > > >>>
> > > > > > >>> Stephan
> > > > > > >>> On 02.11.2014 23:05, "Flavio Pompermaier" <pompermaier@okkam.it>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>> > Just one last thing..I removed the HbaseDataSink because I
> > > > > > >>> > think it was using the old APIs..can someone help me update
> > > > > > >>> > that class?
> > > > > > >>> >
> > > > > > >>> > On Sun, Nov 2, 2014 at 10:55 AM, Flavio Pompermaier <pompermaier@okkam.it>
> > > > > > >>> > wrote:
> > > > > > >>> >
> > > > > > >>> > > Indeed, this time the build has been successful :)
> > > > > > >>> > >
> > > > > > >>> > > On Sun, Nov 2, 2014 at 10:29 AM, Fabian Hueske <fhueske@apache.org>
> > > > > > >>> > > wrote:
> > > > > > >>> > >
> > > > > > >>> > >> You can also set up Travis to build your own Github
> > > > > > >>> > >> repositories by linking it to your Github account. That way
> > > > > > >>> > >> Travis can build all your branches (and you can also trigger
> > > > > > >>> > >> rebuilds if something fails).
> > > > > > >>> > >> Not sure if we can manually retrigger builds on the Apache
> > > > > > >>> > >> repository.
> > > > > > >>> > >>
> > > > > > >>> > >> Support for Hadoop 1 and 2 is indeed a very good addition :-)
> > > > > > >>> > >>
> > > > > > >>> > >> For the discussion about the PR itself, I would need a bit
> > > > > > >>> > >> more time to become more familiar with HBase. I also do not
> > > > > > >>> > >> have an HBase setup available here.
> > > > > > >>> > >> Maybe somebody else in the community who was involved with a
> > > > > > >>> > >> previous version of the HBase connector could comment on your
> > > > > > >>> > >> question.
> > > > > > >>> > >>
> > > > > > >>> > >> Best, Fabian
> > > > > > >>> > >>
> > > > > > >>> > >> 2014-11-02 9:57 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
> > > > > > >>> > >>
> > > > > > >>> > >> > As suggested by Fabian, I moved the discussion to this
> > > > > > >>> > >> > mailing list.
> > > > > > >>> > >> >
> > > > > > >>> > >> > I think what is still to be discussed is how to retrigger
> > > > > > >>> > >> > the build on Travis (I don't have an account) and whether
> > > > > > >>> > >> > the PR can be integrated.
> > > > > > >>> > >> >
> > > > > > >>> > >> > Maybe what I can do is move the HBase example into the test
> > > > > > >>> > >> > package (right now I left it in the main folder) so it will
> > > > > > >>> > >> > force Travis to rebuild. I'll do it within a couple of
> > > > > > >>> > >> > hours.
> > > > > > >>> > >> >
> > > > > > >>> > >> > Another thing I forgot to say is that the hbase extension
> > > > > > >>> > >> > is now compatible with both hadoop 1 and 2.
> > > > > > >>> > >> >
> > > > > > >>> > >> > Best,
> > > > > > >>> > >> > Flavio
