hbase-user mailing list archives

From Mikhail Antonov <anto...@apache.org>
Subject Re: Hbase federated cluster for messages
Date Sun, 21 Aug 2016 06:26:00 GMT
Just out of curiosity, is there anything in particular about your deployment
or use case that raised this specific concern about Namenode performance?

HDFS clusters with 80 datanodes would be considered medium-sized; there are
plenty of (much) bigger clusters out there in the field,
and HBase clusters with 80 nodes aren't uncommon either.
Fine-tuning a cluster of this size for a specific workload
would certainly require some planning and work, and setting a bunch of
params related to heap/memstore/block cache sizing,
GC settings, RPC scheduler settings, replication settings, and a number of
other things; but why the Namenode?
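
Most of the knobs listed above live in hbase-site.xml. As a purely
illustrative sketch (the property names are real HBase settings, but the
values are arbitrary placeholders, not recommendations):

```xml
<!-- Illustrative hbase-site.xml fragment only: values are placeholders. -->
<configuration>
  <property>
    <!-- Fraction of regionserver heap reserved for memstores (write path) -->
    <name>hbase.regionserver.global.memstore.size</name>
    <value>0.4</value>
  </property>
  <property>
    <!-- Fraction of heap for the block cache (read path) -->
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>
  <property>
    <!-- RPC handler threads per regionserver -->
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
  </property>
</configuration>
```

GC settings, by contrast, are JVM flags passed via HBASE_OPTS (or
HBASE_REGIONSERVER_OPTS) in hbase-env.sh rather than hbase-site.xml.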

-Mikhail

On Sat, Aug 20, 2016 at 6:39 AM, Alexandr Porunov <
alexandr.porunov@gmail.com> wrote:

> Thank you Dima
>
> Best regards,
> Alexandr
>
> On Sat, Aug 20, 2016 at 4:17 PM, Dima Spivak <dspivak@cloudera.com> wrote:
>
> > Yup.
> >
> > On Saturday, August 20, 2016, Alexandr Porunov <
> alexandr.porunov@gmail.com
> > >
> > wrote:
> >
> > > So, will it be ok if we have 80 data nodes (8 TB on each node) and only
> > > one namenode? Will it work for the messaging system? We will have 2x
> > > replication, so there are 320 TB of data per year (640 TB with
> > > replication), and 130,000 R+W ops/sec. Each message is 100 bytes or
> > > 1024 bytes. Is it possible to handle such a load with HBase?
> > >
> > > Sincerely,
> > > Alexandr
> > >
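
A quick back-of-the-envelope check of the numbers in the message above (a
sketch using only the figures quoted, not a capacity plan):

```python
# Figures taken straight from the email above.
DATANODES = 80
DISK_PER_NODE_TB = 8
DATA_PER_YEAR_TB = 320          # raw data, before replication
REPLICATION = 2
TOTAL_OPS_PER_SEC = 130_000     # combined reads + writes

# Storage: replicated footprint vs. raw disk available.
replicated_tb = DATA_PER_YEAR_TB * REPLICATION   # 640 TB
raw_capacity_tb = DATANODES * DISK_PER_NODE_TB   # 640 TB

# Throughput, if spread evenly across all nodes.
ops_per_node = TOTAL_OPS_PER_SEC / DATANODES     # 1625 ops/sec per node

print(replicated_tb, raw_capacity_tb, ops_per_node)
```

Note that one year of replicated data already equals the raw disk capacity,
leaving no headroom for HDFS overhead, compactions, or growth.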
> > > On Sat, Aug 20, 2016 at 8:44 AM, Dima Spivak <dspivak@cloudera.com>
> > > wrote:
> > >
> > > > You can easily store that much data as long as you don't have small
> > > files,
> > > > which is typically why people turn to federation.
> > > >
> > > > -Dima
> > > >
> > > > On Friday, August 19, 2016, Alexandr Porunov <alexandr.porunov@gmail.com>
> > > > wrote:
> > > >
> > > > > We are talking about Facebook. So, there are 25 TB per month: 15
> > > > > billion messages of 1024 bytes and 120 billion messages of 100 bytes
> > > > > per month.
> > > > >
> > > > > I thought that they used only HBase to handle such huge data. If
> > > > > they used their own implementation of HBase then I have no questions.
> > > > >
> > > > > Sincerely,
> > > > > Alexandr
> > > > >
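
The monthly volume implied by those message counts can be checked with quick
arithmetic (a rough sketch; the byte counts are payload only, ignoring keys,
timestamps, and storage overhead):

```python
# Message counts and sizes as quoted in the email above.
large_msgs = 15e9       # messages/month at ~1024 bytes each
small_msgs = 120e9      # messages/month at ~100 bytes each

large_tb = large_msgs * 1024 / 1e12   # ~15.4 TB/month
small_tb = small_msgs * 100 / 1e12    # 12.0 TB/month
total_tb = large_tb + small_tb        # ~27.4 TB/month

print(round(total_tb, 1))
```

That lands in the same ballpark as the ~25 TB/month figure quoted, so the
numbers are roughly self-consistent.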
> > > > > On Sat, Aug 20, 2016 at 1:39 AM, Dima Spivak <dspivak@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > I'd +1 what Vladimir says. How much data (in TBs/PBs) and how many
> > > > > > files are we talking about here? I'd say that use cases that
> > > > > > benefit from HBase don't tend to hit the kind of HDFS file limits
> > > > > > that federation seeks to address.
> > > > > >
> > > > > > -Dima
> > > > > >
> > > > > > On Fri, Aug 19, 2016 at 2:19 PM, Vladimir Rodionov <
> > > > > > vladrodionov@gmail.com> wrote:
> > > > > >
> > > > > > > FB has its own "federation". It is proprietary code, I presume.
> > > > > > >
> > > > > > > -Vladimir
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Aug 19, 2016 at 1:22 PM, Alexandr Porunov <
> > > > > > > alexandr.porunov@gmail.com> wrote:
> > > > > > >
> > > > > > > > No, there isn't. But I want to figure out how to configure
> > > > > > > > that type of cluster in case there is a particular reason.
> > > > > > > > How can Facebook handle such a huge amount of ops without
> > > > > > > > federation? I don't think that they just have one namenode
> > > > > > > > server and one standby namenode server. It isn't possible.
> > > > > > > > I am sure that they use federation.
> > > > > > > >
> > > > > > > > On Fri, Aug 19, 2016 at 10:08 PM, Vladimir Rodionov <
> > > > > > > > vladrodionov@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > >> I am not sure how to do it but I have to configure a
> > > > > > > > > >> federated cluster with HBase to store a huge amount of
> > > > > > > > > >> messages (client to client) (40% writes, 60% reads).
> > > > > > > > >
> > > > > > > > > Any particular reason for a federated cluster? How huge is
> > > > > > > > > the amount, and what is the message size?
> > > > > > > > >
> > > > > > > > > -Vladimir
> > > > > > > > >
> > > > > > > > > On Fri, Aug 19, 2016 at 11:57 AM, Dima Spivak <
> > > > > > > > > dspivak@cloudera.com> wrote:
> > > > > > > > >
> > > > > > > > > > As far as I know, HBase doesn't support spreading tables
> > > > > > > > > > across namespaces; you'd have to point it at one namenode
> > > > > > > > > > at a time. I've heard of people trying to run multiple
> > > > > > > > > > HBase instances in order to get access to all their HDFS
> > > > > > > > > > data, but it doesn't tend to be much fun.
> > > > > > > > > >
> > > > > > > > > > -Dima
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 19, 2016 at 11:51 AM, Alexandr Porunov <
> > > > > > > > > > alexandr.porunov@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > I am not sure how to do it, but I have to configure a
> > > > > > > > > > > federated cluster with HBase to store a huge amount of
> > > > > > > > > > > messages (client to client) (40% writes, 60% reads).
> > > > > > > > > > > Does somebody have any ideas or examples of how to
> > > > > > > > > > > configure it?
> > > > > > > > > > >
> > > > > > > > > > > Of course we can configure HDFS in federated mode, but
> > > > > > > > > > > to me it isn't suitable for HBase. If we want to save a
> > > > > > > > > > > message from client 1 to client 2 in the HBase cluster,
> > > > > > > > > > > then how does HBase know in which namespace it has to
> > > > > > > > > > > save it? Which namenode will be responsible for that
> > > > > > > > > > > message? How can we read client messages?
> > > > > > > > > > >
> > > > > > > > > > > Give me any ideas, please.
> > > > > > > > > > >
> > > > > > > > > > > Sincerely,
> > > > > > > > > > > Alexandr
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > -Dima
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -Dima
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -Dima
> > > >
> > >
> >
> >
> > --
> > -Dima
> >
>
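
As Dima notes in the thread, HBase is rooted at exactly one HDFS namespace
via hbase.rootdir; even on a federated HDFS you could only hand HBase one of
the nameservices. A minimal sketch of what that looks like in hbase-site.xml
(the nameservice name "ns1" is a hypothetical placeholder, defined in
hdfs-site.xml on a real cluster):

```xml
<!-- hbase-site.xml: HBase points at a single HDFS namespace.
     "ns1" is a hypothetical nameservice; only one can be used. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ns1/hbase</value>
  </property>
</configuration>
```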
