From: Mikhail Antonov
Date: Sat, 20 Aug 2016 23:26:00 -0700
Subject: Re: Hbase federated cluster for messages
To: "user@hbase.apache.org"

Just out of curiosity, is there anything particular about your deployment or
use case that raised this specific concern about Namenode performance?

An HDFS cluster with 80 datanodes would be considered medium-sized; there are
plenty of (much) bigger clusters out there in the field, and HBase clusters
with 80 nodes aren't very uncommon either. Fine-tuning a cluster of this size
for a specific workload would certainly require some planning and work, and
setting a bunch of params related to heap/memstore/block cache sizing, GC
settings, RPC scheduler settings, replication settings and a number of other
things; but why the Namenode?

-Mikhail
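For a rough sanity check of the sizing figures quoted further down in this
thread (80 data nodes with 8 TB each, roughly 320 TB of logical data per year,
2x replication, 130,000 combined read/write ops/sec), here is a small
back-of-the-envelope sketch. It assumes a region server colocated on every
datanode and a perfectly even spread, and it ignores compaction, WAL and HDFS
overhead; it is only an illustration of the arithmetic, not a sizing
recommendation.

```java
// Back-of-the-envelope check of the numbers quoted below in this thread.
// Idealized: even distribution, a region server on every datanode, and no
// allowance for compactions, WALs or HDFS overhead.
public class CapacityCheck {
    public static void main(String[] args) {
        int dataNodes = 80;
        double tbPerNode = 8.0;                     // raw disk per datanode
        double logicalTbPerYear = 320.0;            // figure from the thread
        int replicationFactor = 2;                  // 2x HDFS replication
        long readWriteOpsPerSec = 130_000L;

        double rawCapacityTb = dataNodes * tbPerNode;                   // 640 TB
        double storedTbPerYear = logicalTbPerYear * replicationFactor;  // 640 TB
        double opsPerServer = (double) readWriteOpsPerSec / dataNodes;  // ~1625

        System.out.printf("Raw capacity:          %.0f TB%n", rawCapacityTb);
        System.out.printf("Stored after one year: %.0f TB%n", storedTbPerYear);
        System.out.printf("Headroom after a year: %.0f TB%n", rawCapacityTb - storedTbPerYear);
        System.out.printf("Ops per server:        %.0f ops/sec%n", opsPerServer);
    }
}
```

On these figures, roughly 1,600 ops/sec per server is a modest request rate;
storage headroom after the first year looks like a bigger concern than
Namenode load.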
On Sat, Aug 20, 2016 at 6:39 AM, Alexandr Porunov <alexandr.porunov@gmail.com> wrote:

> Thank you Dima
>
> Best regards,
> Alexandr
>
> On Sat, Aug 20, 2016 at 4:17 PM, Dima Spivak wrote:
>
> > Yup.
> >
> > On Saturday, August 20, 2016, Alexandr Porunov <alexandr.porunov@gmail.com> wrote:
> >
> > > So, will it be OK if we have 80 data nodes (8 TB on each node) and
> > > only one namenode? Will it work for the messaging system? We will
> > > have 2x replication, so there are 320 TB of data per year (640 TB
> > > with replication), 130,000 R+W ops/sec, and each message is 100
> > > bytes or 1024 bytes. Is it possible to handle such a load with HBase?
> > >
> > > Sincerely,
> > > Alexandr
> > >
> > > On Sat, Aug 20, 2016 at 8:44 AM, Dima Spivak wrote:
> > >
> > > > You can easily store that much data as long as you don't have
> > > > small files, which is typically why people turn to federation.
> > > >
> > > > -Dima
> > > >
> > > > On Friday, August 19, 2016, Alexandr Porunov <alexandr.porunov@gmail.com> wrote:
> > > >
> > > > > We are talking about Facebook. So, there are 25 TB per month: 15
> > > > > billion messages of 1024 bytes and 120 billion messages of 100
> > > > > bytes per month.
> > > > >
> > > > > I thought that they used only HBase to handle such a huge amount
> > > > > of data. If they used their own implementation of HBase then I
> > > > > have no questions.
> > > > >
> > > > > Sincerely,
> > > > > Alexandr
> > > > >
> > > > > On Sat, Aug 20, 2016 at 1:39 AM, Dima Spivak wrote:
> > > > >
> > > > > > I'd +1 what Vladimir says. How much data (in TBs/PBs) and how
> > > > > > many files are we talking about here? I'd say that use cases
> > > > > > that benefit from HBase don't tend to hit the kind of HDFS
> > > > > > file limits that federation seeks to address.
> > > > > >
> > > > > > -Dima
> > > > > >
> > > > > > On Fri, Aug 19, 2016 at 2:19 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > > > > >
> > > > > > > FB has its own "federation". It is proprietary code, I presume.
> > > > > > >
> > > > > > > -Vladimir
> > > > > > >
> > > > > > > On Fri, Aug 19, 2016 at 1:22 PM, Alexandr Porunov <alexandr.porunov@gmail.com> wrote:
> > > > > > >
> > > > > > > > No, there isn't. But I want to figure out how to configure
> > > > > > > > that type of cluster in case there is a particular reason.
> > > > > > > > How can Facebook handle such a huge number of ops without
> > > > > > > > federation? I don't think that they just have one namenode
> > > > > > > > server and one standby namenode server. It isn't possible.
> > > > > > > > I am sure that they use federation.
> > > > > > > >
> > > > > > > > On Fri, Aug 19, 2016 at 10:08 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > >> I am not sure how to do it but I have to configure a
> > > > > > > > > >> federated cluster with HBase to store a huge amount of
> > > > > > > > > >> messages (client to client) (40% writes, 60% reads).
> > > > > > > > >
> > > > > > > > > Any particular reason for a federated cluster? How huge
> > > > > > > > > is "huge amount" and what is the message size?
> > > > > > > > >
> > > > > > > > > -Vladimir
> > > > > > > > >
> > > > > > > > > On Fri, Aug 19, 2016 at 11:57 AM, Dima Spivak <dspivak@cloudera.com> wrote:
> > > > > > > > >
> > > > > > > > > > As far as I know, HBase doesn't support spreading
> > > > > > > > > > tables across namespaces; you'd have to point it at
> > > > > > > > > > one namenode at a time. I've heard of people trying to
> > > > > > > > > > run multiple HBase instances in order to get access to
> > > > > > > > > > all their HDFS data, but it doesn't tend to be much fun.
> > > > > > > > > >
> > > > > > > > > > -Dima
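To make the "point it at one namenode at a time" remark above concrete, here
is a minimal sketch. The property name hbase.rootdir is real, but the
"nameservice1" value shown is only a placeholder for whatever single NameNode
or HA nameservice a given cluster uses.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Minimal sketch: an HBase cluster keeps all tables, WALs and HFiles under a
// single hbase.rootdir, so it is bound to exactly one NameNode (or one HA
// nameservice). "nameservice1" below is only a placeholder value.
public class RootDirCheck {
    public static void main(String[] args) {
        // Loads hbase-site.xml / hbase-default.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        String rootDir = conf.get("hbase.rootdir", "hdfs://nameservice1/hbase");
        System.out.println("All HBase data lives under: " + rootDir);
        // There is no per-table mapping to different HDFS federation
        // namespaces; using a second namespace means running a second,
        // independent HBase cluster, as described above.
    }
}
```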
> > > > > > > > > > On Fri, Aug 19, 2016 at 11:51 AM, Alexandr Porunov <alexandr.porunov@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > I am not sure how to do it, but I have to configure a
> > > > > > > > > > > federated cluster with HBase to store a huge amount
> > > > > > > > > > > of messages (client to client) (40% writes, 60%
> > > > > > > > > > > reads). Does somebody have any idea or examples of
> > > > > > > > > > > how to configure it?
> > > > > > > > > > >
> > > > > > > > > > > Of course we can configure HDFS in federated mode,
> > > > > > > > > > > but as far as I can tell it isn't suitable for HBase.
> > > > > > > > > > > If we want to save a message from client 1 to client
> > > > > > > > > > > 2 in the HBase cluster, then how does HBase know in
> > > > > > > > > > > which namespace it has to save it? Which namenode
> > > > > > > > > > > will be responsible for that message? How can we
> > > > > > > > > > > read client messages?
> > > > > > > > > > >
> > > > > > > > > > > Give me any ideas, please
> > > > > > > > > > >
> > > > > > > > > > > Sincerely,
> > > > > > > > > > > Alexandr
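On the "which namenode will be responsible for that message" question from
the original post above: in HBase the client never picks a namenode at all;
the row key determines which region (and region server) owns the write, and
every region lives under the cluster's single hbase.rootdir. A minimal sketch
follows, assuming a hypothetical "messages" table with one column family "m"
and a conversation-id-plus-timestamp row key; the schema is illustrative, not
taken from the thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical "messages" table with a single column family "m".
public class StoreMessage {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("messages"))) {
            // Row key: conversation id plus a reversed timestamp, so one
            // conversation's messages cluster together and newest sort first.
            long reversedTs = Long.MAX_VALUE - System.currentTimeMillis();
            byte[] rowKey = Bytes.toBytes("client1-client2:" + reversedTs);
            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("body"),
                    Bytes.toBytes("hello"));
            table.put(put);
            // HBase routes the Put by row key to the region that owns that
            // key; no namenode choice is involved, because all regions share
            // the cluster's single hbase.rootdir.
        }
    }
}
```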