hbase-user mailing list archives

From Dima Spivak <dimaspi...@apache.org>
Subject Re: Hbase on docker container with persistent storage
Date Wed, 19 Jul 2017 13:52:23 GMT
I've run HDFS/HBase in Docker containers across a handful of hosts while
working on changes to the clusterdock project [1]. More often, though, I've
worked with multiple Docker containers on a single machine (albeit with
lots of storage) to test the components.

1. https://github.com/clusterdock/

-Dima

On Tue, Jul 18, 2017 at 9:52 PM, Udbhav Agarwal <udbhav.agarwal@syncoms.com>
wrote:

> Okay, at what scale do you have experience?
>
> -----Original Message-----
> From: Dima Spivak [mailto:dimaspivak@apache.org]
> Sent: Monday, July 17, 2017 7:40 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase on docker container with persistent storage
>
> No, not at the scale you're looking at.
>
> On Mon, Jul 17, 2017 at 6:36 AM Udbhav Agarwal <udbhav.agarwal@syncoms.com>
> wrote:
>
> > Hi Dima,
> > I have not been able to containerize HDFS so far. Do you have any
> > reference I can use to go ahead with that?
> >
> > Thanks,
> > Udbhav
> >
> > -----Original Message-----
> > From: Dima Spivak [mailto:dimaspivak@apache.org]
> > Sent: Monday, July 17, 2017 6:37 PM
> > To: user@hbase.apache.org
> > Subject: Re: Hbase on docker container with persistent storage
> >
> > Hi Udbhav,
> >
> > How have you containerized HDFS to run on Docker across 80 hosts? The
> > answer to that would guide how you might add HBase into the mix.
> >
> > On Mon, Jul 17, 2017 at 5:33 AM Udbhav Agarwal
> > <udbhav.agarwal@syncoms.com> wrote:
> >
> > > Hi Dima,
> > > Hope you are doing well.
> > > Using HBase on a single host is performant because I am not yet
> > > dealing with terabytes of data. For now the data size is very small
> > > (around 1 GB). I am using this setup to test my application.
> > >                As a next step I have to grow the data as well as the
> > > storage and check performance, so I will need HBase deployed on
> > > 70-80 servers.
> > >                Can you please let me know how I can containerize
> > > HBase so that it is backed by HDFS across 70-80 host machines and
> > > does not lose data if the container itself dies for some reason?
> > >
> > > Thanks,
> > > Udbhav
> > >
> > > From: Dima Spivak [mailto:dimaspivak@apache.org]
> > > Sent: Friday, July 14, 2017 10:11 PM
> > > To: Udbhav Agarwal <udbhav.agarwal@syncoms.com>;
> > > user@hbase.apache.org
> > > Cc: dimaspivak@apache.org
> > > Subject: Re: Hbase on docker container with persistent storage
> > >
> > > If running HBase on a single host is performant enough for you, why
> > > use HBase at all? How are you currently storing your data?
> > >
> > > On Fri, Jul 14, 2017 at 6:07 AM Udbhav Agarwal
> > > <udbhav.agarwal@syncoms.com> wrote:
> > > Additionally, can you please provide some links that can guide me
> > > in setting up such a system with volumes? Thank you.
> > >
> > > Udbhav
> > > -----Original Message-----
> > > From: Udbhav Agarwal [mailto:udbhav.agarwal@syncoms.com]
> > > Sent: Friday, July 14, 2017 6:31 PM
> > > To: user@hbase.apache.org
> > > Cc: dimaspivak@apache.org
> > > Subject: RE: Hbase on docker container with persistent storage
> > >
> > > Thank you, Dima, for the response.
> > >         Let me reiterate what I want to achieve. I am using HBase to
> > > persist big data (terabytes and petabytes) coming from various
> > > sources through Spark Streaming and Kafka. Spark Streaming and Kafka
> > > run as separate microservices, each inside its own exclusive
> > > container. These containers communicate over HTTP.
> > > Currently I am using an HBase setup on 4 VMs on a single host machine.
> > > I have a microservice inside a container that connects to this HBase.
> > > This whole setup is functional, and I am able to persist data into
> > > HBase as well as read data from HBase into Spark Streaming. My use
> > > case is real-time ingestion into HBase as well as real-time queries
> > > against HBase.
> > >         Now I am planning to deploy HBase itself inside a container,
> > > and I want to know what the options are. In how many ways can I
> > > achieve this? If I use container volumes, will they be able to hold
> > > this amount of data (TBs and PBs)? How would I set up HDFS inside
> > > volumes? How can I use the power of a distributed file system there?
> > > Is this the best way?
> > >
> > >
> > > Thanks,
> > > Udbhav
> > > -----Original Message-----
> > > From: Dima Spivak [mailto:dimaspivak@apache.org]
> > > Sent: Friday, July 14, 2017 3:44 AM
> > > To: hbase-user <user@hbase.apache.org>
> > > Subject: Re: Hbase on docker container with persistent storage
> > >
> > > Udbhav,
> > >
> > > Volumes are Docker's way of having folders or files from the host
> > > machine bypass the union filesystem used within a Docker container.
> > > As such, if a container with a volume is killed, the data from that
> > > volume should remain there. That said, if whatever caused the
> > > container to die affects the filesystem within the container, it
> > > would also affect the data on the host.
> > >
> > > Running HBase in the manner you've described is not typical in
> > > anything resembling a production environment, but if you explain
> > > more about your use case, we could provide more advice. That said,
> > > how you'd handle data locality and, in particular, multi-host
> > > deployments of HBase in this manner is more of a concern for me than
> > > volume data corruption. What kind of scale do you need to support?
> > > What kind of performance do you expect?
> > >
> > > -Dima
> > >
> > > On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ahmic.samir@gmail.com>
> > > wrote:
> > >
> > > > Hi Udbhav,
> > > > Great work on HBase Docker deployment was done in
> > > > https://issues.apache.org/jira/browse/HBASE-12721 and you may start
> > > > your journey from there. As for the rest of your questions, maybe
> > > > some folks here have done similar testing and can give you more
> > > > info.
> > > >
> > > > Regards
> > > > Samir
> > > >
> > > > On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal
> > > > <udbhav.agarwal@syncoms.com> wrote:
> > > >
> > > > > Hi All,
> > > > > I need to run HBase 0.98 backed by HDFS in a Docker container and
> > > > > want to prevent data loss if the container restarts.
> > > > >                As I understand Docker containers, if a container
> > > > > is stopped or killed, everything related to it is lost. That
> > > > > implies that if I am running HBase in a container and have stored
> > > > > data in some tables, the data will be lost when the container is
> > > > > stopped. I need a way to prevent this data loss.
> > > > >                I have gone through the concept of volumes in Docker.
> > > > > Is it possible to prevent this data loss with that approach? What
> > > > > if the volume gets corrupted? Is there any instance of the volume
> > > > > running that can be stopped and cause data loss?
> > > > >                Is it possible to use HDFS running on an external
> > > > > host outside Docker while HBase runs inside Docker? If yes, how?
> > > > >                Thank you in advance.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Udbhav Agarwal
> > > > >
> > > > >
> > > >
> > > --
> > > -Dima
> > >
> > --
> > -Dima
> >
> --
> -Dima
>
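
The volume behaviour described earlier in the thread (a host folder mounted
into the container, so the data outlives the container) can be sketched
roughly as follows. This is only an illustration under stated assumptions:
the image name, container name, and both directory paths are hypothetical
and do not come from the thread or from clusterdock. The only Docker
features relied on are the standard `docker run` options `-d`, `--name`,
and `-v host_path:container_path`.

```python
import shlex


def datanode_run_command(host_dir,
                         container_dir="/hadoop/dfs/data",      # hypothetical path
                         image="example/hdfs-datanode:latest",  # hypothetical image
                         name="datanode-1"):                    # hypothetical name
    """Build a `docker run` argv that bind-mounts host_dir into the container.

    Files written under container_dir land in host_dir on the host, so they
    remain on the host's disk after the container is stopped or removed.
    """
    return [
        "docker", "run", "-d",
        "--name", name,
        "-v", f"{host_dir}:{container_dir}",
        image,
    ]


if __name__ == "__main__":
    # Print the command rather than executing it, so the sketch is safe to run.
    print(shlex.join(datanode_run_command("/data/hdfs/dn1")))
```

As noted in the thread, a mount like this only protects against the
container going away. Anything that damages the files from inside the
container damages the copy on the host as well, and it does nothing for
data locality or multi-host deployment.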
