fluo-dev mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Run Accumulo and Hadoop services under systemd
Date Fri, 20 Dec 2019 05:43:38 GMT
On Thu, Dec 19, 2019 at 9:57 PM Aishwarya Thangappa
<aishwarya.thangappa@gmail.com> wrote:
>
> Thanks, Christopher. I see your point. The changes to the accumulo-cluster
> scripts aside,
>
> 1. Is there value in landing the systemd changes in the muchos repo? If it is
> deemed valuable, we can put up a PR with the systemd units as template files
> and Ansible tasks to copy these to the cluster nodes and enable/start them.
> This will be easy for us to upstream as we already have the work done.

There is probably some value in that, assuming the use cases Keith
mentioned aren't made more difficult. But, the details of the changes
might matter.

>
> 2. Alternatively, would you find value if we reworked this as a set of shell
> scripts that do the equivalent of the above changes, and opened a PR against
> the Accumulo repo?

That would very much depend on the details, but I am wary of adding
downstream integration tooling directly into Accumulo's main
repository, even if it had significant added value, rather than have
such tooling live alongside it separately in its own repo (possibly
as another repo maintained by the Accumulo PMC, or by a community
member). This is because the Accumulo PMC cannot possibly maintain
everything of value that is marginally related to Accumulo under its
own umbrella. I've seen projects try to do that, and it doesn't go
well.

>   2.1. In this case, would reference scripts to do the start/stop operations
>   using systemd, similar to the accumulo-cluster scripts, be of value?

Perhaps yes, but probably not maintained in Accumulo's main repo.
However, I think it would make a good blog post on Accumulo's website,
either way.

>   2.2. We found that it was necessary to make minor changes to the
>   accumulo-service script to support the multiple tserver case. Are there any
>   concerns about modifying it?

There's a lot to say about accumulo-service, so I'll try to be brief.
In short, I don't think accumulo-service (and accumulo-cluster) should
be used for systemd integration. Work was done in bin/accumulo in
2.0 to more easily support downstream integration by dramatically
simplifying its implementation. This allowed
accumulo-cluster/accumulo-service to be easily created as one such set
of "downstream" tools that built off of the simplicity of the new
bin/accumulo, and which was provided within the main repo as
convenient out-of-the-box cluster management / service management
tools for when we build the binary tarball. However, they were not
intended as integration points for downstream tools... bin/accumulo
was.

As for accumulo-service:

1. accumulo-service uses old SysV init patterns for managing services,
none of which are needed under systemd
2. it does PIDfile stuff that is unnecessary to do at all with systemd
(assuming Type=simple, which is what you should probably use, since
you don't need to background it, not Type=forking; and even if you did
use forking, systemd has its own way of managing PIDfiles)
3. it does custom, manual log file rotation stuff, which we probably
should never have had in there at all, but definitely isn't needed
with systemd/journald
4. supporting multiple tservers is so much simpler with unit files
using systemd instances (parameter injection in unit file templates)
5. accumulo-service should really only be used by accumulo-cluster, or
perhaps as part of a suite of legacy SysV init scripts
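
To illustrate points 2 and 4, here is a minimal sketch of a systemd
template unit for tservers, written out by a small shell script. It is
not taken from the Fedora packaging or from the Accumulo repo. The
install prefix (/opt/accumulo), the service account (accumulo), the
unit name, and the use of ACCUMULO_SERVICE_INSTANCE to pass the
instance id are all assumptions to check against your own
accumulo-env.sh and cluster layout. The script writes to a scratch
directory rather than /etc/systemd/system so it is safe to try.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical install prefix; override via the environment if needed.
ACCUMULO_HOME="${ACCUMULO_HOME:-/opt/accumulo}"
# Scratch output directory (assumption); a real deployment would copy the
# result into /etc/systemd/system and run "systemctl daemon-reload".
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"

# Write a template unit. systemd replaces "%i" with the instance name,
# i.e. the text between "@" and ".service" when the unit is started.
write_tserver_template() {
  cat > "$UNIT_DIR/accumulo-tserver@.service" <<EOF
[Unit]
Description=Apache Accumulo tablet server (instance %i)
After=network-online.target
Wants=network-online.target

[Service]
# Type=simple: bin/accumulo stays in the foreground, so no PID file is needed.
Type=simple
# Assumed service account.
User=accumulo
# Assumed variable name for telling instances apart.
Environment=ACCUMULO_SERVICE_INSTANCE=%i
ExecStart=${ACCUMULO_HOME}/bin/accumulo tserver
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

write_tserver_template
echo "wrote $UNIT_DIR/accumulo-tserver@.service"
```

With a template like this installed, two tservers on one node would be
started with something like "systemctl start accumulo-tserver@1
accumulo-tserver@2", and stdout/stderr would go to the journal without
any custom log rotation.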

accumulo-cluster and accumulo-service go together, and were written
with a specific use case in mind. Systemd integration is an altogether
different use case, and I think a much simpler set of tooling could be
built using systemd and bin/accumulo than it could by trying to use
accumulo-service in a way it wasn't intended to be used (but
bin/accumulo was).
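
As a rough sketch of how small such tooling could be, the following
wrapper fans a systemctl action out to the hosts in a hostsfile using
pdsh (as suggested earlier in this thread). The function names, the
hostsfile name, the unit names, and the use of sudo are hypothetical.
By default it only prints the command it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry-run by default so the sketch is safe to execute anywhere.
DRY_RUN="${DRY_RUN:-1}"

# Either print the command (dry run) or execute it.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: cluster_systemctl <action> <hostsfile> <unit>...
# pdsh reads target hosts from a file when the -w argument starts with "^".
cluster_systemctl() {
  local action="$1" hostsfile="$2"
  shift 2
  run pdsh -w "^${hostsfile}" sudo systemctl "$action" "$@"
}

# Example: start two tserver instances on every host in tservers.hosts.
cluster_systemctl start tservers.hosts accumulo-tserver@1 accumulo-tserver@2
```

Setting DRY_RUN=0 would execute the pdsh command instead of printing
it; everything else (restarts, PID tracking, logging) is left to
systemd on each node.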

>
> And, not sure why you are getting a 404 on the gist files. I am able to access
> them from a private browser window without issues.

Sorry, I figured this out. The href got mangled in the HTML version of
the email.

>
> On 2019/12/18 01:54:00, Christopher <ctubbsii@apache.org> wrote:
> > On Tue, Dec 17, 2019 at 8:07 PM Aishwarya Thangappa
> > <aishwarya.thangappa@gmail.com> wrote:
> > >
> > > Sorry, I wasn't aware that attachments are not allowed on ASF mailing lists.
> > > I have now created them as gists. Please have a look.
> > >
> > > master systemd unit:  https://gist.github.com/ata18/e8f7577c99cd08ba46544aacef26969f
> > > accumulo-service: https://gist.github.com/ata18/48014ea78b09e4febb88480ea48ed62c
> >
> > These first two links don't work for me. I get a 404 error message.
> >
> > For reference, here are the basic unit files I wrote for Accumulo from
> > Fedora 29: https://src.fedoraproject.org/rpms/accumulo/tree/f29
> > They used a /usr/bin/accumulo script generated using the
> > %jpackage_script macro (see accumulo.spec file for that) which worked
> > a lot like Accumulo 2.0's bin/accumulo file works (not a coincidence,
> > since the 2.0 script was written with insight gained from the attempt
> > to package in Fedora).
> >
> > > accumulo-cluster: https://gist.github.com/ata18/234c2e63d2718aec65bd2037ec3125cd
> >
> > This appears to be based on an older version of our accumulo-cluster
> > script (from 2.0?) rather than the current one in the master branch,
> > but I think I got the sense of what was changed by glancing at the
> > diff. Once you have systemd, I'm not convinced it's beneficial to use
> > something like accumulo-cluster anymore, as it doesn't really provide
> > any added value beyond what you would get with using systemctl via
> > pssh or pdsh and a hostsfile. The accumulo-cluster script's purpose is
> > for when you don't have an existing service management tool for the
> > cluster, and its intent is to be very basic, to support the "deploy
> > out of tarball" use case, with no other vendor or downstream
> > packaging. Modifying it to wrap systemd seems a bit unnecessarily
> > complex to me, since I don't think you need it when using systemd.
> >
> > It might be better to create a simpler script that makes it easy to
> > run specific tasks using pdsh or pssh and a hostsfile, to be used
> > alongside systemd, rather than trying to put those features into the
> > accumulo-cluster script.
> >
> > >
> > > Thanks,
> > > Aishwarya
> > >
> > > On 2019/12/15 16:16:56, Michael Wall <mjwall@gmail.com> wrote:
> > > > Hi Aishwarya,
> > > >
> > > > I didn't get any attachments on this.
> > > >
> > > > Thanks
> > > >
> > > > Mike
> > > >
> > > > On Fri, Dec 13, 2019 at 5:46 PM Aishwarya Thangappa
> > > > <Aishwarya.Thangappa@microsoft.com.invalid> wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > I had not subscribed to the dev mailing list earlier and missed some
> > > > > of your questions. I will address them here.
> > > > >
> > > > > @Christopher
> > > > > Most of the changes except the actual installation of the systemd units
> > > > > could possibly go into Accumulo. These would be the systemd units for
> > > > > various Accumulo services, and modifications to the cluster-wide scripts
> > > > > in Accumulo to use systemd instead of directly starting/stopping the
> > > > > processes. We would be happy to accommodate/answer any suggestions or
> > > > > follow-up questions you may have.
> > > > >
> > > > > Attached the accumulo_cluster and accumulo_service scripts with systemd
> > > > > changes.
> > > > >
> > > > >
> > > > > @Keith Turner
> > > > > Once we determine where the different pieces should land, I can post PRs
> > > > > accordingly. In our current setup, I have added a `use_systemd` flag to
> > > > > the muchos.properties file which, when set to true, will overwrite the
> > > > > Accumulo cluster-wide scripts on the nodes with the attached ones. These
> > > > > files currently reside at ansible/roles/accumulo/files. If we determine
> > > > > that these scripts and the systemd unit files will instead go to the
> > > > > Accumulo project, I will have to make changes accordingly.
> > > > >
> > > > > @Michael Wall
> > > > > Systemd units internally call the same scripts that accumulo_cluster
> > > > > commands currently use. The change is that accumulo_cluster commands
> > > > > would call systemd start/stop, which in turn would call accumulo_service
> > > > > commands. Attached a sample systemd_unit template. Can you please
> > > > > elaborate if this is still an issue?
> > > > >
> > > > > ------------------------------
> > > > > *From:* Aishwarya Thangappa
> > > > > *Sent:* Thursday, December 12, 2019 11:25 AM
> > > > > *To:* dev@fluo.apache.org <dev@fluo.apache.org>
> > > > > *Cc:* Arvind Shyamsundar <arvindsh@microsoft.com>; Billie Rinaldi
> > > > > <Billie.Rinaldi@microsoft.com>
> > > > > *Subject:* Run Accumulo and Hadoop services under systemd
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > While using fluo-muchos to deploy an Accumulo cluster, we recognized the
> > > > > need for various Accumulo and Hadoop services to be run under a service
> > > > > manager like systemd, which will ensure that all these services are
> > > > > brought up correctly in the event of VM / OS reboots / cold starts. We
> > > > > have made the required changes for this and would like to contribute
> > > > > them back to the community if there is any interest.
> > > > >
> > > > > Summarizing what we have done:
> > > > >
> > > > >    - Crafted separate systemd unit files for Accumulo (master, monitor,
> > > > >    gc, tracer, tserver), Hadoop (journalnode, namenode, datanode,
> > > > >    resourcemanager, nodemanager, zkfc) and ZooKeeper services.
> > > > >    - All of these unit files will be copied to the respective nodes'
> > > > >    /etc/systemd/system directory; the services will then be started and
> > > > >    enabled by the Ansible systemd module.
> > > > >    - In case of num_tservers > 1, multiple tserver systemd units will be
> > > > >    copied to the node and each will be independently managed.
> > > > >    - Also made necessary changes to the existing cluster-wide scripts,
> > > > >    including accumulo_cluster, accumulo_service, start_dfs, start_yarn,
> > > > >    etc., to have them work seamlessly with systemd.
> > > > >
> > > > > Is there an appetite to look at the details? If so, we can post a PR, or
> > > > > if there is any feedback or other considerations, please let us know and
> > > > > we can discuss them.
> > > > >
> > > > > Thanks,
> > > > > Aishwarya
> > > > >
> > > > >
> > > >
> >
