From: Christopher
Date: Fri, 20 Dec 2019 00:43:38 -0500
Subject: Re: Run Accumulo and Hadoop services under systemd
To: fluo-dev

On Thu, Dec 19, 2019 at 9:57 PM Aishwarya Thangappa wrote:
>
> Thanks, Christopher. I see your point.
> The changes to the accumulo-cluster scripts aside,
>
> 1. Is there value in landing the systemd changes in the muchos repo? If
> it is deemed valuable, we can put up a PR with the systemd units as
> template files and ansible tasks to copy these to the cluster nodes and
> enable/start them. This will be easy for us to upstream as we already
> have the work done.

There is probably some value in that, assuming the use cases Keith
mentioned aren't made more difficult. But, the details of the changes
might matter.

>
> 2. Alternatively, would you find value if we re-worked a set of shell
> scripts which would do the equivalent of the above changes and have a PR
> opened against the Accumulo repo?

That would very much depend on the details, but I am wary of adding
downstream integration tooling directly into Accumulo's main
repository, even if it had significant added value, rather than have
such tooling live alongside it separately in its own repo (possibly as
another repo maintained by the Accumulo PMC, or by a community
member). This is because the Accumulo PMC cannot possibly maintain
everything of value that is marginally related to Accumulo under its
own umbrella. I've seen projects try to do that, and it doesn't go
well.

> 2.1. In this case, would reference scripts to do the start/stop
> operations using systemd, similar to the accumulo-cluster scripts, be
> of value?

Perhaps yes, but probably not maintained in Accumulo's main repo.
However, I think it would make a good blog post on Accumulo's website,
either way.

> 2.2. We found that it was necessary to make minor changes to the
> accumulo-service script to support the multiple tserver case. Are there
> any concerns about modifying it?

There's a lot to say about accumulo-service, so I'll try to be brief.
In short, I don't think accumulo-service (and accumulo-cluster) should
be used for systemd integration.
Work was done in bin/accumulo in 2.0 to more easily support downstream
integration by dramatically simplifying its implementation. This
allowed accumulo-cluster/accumulo-service to be easily created as one
such set of "downstream" tools that built off of the simplicity of the
new bin/accumulo, and which was provided within the main repo as
convenient out-of-the-box cluster management / service management
tools for when we build the binary tarball. However, they were not
intended as integration points for downstream tools... bin/accumulo
was.

As for accumulo-service:

1. accumulo-service uses old SysV init patterns for managing services,
none of which are needed under systemd
2. it does PIDfile stuff that is unnecessary to do at all with systemd
(assuming Type=simple, which is what you should probably use, since
you don't need to background it, not Type=forking; and even if you did
use forking, systemd has its own way of managing PIDfiles)
3. it does custom, manual log file rotation stuff, which we probably
should never have had in there at all, but definitely isn't needed
with systemd/journald
4. supporting multiple tservers is so much simpler with unit files
using systemd instances (parameter injection in unit file templates)
5. accumulo-service should really only be used by accumulo-cluster, or
perhaps as part of a suite of legacy SysV init scripts

accumulo-cluster and accumulo-service go together, and were written
with a specific use case in mind. Systemd integration is an altogether
different use case, and I think a much simpler set of tooling could be
built using systemd and bin/accumulo than it could by trying to use
accumulo-service in a way it wasn't intended to be used (but
bin/accumulo was).

>
> And, not sure why you are getting a 404 on the gist files. I am able to
> access them from a private browser window without issues.

Sorry, I figured this out. The href got mangled in the HTML version of
the email.
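To make points 2 and 4 above concrete, here is a minimal sketch of a
systemd template unit for tablet servers, written out by a small shell
script. It is illustrative only and is not taken from the gists in this
thread: the install path /opt/accumulo, the "accumulo" user, the unit
name accumulo-tserver@.service, and the TSERVER_INSTANCE variable are
all assumptions. The script writes into a local directory rather than
/etc/systemd/system so it can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch: generate a systemd template unit for Accumulo tablet servers.
# ACCUMULO_HOME, the "accumulo" user, the unit name, and TSERVER_INSTANCE
# are illustrative assumptions, not values taken from this thread.
set -euo pipefail

ACCUMULO_HOME="${ACCUMULO_HOME:-/opt/accumulo}"
UNIT_DIR="${UNIT_DIR:-./units}"   # copy to /etc/systemd/system when ready

mkdir -p "$UNIT_DIR"

# The heredoc expands ${ACCUMULO_HOME}; %i is left for systemd to expand.
cat > "$UNIT_DIR/accumulo-tserver@.service" <<EOF
[Unit]
Description=Apache Accumulo TServer (instance %i)
After=network-online.target
Wants=network-online.target

[Service]
# Type=simple: the process is run in the foreground, so systemd tracks it
# directly and no PID file is needed.
Type=simple
User=accumulo
# %i expands to the text after "@" in the unit name. TSERVER_INSTANCE is a
# hypothetical variable; wire it to whatever your environment script expects.
Environment=TSERVER_INSTANCE=%i
ExecStart=${ACCUMULO_HOME}/bin/accumulo tserver
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $UNIT_DIR/accumulo-tserver@.service"
```

Once the file is copied into /etc/systemd/system and systemctl
daemon-reload has been run, something like
"systemctl enable --now accumulo-tserver@1 accumulo-tserver@2" would
start two independently managed instances, with output going to the
journal by default. Each instance would still need its own ports and
log names, which this sketch leaves to the Accumulo configuration.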
>
> On 2019/12/18 01:54:00, Christopher wrote:
> > On Tue, Dec 17, 2019 at 8:07 PM Aishwarya Thangappa
> > wrote:
> > >
> > > Sorry, I wasn't aware that attachments are not allowed in ASF Mailing
> > > lists. I have now created them as gists. Please have a look.
> > >
> > > master systemd unit: https://gist.github.com/ata18/e8f7577c99cd08ba46544aacef26969f
> > > accumulo-service: https://gist.github.com/ata18/48014ea78b09e4febb88480ea48ed62c
> >
> > These first two links don't work for me. I get a 404 error message.
> >
> > For reference, here's the basic unit files I wrote for Accumulo from
> > Fedora 29: https://src.fedoraproject.org/rpms/accumulo/tree/f29
> > They used a /usr/bin/accumulo script generated using the
> > %jpackage_script macro (see accumulo.spec file for that) which worked
> > a lot like Accumulo 2.0's bin/accumulo file works (not a coincidence,
> > since the 2.0 script was written with insight gained from the attempt
> > to package in Fedora).
> >
> > > accumulo-cluster: https://gist.github.com/ata18/234c2e63d2718aec65bd2037ec3125cd
> >
> > This appears to be based on an older version of our accumulo-cluster
> > script (from 2.0?) rather than the current one in the master branch,
> > but I think I got the sense of what was changed by glancing at the
> > diff. Once you have systemd, I'm not convinced it's beneficial to use
> > something like accumulo-cluster anymore, as it doesn't really serve
> > any added value beyond what you would get with using systemctl via
> > pssh or pdsh and a hostsfile. The accumulo-cluster script's purpose is
> > for when you don't have an existing service management tool for the
> > cluster, and its intent is to be very basic, to support the "deploy
> > out of tarball" use case, with no other vendor or downstream
> > packaging. Modifying it to wrap systemd seems a bit unnecessarily
> > complex to me, since I don't think you need it when using systemd.
> >
> > It might be better to create a simpler script that makes it easy to
> > run specific tasks using pdsh or pssh and a hostsfile, for use with
> > systemd, rather than trying to put those features into the
> > accumulo-cluster script.
> >
> > >
> > > Thanks,
> > > Aishwarya
> > >
> > > On 2019/12/15 16:16:56, Michael Wall wrote:
> > > > Hi Aishwarya,
> > > >
> > > > I didn't get any attachments on this.
> > > >
> > > > Thanks
> > > >
> > > > Mike
> > > >
> > > > On Fri, Dec 13, 2019 at 5:46 PM Aishwarya Thangappa
> > > > wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > I had not subscribed to the dev mailing list earlier and missed some
> > > > > of your questions. I will address them here.
> > > > >
> > > > > @Christopher
> > > > > Most of the changes except the actual installation of the systemd units
> > > > > could possibly go into Accumulo. These would be the systemd units for
> > > > > various accumulo services, modification to cluster-wide scripts in accumulo
> > > > > to use systemd instead of directly starting/stopping the processes. We
> > > > > would be happy to accommodate/answer any suggestions or follow-up questions
> > > > > you may have.
> > > > >
> > > > > Attached the accumulo_cluster and accumulo_service scripts with systemd
> > > > > changes.
> > > > >
> > > > >
> > > > > @Keith Turner
> > > > > Once we determine where the different pieces should land, I can post PRs
> > > > > accordingly. In our current setup, in muchos.properties file I have added a
> > > > > `use_systemd` flag which when set to true, will overwrite the accumulo
> > > > > cluster-wide scripts in the nodes with the attached ones. These files
> > > > > currently reside at ansible/roles/accumulo/files. If we determine that
> > > > > these scripts and the systemd unit files will instead go to Accumulo
> > > > > project, I will have to make changes accordingly.
> > > > >
> > > > > @Michael Wall
> > > > > Systemd units internally call the same scripts that accumulo_cluster
> > > > > commands currently use. The change is that accumulo_cluster commands would
> > > > > call systemd start/stop which in turn would call accumulo_service commands.
> > > > > Attached a sample systemd_unit template. Can you please elaborate if this
> > > > > is still an issue?
> > > > >
> > > > > ------------------------------
> > > > > *From:* Aishwarya Thangappa
> > > > > *Sent:* Thursday, December 12, 2019 11:25 AM
> > > > > *To:* dev@fluo.apache.org
> > > > > *Cc:* Arvind Shyamsundar; Billie Rinaldi <Billie.Rinaldi@microsoft.com>
> > > > > *Subject:* Run Accumulo and Hadoop services under systemd
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > While using fluo-muchos to deploy an Accumulo cluster, we recognized the
> > > > > need for various Accumulo and Hadoop services to be run under a service
> > > > > manager like systemd which will ensure that all these services are brought
> > > > > up correctly in the event of VM / OS reboots / cold starts. We have made
> > > > > the required changes for this and would like to contribute it back to the
> > > > > community if there is any interest around it.
> > > > >
> > > > > Summarizing what we have done:
> > > > >
> > > > > - Crafted separate systemd unit files for Accumulo (master, monitor,
> > > > > gc, tracer, tserver), Hadoop (journalnode, namenode, datanode,
> > > > > resourcemanager, nodemanager, zkfc) and Zookeeper services.
> > > > > - All of these unit files will be copied to the respective nodes'
> > > > > /etc/systemd/system directory; the services will then be started and
> > > > > enabled by ansible systemd module.
> > > > > - In case of num_tservers > 1, multiple tserver systemd units will be
> > > > > copied to the node and each will be independently managed.
> > > > > - Also made necessary changes to the existing cluster-wide scripts
> > > > > including accumulo_cluster, accumulo_service, start_dfs, start_yarn etc.,
> > > > > to have them work seamlessly with systemd.
> > > > >
> > > > > Is there an appetite to look at the details? If so, we can post a PR, or if
> > > > > there is any feedback or other considerations, please let us know and we
> > > > > can discuss them.
> > > > >
> > > > > Thanks,
> > > > > Aishwarya
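
The suggestion earlier in this thread, to run systemctl across the
cluster with pdsh or pssh and a hostsfile instead of extending
accumulo-cluster, might look roughly like the sketch below. It is only
an illustration under assumptions: the unit names, the hostsfile paths,
and the helper name systemctl_on_hosts are hypothetical, and it
defaults to a dry run that prints the pdsh command instead of running
it.

```shell
#!/usr/bin/env bash
# Sketch: fan a systemctl action out to every host in a hostsfile via pdsh.
# Unit names, hostsfile paths, and the helper name are hypothetical.
set -euo pipefail

# Usage: systemctl_on_hosts ACTION UNIT HOSTSFILE
# With DRY_RUN=1 (the default) the command is printed, not executed.
systemctl_on_hosts() {
  local action="$1" unit="$2" hostsfile="$3"
  # pdsh -w ^FILE reads the list of target hosts from FILE.
  local cmd=(pdsh -w "^${hostsfile}" sudo systemctl "$action" "$unit")
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: start one service per role using per-role hostsfiles.
systemctl_on_hosts start accumulo-master.service conf/masters
systemctl_on_hosts start 'accumulo-tserver@1.service' conf/tservers
```

Setting DRY_RUN=0 would execute the commands for real. The same shape
should work with pssh by swapping the command array for
"pssh -h HOSTSFILE -i ...", since all service logic stays in the unit
files and the script only distributes systemctl calls.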