asterixdb-dev mailing list archives

From Raman Grover <ramangrove...@gmail.com>
Subject Re: Cluster XML files
Date Thu, 30 Jun 2016 18:23:09 GMT
Hi

I understand the challenge here, and it's tricky :)

We have two dimensions to slice our configs along:
*a) physical config vs. application config*
*b) mutability via the managix alter command (these can be further split
into those requiring a restart and those that do not)*

I think it is simpler and cleaner to slice it along dimension (b). The
reasons are as follows.

(i) Config params that need to be user-defined but just once (e.g. page
size) sit nicely in cluster.xml and follow the immutable yet configurable
policy instantly.

(ii) Config params in cluster.xml are validated prior to use (managix
validate). Performance-sensitive properties like page size can be validated
here to avoid mistakes.

(iii) We do not need any additional logic in the alter command to filter
out immutable configs.

(iv) Config params such as ports, directories, and ip addresses are not
typically modified across restarts. Using (b), these remain in cluster.xml.
One may argue for ports being mutable, but if a port is conflicting, it is
so right from the beginning and should be caught as part of cluster
validation. A non-conflicting port should not require any modifications
thereafter. So it is immutable for practical reasons.
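
To illustrate point (iv), here is a minimal sketch of what a validation-time
port check could look like. It is not how managix validate is implemented;
the function names and the "a.port"/"b.port" parameter names are
hypothetical, and the check simply tries to bind each configured port on the
local host.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Bind failed: the port is in use (or otherwise unavailable).
            return False
        return True


def conflicting_ports(ports, host="127.0.0.1"):
    """Return the names of configured ports that conflict.

    `ports` maps a (hypothetical) parameter name to a port number. A port
    conflicts if an earlier parameter already claimed the same number or if
    it cannot be bound on `host` right now.
    """
    seen = set()
    conflicts = []
    for name, port in ports.items():
        if port in seen or not port_is_free(port, host):
            conflicts.append(name)
        seen.add(port)
    return conflicts
```

If such a check passes once at validation time, the port never needs to be
touched again, which is the sense in which it is immutable in practice.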

Next, there are configs in asterix-configuration.xml that can be
theoretically changed without having to restart. Currently we do not have a
way to propagate values without restarting (but this can be done easily).
So let's assume we did that. This would cause NC opts to move to cluster.xml
(which is not ugly but actually makes sense).

*So the split is as follows*

a) Cluster.xml: Configurable yet immutable configs
                      params: ip addresses, directories, ports, jvm options,
                      page size etc.

b) AsterixConfiguration: Mutable configs (a current limitation requires
restarts, but we would fix that)
                      params: all others.
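
As a rough sketch of this split (not existing Managix code), the partition
could look like the following. The parameter names are the ones mentioned in
this thread; which ones land in the immutable set, the IMMUTABLE_PARAMS and
split_config names, and the example values are all assumptions made for
illustration.

```python
# Parameters treated as "configurable yet immutable" under the proposal.
# Hypothetical membership: a port, the NC JVM options, and the page size.
IMMUTABLE_PARAMS = {
    "web.port",
    "nc.java.opts",
    "storage.buffercache.pagesize",
}


def split_config(params):
    """Partition a flat {name: value} dict into (cluster, instance) dicts.

    `cluster` holds the immutable params destined for cluster.xml;
    `instance` holds everything else, destined for the mutable file.
    """
    cluster, instance = {}, {}
    for name, value in params.items():
        target = cluster if name in IMMUTABLE_PARAMS else instance
        target[name] = value
    return cluster, instance


# Placeholder values, for illustration only.
cluster, instance = split_config({
    "web.port": 19001,
    "nc.java.opts": "-Xmx1536m",
    "storage.buffercache.pagesize": "32KB",
    "compiler.sortmemory": "32MB",
})
```

Because an alter-style command would only ever see the second dict, it needs
no filtering logic of its own, which is point (iii) above.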

We need to find better names for (a) and (b) though to better reflect this
split.
Thoughts?

Regards,
Raman

On Wed, Jun 29, 2016 at 12:56 PM, Till Westmann <tillw@apache.org> wrote:

> Hi Raman,
>
> thanks for chiming in. The separation of the physical configuration from
> the software configuration indeed looks good.
> However, I’m a little challenged by the current split. If the physical
> configuration is in 1 file, it seems that it should contain all network and
> storage settings. However, we also have some network settings (e.g.
> "web.port") and storage-like settings (at least "compiler.pregelix.home"
> refers to a directory ..) in asterix-configuration.xml.
> Should those move to the cluster.xml? Or should those be where they are?
>
> Also, I'm wondering if there's a difference in the lifecycle of the
> parameter settings? Are all the parameters in cluster.xml fixed "forever"?
> Or could some of them be modified between restarts (e.g. it seems feasible
> to change ports between restarts, while changing storage directories will
> probably break the cluster).
> Also it seems that some of the parameters in asterix-configuration.xml can
> only be changed between restarts (e.g. "nc.java.opts"), while changing
> others would break the cluster (e.g. "storage.buffercache.pagesize"), and
> others could theoretically be changed in a running cluster or even per job
> (e.g. "compiler.sortmemory").
> Would it maybe make sense to split configurations along those lines? Or
> should we just put all configurations in one file and leave it up to the
> user to make sense of the lifecycles?
>
> I'm really not sure if there's a "right" way to organize these and - if so -
> what it is.
>
> Cheers,
> Till
>
> On 29 Jun 2016, at 9:40, Raman Grover wrote:
>
> > Hi,
> >
> > It was natural to define your physical clusters separately from the
> > properties of the Asterix instance(s) that run over the hardware.
> >
> > As such, the cluster xml mapped to the clusters we had - sensorium,
> > asterix, or the yahoo cluster we once had access to. A single cluster xml
> > could be reused by multiple devs wishing to use a part (by commenting out
> > sections in the xml)  or the complete cluster to launch their instances.
> > Properties related to the cluster do not change often (e.g. the IP
> > addresses etc.), and so these need not be repeated and redefined for each
> > asterix instance.
> >
> > Asterix configuration xml was meant to contain tuning parameters specific
> > to an asterix instance.
> >
> > So the user model was to have a fixed set of cluster xmls and a set of
> > asterix configuration files, maintained by different users, each
> > representing different runtime tuning parameters that devs would have
> > different values for or would frequently change as per the workload or
> > experiments they are running.
> >
> > Separation of concerns and avoiding repetition of properties (when
> > defining multiple instances over the same hardware) were the main reasons
> > for having two separate files.
> >
> > Regards,
> > Raman
> > On Jun 29, 2016 8:36 AM, "Till Westmann" <tillw@apache.org> wrote:
> >
> >> Is there a conceptual or lifecycle reason to put a parameter in one or
> >> the other file? I really would like to understand why we have 2 files and
> >> what the difference is. I think that one hint might be what Ian just
> >> mentioned, that the parameters in asterix-configuration.xml can be
> >> modified (with a restart?) and the other ones cannot. Is that right?
> >>
> >> On 29 Jun 2016, at 7:56, Ian Maxon wrote:
> >>
> >>> Managix sort of splices the cluster.xml with the existing
> >>> asterix-configuration.xml to produce a new asterix-configuration.xml
> >>> that then gets put into the asterix-app jar inside of asterix-server.
> >>> The user has to know about the base asterix-configuration.xml because
> >>> that is where you change some important memory parameters. You can also
> >>> edit it without deleting the cluster itself (managix alter).
> >>>
> >>> On Wed, Jun 29, 2016 at 1:05 AM, Chris Hillery <chillery@hillery.land>
> >>> wrote:
> >>>
> >>>> My understanding of how Managix-based deployment currently works is as
> >>>> follows:
> >>>>
> >>>>   - User composes a cluster.xml
> >>>>
> >>>>   - Managix consumes this and produces an asterix-configuration.xml,
> >>>> which contains some of the same data as cluster.xml as well as some
> >>>> things derived from that data (such as composing the <iodevices>
> >>>> directories with the <store> subdirectory name to produce <storeDirs>)
> >>>>
> >>>>   - Managix places both the original cluster.xml and the generated
> >>>> asterix-configuration.xml onto the CLASSPATH of the NCs and CCs
> >>>>
> >>>>   - The user is never directly aware of asterix-configuration.xml, and
> >>>> certainly does not edit it in the normal course of operation
> >>>>
> >>>> Is this an accurate summary?
> >>>>
> >>>> Ceej
> >>>> aka Chris Hillery
> >>>>
> >>
>



-- 
Raman
