asterixdb-dev mailing list archives

From "Till Westmann" <>
Subject Re: Cluster XML files
Date Wed, 29 Jun 2016 19:56:15 GMT
Hi Raman,

thanks for chiming in. The separation of the physical configuration from the
software configuration indeed looks good.
However, I’m a little challenged by the current split. If the physical
configuration is in one file, it seems that it should contain all network and
storage settings. However, we also have some network settings (e.g.
"web.port") and storage-like settings (at least "compiler.pregelix.home"
refers to a directory) in asterix-configuration.xml.
Should those move to the cluster.xml? Or should those be where they are?

I'm also wondering whether there's a difference in the lifecycle of the
parameter settings. Are all the parameters in cluster.xml fixed "forever"?
Or could some of them be modified between restarts (e.g. it seems feasible
to change ports between restarts, while changing storage directories will
probably break the cluster).
Also it seems that some of the parameters in asterix-configuration.xml can
only be changed between restarts (e.g. ""), while changing
others would break the cluster (e.g. "storage.buffercache.pagesize"), and
others could theoretically be changed in a running cluster or even per job
(e.g. "compiler.sortmemory").
Would it maybe make sense to split configurations along those lines? Or
should we just put all configurations in one file and leave it up to the
user to make sense of the lifecycles?

I'm really not sure if there's a "right" way to organize these and - if so -
what it is.
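To make the three-way split concrete, here is a minimal Python sketch of what a lifecycle-aware check could look like. It assumes a hypothetical lifecycle tag per parameter (FIXED, RESTART, RUNTIME); the parameter-to-lifecycle assignments simply mirror the examples above and are illustrative guesses, not an existing AsterixDB or Managix feature.

```python
from enum import Enum


class Lifecycle(Enum):
    """Hypothetical lifecycle classes for configuration parameters."""
    FIXED = "fixed"      # changing it would break the cluster
    RESTART = "restart"  # may change, but only while the cluster is stopped
    RUNTIME = "runtime"  # could change in a running cluster or even per job


# Illustrative classification only, mirroring the examples in this message;
# not an official AsterixDB classification.
LIFECYCLES = {
    "storage.buffercache.pagesize": Lifecycle.FIXED,
    "web.port": Lifecycle.RESTART,
    "compiler.sortmemory": Lifecycle.RUNTIME,
}


def rejected_changes(old, new, cluster_running):
    """Return the sorted names of changed parameters the lifecycle disallows.

    old and new map parameter names to values. Parameters without a known
    lifecycle are treated conservatively as FIXED.
    """
    rejected = []
    for name in sorted(set(old) | set(new)):
        if old.get(name) == new.get(name):
            continue  # unchanged, nothing to check
        lifecycle = LIFECYCLES.get(name, Lifecycle.FIXED)
        if lifecycle is Lifecycle.FIXED:
            rejected.append(name)
        elif lifecycle is Lifecycle.RESTART and cluster_running:
            rejected.append(name)
    return rejected


old = {"web.port": "19001", "compiler.sortmemory": "32MB"}
new = {"web.port": "19002", "compiler.sortmemory": "64MB"}
print(rejected_changes(old, new, cluster_running=True))   # ['web.port']
print(rejected_changes(old, new, cluster_running=False))  # []
```

Splitting the files along these lines would let a tool apply this kind of check per file instead of per parameter; keeping one file would require a tag like this on every entry.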


On 29 Jun 2016, at 9:40, Raman Grover wrote:

> Hi,
> It was natural to define your physical clusters separately from the
> properties of the Asterix instance(s) that run over the hardware.
> As such, the cluster xml mapped to the clusters we had - sensorium,
> asterix, or the Yahoo cluster we once had access to. A single cluster xml
> could be reused by multiple devs wishing to use a part (by commenting out
> sections in the xml) or the complete cluster to launch their instances.
> Properties related to the cluster do not change often e.g. the IP addresses
> etc and so these need not be repeated and redefined for each asterix
> instance.
> Asterix configuration xml was meant to contain tuning parameters specific
> to an asterix instance.
> So the user model was to have a fixed set of cluster xmls and a set of
> asterix configuration files, maintained by different users, each
> representing different runtime tuning parameters that devs would have
> different values for or would frequently change as per the workload or
> experiments they are running.
> Separation of concerns and avoiding repetition of properties (when defining
> multiple instances over the same hardware) were the main reasons for
> having two separate files.
> Regards,
> Raman
> On Jun 29, 2016 8:36 AM, "Till Westmann" <> wrote:
>> Is there a conceptual or lifecycle reason to put a parameter in one or the
>> other file? I really would like to understand why we have 2 files and what
>> the difference is. I think that one hint might be what Ian just mentioned,
>> that the parameters in asterix-configuration.xml can be modified (with a
>> restart?) and the other ones cannot. Is that right?
>> On 29 Jun 2016, at 7:56, Ian Maxon wrote:
>>> Managix sort of splices the cluster.xml with the existing
>>> asterix-configuration.xml to produce a new asterix-configuration.xml that
>>> then gets put into the asterix-app jar inside of asterix-server. The user
>>> has to know about the base asterix-configuration.xml because that is
>>> where you change some important memory parameters. You can also edit it
>>> without deleting the cluster itself (managix alter).
>>> On Wed, Jun 29, 2016 at 1:05 AM, Chris Hillery <>
>>> wrote:
>>>> My understanding of how Managix-based deployment currently works is as
>>>> follows:
>>>>   - User composes a cluster.xml
>>>>   - Managix consumes this and produces an asterix-configuration.xml,
>>>> which contains some of the same data as cluster.xml as well as some
>>>> things derived from that data (such as composing the <iodevices>
>>>> directories with the <store> subdirectory name to produce <storeDirs>)
>>>>   - Managix places both the original cluster.xml and the generated
>>>> asterix-configuration.xml onto the CLASSPATH of the NCs and CCs
>>>>   - The user is never directly aware of asterix-configuration.xml, and
>>>> certainly does not edit it in the normal course of operation
>>>> Is this an accurate summary?
>>>> Ceej
>>>> aka Chris Hillery
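The derivation Chris describes (composing each <iodevices> directory with the <store> subdirectory name to produce <storeDirs>) can be sketched in Python as below. The cluster.xml layout here is a simplified, hypothetical one: the element names come from the thread, but the comma-separated device list, the cluster-level <store>, and the absence of XML namespaces are assumptions, and the sketch does not show how Managix actually serializes the result into asterix-configuration.xml.

```python
import posixpath
import xml.etree.ElementTree as ET

# Hypothetical, simplified cluster.xml (namespaces omitted; layout assumed).
SAMPLE_CLUSTER_XML = """<cluster>
  <store>storage</store>
  <node>
    <id>nc1</id>
    <iodevices>/mnt/disk1,/mnt/disk2</iodevices>
  </node>
  <node>
    <id>nc2</id>
    <iodevices>/mnt/disk1</iodevices>
  </node>
</cluster>"""


def derive_store_dirs(cluster_xml):
    """Map each node id to its store dirs: every iodevice joined with <store>."""
    root = ET.fromstring(cluster_xml)
    store = (root.findtext("store") or "").strip()
    result = {}
    for node in root.findall("node"):
        node_id = (node.findtext("id") or "").strip()
        devices = [d.strip() for d in (node.findtext("iodevices") or "").split(",")
                   if d.strip()]
        # posixpath keeps the joined paths POSIX-style on every platform
        result[node_id] = [posixpath.join(d, store) for d in devices]
    return result


print(derive_store_dirs(SAMPLE_CLUSTER_XML))
# {'nc1': ['/mnt/disk1/storage', '/mnt/disk2/storage'], 'nc2': ['/mnt/disk1/storage']}
```

Because values like these are derived rather than typed by the user, they are a natural fit for the generated file; the tuning parameters Raman mentions are the ones a user would edit by hand.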
