atlas-dev mailing list archives

From Jon Maron <jma...@hortonworks.com>
Subject Re: Hive Hook property atlas.cluster.name
Date Sat, 29 Aug 2015 00:09:44 GMT



> On Aug 28, 2015, at 7:58 PM, Arpit Gupta <arpit@hortonworks.com> wrote:
> 
> It was my understanding that the same instance of atlas will be used with
> multiple hadoop clusters. Hence we had to namespace the hive tables. Thus we
> can't really set the cluster name in atlas properties.
> 
> Simplest solution might be to expose the cluster name config in the atlas
> configurations with a default.

I'm sorry - you seem to be contradicting your first point?  Is there a difference between
atlas properties and atlas configurations?


> Add a help section there saying if integrating with an existing atlas
> cluster make sure it does not have a cluster defined with the name in the
> config.

Are you proposing the same solution but with the caveat that we don't use the ambari cluster
name as the hive namespace for the cluster?

> 
> That way we do not have to use the ambari cluster name as the default. 
> --
> Arpit Gupta
> Hortonworks Inc.
> http://hortonworks.com/
> 
>> On Aug 28, 2015, at 4:53 PM, Jon Maron <jmaron@hortonworks.com> wrote:
>> 
>> 
>> 
>>> On Aug 28, 2015, at 7:25 PM, Arpit Gupta <arpit@hortonworks.com> wrote:
>>> 
>>> We have to provide the cluster name to the end user so they can query the
>>> hive tables per cluster. So if we generate a GUID then all the queries will
>>> have to refer to the GUID which might not be user friendly.
>> 
>> My ideas are being shot down, but no one seems to be offering solutions ;)
>> 
>> Is it viable to:
>> 
>> - have cluster id/name as an atlas property (application.properties)
>> - I may be able to set that value in ambari to be the designated ambari
>>   cluster name, if the configuration lifecycle will allow it to be set prior
>>   to the execution of the stack advisor.
>> 
>>> 
>>> --
>>> Arpit Gupta
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>> 
>>>> On Aug 28, 2015, at 4:18 PM, Jon Maron <jmaron@hortonworks.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Aug 28, 2015, at 7:09 PM, Seetharam Venkatesh <venkatesh@innerzeal.com> wrote:
>>>>> 
>>>>> We use this as a namespace to avoid collisions from a DR site pushing the
>>>>> same set of metadata elements into Atlas.
>>>>>
>>>>> If Ambari supported multiple clusters, we could have leaned on it. In the
>>>>> absence of such a feature, we are left with this kludge. We may not be able
>>>>> to default to a static string since it can apply to a DR site as well.
>>>> 
>>>> In that instance it seems a generated GUID per cluster would suffice?
>>>> 
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>>> On Fri, Aug 28, 2015 at 10:45 AM Jon Maron <jmaron@hortonworks.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Can someone provide more insight into the ‘atlas.cluster.name’ property
>>>>>> and its usage?  It turns out that leveraging the Ambari cluster name
>>>>>> could be rather problematic since it can be changed and that change is
>>>>>> not necessarily conveyed to the cluster elements.  I am attempting to
>>>>>> modify the Ambari service code to leverage the stack advisor to effect
>>>>>> the hive changes required for configuring the Atlas hive hook, and in
>>>>>> that instance the Ambari cluster name isn’t even currently available.
>>>>>> 
>>>>>> Is there another name envisioned for use in this scenario (other than
>>>>>> Ambari cluster name)?  Can a default value be leveraged for a given
>>>>>> cluster (‘primary’)?
>>>>>> 
>>>>>> — Jon
> 
> 
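
For readers following the thread, here is a minimal sketch of the idea under discussion: read a cluster name from a properties file, fall back to a default when it is unset, and use it to namespace Hive table names so that two clusters (for example a primary and a DR site) do not collide in one Atlas instance. The property key 'atlas.cluster.name' and the default 'primary' come from the thread; the helper names and the "db.table@cluster" format are assumptions made for illustration, not the hook's confirmed behavior.

```python
# Illustrative sketch only: helper names and the name format are hypothetical.

DEFAULT_CLUSTER_NAME = "primary"  # default value floated in the thread


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cluster_name(props):
    """Return the configured cluster name, or the default if unset or empty."""
    return props.get("atlas.cluster.name") or DEFAULT_CLUSTER_NAME


def namespaced_table(props, db, table):
    """Build a cluster-scoped table name (assumed 'db.table@cluster' format)."""
    return f"{db}.{table}@{cluster_name(props)}"


if __name__ == "__main__":
    primary = parse_properties("# no cluster name set\n")
    dr_site = parse_properties("atlas.cluster.name = dr_site\n")
    print(namespaced_table(primary, "sales", "orders"))
    print(namespaced_table(dr_site, "sales", "orders"))
```

With a human-readable name rather than a generated GUID, queries can refer to the cluster directly, which is the usability concern raised earlier in the thread.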
