kylin-user mailing list archives

From Sonny Heer <sonnyh...@gmail.com>
Subject Re: multiple EMRs sync
Date Tue, 07 Aug 2018 15:11:57 GMT
Thanks Chase.  I'm assuming the wipe-cache API is the same as "Reload
Metadata" under the "System" tab in the Kylin UI.  We did try reloading
metadata via the UI, but that didn't seem to update the query node.

The other key question is how your team coordinated between Kylin and
EMR, since where to connect is also hardcoded in kylin.properties.  Did
you bring up Kylin & EMR at the same time, so that Kylin's bootstrap has
the EMR master node IPs?  Is there a 1:1 mapping of Kylin node to EMR
cluster?

Is there a video of that slide deck?  I'll also be curious to look at your
Docker image if it's available.  Thanks



On Mon, Aug 6, 2018 at 8:37 PM Chase Zhang <chase.zhang@striking.ly> wrote:

> Hi Sonny,
>
> I'm Chase from Strikingly. As Shaofeng has mentioned our solution, I'd
> like to give a brief introduction to it in case it is helpful to you.
>
> As I understand it, your key problem is how to coordinate the master
> node of Kylin and its query nodes.
>
> Currently, Kylin must have hard-coded target URLs on the master side for
> all query nodes, and once a cube is built, the master node of Kylin will
> notify the query nodes to update their metadata. This is because Kylin
> keeps a cache of the related configs: even though HBase has the latest
> values, the cache might be out of date.
>
> Luckily, Kylin provides a RESTful API for updating the cache (see
> http://kylin.apache.org/docs23/howto/howto_use_restapi.html#wipe-cache).
>
> In theory, you can trigger this API manually to bring a query node's
> metadata cache up to date. But if you have multiple query instances,
> this becomes troublesome.
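>
> For example, here is a minimal sketch in Python (not part of our tool
> set; the endpoint form follows the docs linked above, and the node
> address and credentials are placeholders):
>
>     import requests
>
>     QUERY_NODE = "http://10.0.1.23:7070"   # hypothetical query node
>     AUTH = ("ADMIN", "KYLIN")              # default credentials; change them
>
>     # ask this node to refresh all of its cached metadata
>     resp = requests.put(
>         f"{QUERY_NODE}/kylin/api/cache/all/all/update",
>         auth=AUTH,
>         timeout=30,
>     )
>     resp.raise_for_status()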
>
> Unlike other big data solutions, Kylin's architecture is simple. It does
> not depend on a service discovery component like ZooKeeper. This makes
> Kylin easy to deploy and use, but if you have advanced demands, such as
> auto scaling, hard-coded query node IP addresses and ports might not be
> good enough.
>
> To mitigate this problem, we have developed a tool set. The basic ideas
> are:
>
> 1. Deploy Kylin in Docker containers.
> 2. Run a separate scheduler that triggers builds and monitors their
> status through the RESTful API on the master nodes.
> 3. Use AWS's Target Group as the service discovery solution. Since the
> query nodes run inside a target group, we can use AWS's API to get every
> instance's IP address and serving port.
> 4. Once it knows a cube build has finished, as well as the entry point of
> each query node, the scheduler can call the RESTful API on the query nodes
> one by one to update their caches (see the sketch below).
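>
> Putting steps 3 and 4 together, a rough sketch of the idea (not our
> actual scheduler) could look like this in Python, assuming an ip-type
> target group, so each Target's Id is an address, plus the default Kylin
> credentials:
>
>     import boto3
>     import requests
>
>     TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:..."  # placeholder
>     AUTH = ("ADMIN", "KYLIN")
>
>     def query_node_endpoints():
>         """Yield (ip, port) for each healthy target in the group."""
>         elbv2 = boto3.client("elbv2")
>         health = elbv2.describe_target_health(
>             TargetGroupArn=TARGET_GROUP_ARN)
>         for desc in health["TargetHealthDescriptions"]:
>             if desc["TargetHealth"]["State"] == "healthy":
>                 yield desc["Target"]["Id"], desc["Target"]["Port"]
>
>     def refresh_all_caches():
>         """Step 4: ask every query node to refresh its metadata cache."""
>         for ip, port in query_node_endpoints():
>             requests.put(
>                 f"http://{ip}:{port}/kylin/api/cache/all/all/update",
>                 auth=AUTH,
>                 timeout=30,
>             ).raise_for_status()
>
>     refresh_all_caches()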
>
> Furthermore, we now have some more advanced cache management logic (for
> example, not invalidating the cache when a build fails, and waiting for
> the next build to recover). We embedded all of this logic in our own
> scheduler.
>
> I hope this reply helps you.
>
> On Aug 7, 2018, 3:28 AM +0800, Sonny Heer <sonnyheer@gmail.com>, wrote:
>
> [image: Screen Shot 2018-08-06 at 10.27.35 AM.png]
>
>
> In this diagram (from the slide deck), is each HBase a different EMR
> cluster?  If so, how is Kylin configured to connect to both?  Notice that
> the Kylin query node shows a line connecting to both clusters.  Thanks for
> the input...
>
>
>
>
> On Mon, Aug 6, 2018 at 10:56 AM Sonny Heer <sonnyheer@gmail.com> wrote:
>
>> ShaoFeng,
>>
>> Is Strikingly open to sharing their work?  It appears our use case is
>> similar, and I would love to see how their work matches ours.
>>
>> On Mon, Aug 6, 2018 at 7:01 AM Sonny Heer <sonnyheer@gmail.com> wrote:
>>
>>> Does that require an HA cluster & Kylin installed on its own instance?
>>> EMR doesn't spin up services as HA on its master node.  I'd be curious to
>>> see what Strikingly has done and whether they have it deployed on AWS.
>>>
>>>
>>>
>>> On Sun, Aug 5, 2018 at 10:57 PM ShaoFeng Shi <shaofengshi@apache.org>
>>> wrote:
>>>
>>>> Hi Sonny,
>>>>
>>>> You can configure an R/W separated deployment with two EMRs: one is
>>>> Hadoop only, and the other is the HBase cluster. On the EC2 instance that
>>>> runs Kylin, install both the Hadoop and HBase clients/configurations, and
>>>> then tell Kylin you have Hadoop and HBase in two clusters (refer to the
>>>> blog). Kylin will run jobs on the W cluster and bulk load HFiles to the R
>>>> cluster.
>>>>
>>>> https://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
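>>>>
>>>> For reference, a minimal sketch of the kylin.properties side of this
>>>> (the property name is from that blog's era; newer Kylin 2.x releases
>>>> spell it kylin.storage.hbase.cluster-fs, and the bucket path here is
>>>> just a placeholder):
>>>>
>>>>     # the W (Hadoop) cluster stays the default FS for the job engine;
>>>>     # point HBase storage at the R cluster's file system, e.g. S3:
>>>>     kylin.hbase.cluster.fs=s3://your-hbase-bucket/root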
>>>>
>>>> Many Kylin users run this R/W separated architecture. I once tried it
>>>> on Azure with two clusters, and it worked well. I haven't tested it with
>>>> EMR, but I think they are similar.
>>>>
>>>>
>>>> 2018-08-06 10:55 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>>>
>>>>> Yea that would be great if Kylin can have a centralized metastore in
>>>>> RDS.
>>>>>
>>>>> The big problem for us now is this:
>>>>>
>>>>> Two EMR clusters, each running Kylin on the master node.  Both share
>>>>> the HBase S3 root dir.
>>>>>
>>>>> Cluster A creates a cube and does a build.  Cluster B can see the cube
>>>>> as it builds in “Monitor”, but once the cube is finished, it is “ready”
>>>>> only in cluster A (where the job was launched from).
>>>>>
>>>>> We need somewhat isolated Kylin nodes that can still share the same
>>>>> backend.  This would be a big win, since each cluster could then scale
>>>>> read/write independently in EMR - this is our goal.  Having read/write
>>>>> in the same cluster doesn’t work for various reasons...
>>>>>
>>>>> It seems Kylin is really close, since the monitoring of the cube is in
>>>>> sync when sharing the same HBase backend.
>>>>>
>>>>> Using a read replica did not work - when we tried to log in from the
>>>>> replica, Kylin wasn't able to work.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 5, 2018 at 7:01 PM ShaoFeng Shi <shaofengshi@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Sonny,
>>>>>>
>>>>>> EMR HBase read replica is a great feature, but we didn't try it. Are
>>>>>> you going to use this feature, or do you just want to deploy Kylin as
>>>>>> a cluster?
>>>>>>
>>>>>> Would putting the Kylin metadata in RDS make things easier for you?
>>>>>>
>>>>>> 2018-08-04 0:05 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>>>>>
>>>>>>> We'd like to use EMR HBase read replicas if possible.  We had some
>>>>>>> issues using this strategy, since Kylin requires write capability
>>>>>>> from all nodes (on login, for example).
>>>>>>>
>>>>>>> The idea is to cluster Kylin using multiple EMRs, with Kylin on each
>>>>>>> master node.  If this isn't possible, we may go with the separate
>>>>>>> instance approach, but that is prone to errors, as the EMR libs have
>>>>>>> to be copied around...
>>>>>>>
>>>>>>> ref:
>>>>>>>
>>>>>>> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/
>>>>>>>
>>>>>>> Anyone else have experience or can share their use case on emr?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Thu, Aug 2, 2018 at 2:32 PM Sonny Heer <sonnyheer@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Is it possible in the new version of Kylin to have multiple EMR
>>>>>>>> clusters, with Kylin installed on the master node, but talking to
>>>>>>>> the same S3 location?
>>>>>>>>
>>>>>>>> e.g. one Write EMR cluster and one Read EMR cluster?
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>> Shaofeng Shi 史少锋
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
