hive-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: performance and cluster size required
Date Fri, 06 Jun 2014 06:30:10 GMT
On the first part of your question, the cluster size you need depends
entirely on:
1) what type of queries you are running
2) what type of cluster you have, i.e. whether it is shared or dedicated
to you
3) the compressed file format, since query performance depends on whether
the compression type is splittable
4) the capacity of each node (compute, memory and storage)
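
As a rough starting point, the sizing arithmetic can be sketched in Python.
This is only a back-of-the-envelope estimate, not a substitute for a
benchmark. It reads the 9 GB figure as 6 collectors each delivering a
300 MB file every 3 minutes. The per-node throughput (200 MB per minute)
and the 50% headroom are hypothetical placeholders; measure the real
numbers by running your own queries on a small test cluster.

```python
import math


def input_mb_per_window(file_mb, collectors, file_interval_min, window_min):
    """Compressed input (MB) that arrives during one partition window."""
    files_per_collector = window_min // file_interval_min
    return file_mb * collectors * files_per_collector


def nodes_needed(input_mb, mb_per_node_per_min, budget_min, headroom=0.5):
    """Estimate nodes required to process input_mb within budget_min.

    mb_per_node_per_min is an assumed, measured-by-you throughput for the
    full set of aggregation queries on one node. headroom is the fraction
    of that throughput you plan to rely on (0.5 leaves half in reserve
    for a shared cluster, retries and skew).
    """
    usable_mb_per_node = mb_per_node_per_min * budget_min * headroom
    return math.ceil(input_mb / usable_mb_per_node)


# Figures from the question: 300 MB files, 6 collectors, one file every
# 3 minutes, 15-minute partition window.
window_mb = input_mb_per_window(300, 6, 3, 15)
print(window_mb)  # 9000

# Hypothetical throughput of 200 MB per node per minute.
print(nodes_needed(window_mb, mb_per_node_per_min=200, budget_min=15))
```

If the compression type is not splittable, each file is read by a single
task, so the measured per-node throughput may drop well below what the
hardware could otherwise deliver.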


On the second part, as far as I know there is no way to write data to
multiple targets with a single query.
So you have two options:
1) Run the query once, save the output to a file, and write it to both targets
2) Run the query twice with different targets
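
Option 1 could look roughly like the Python sketch below: read the saved
query output once and hand each batch of rows to every target. The
function name fan_out, the tab delimiter and the batch size are
assumptions for illustration. The sinks are plain callables; here they
are in-memory lists standing in for whatever HBase and Mongo client code
you already use.

```python
import csv
import io


def fan_out(lines, sinks, delimiter="\t", batch_size=1000):
    """Read delimited rows once and pass each batch to every sink.

    lines is any iterable of text lines (for example an open file).
    sinks is a list of callables that each accept a list of rows.
    Returns the total number of rows read.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    batch = []
    total = 0
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            for sink in sinks:
                sink(batch)
            total += len(batch)
            batch = []  # start a fresh list; sinks may keep the old one
    if batch:  # flush the final partial batch
        for sink in sinks:
            sink(batch)
        total += len(batch)
    return total


# Stand-ins for real writers: replace with your HBase and Mongo clients.
hbase_rows = []
mongo_rows = []

# Hypothetical query output; in practice pass open("result.tsv", newline="").
query_output = io.StringIO("a\t1\nb\t2\nc\t3\n")
count = fan_out(query_output, [hbase_rows.extend, mongo_rows.extend],
                batch_size=2)
print(count, hbase_rows == mongo_rows)
```

This way the expensive aggregation runs only once, and a failed write to
one target can be retried from the saved file without rerunning the query.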


On Fri, Jun 6, 2014 at 6:31 AM, Bogala, Chandra Reddy <Chandra.Bogala@gs.com> wrote:

> Hi,
>
>   I get a 300 MB compressed file (structured CSV data) in a spool directory
> every 3 minutes from a collector. I have around 6 collectors. I move the
> data from the spool dir to an HDFS directory and add it as a Hive partition
> for every 15 minutes of data. Then I run different aggregation queries and
> post the data to HBase and Mongo. So each query covers around 9 GB of
> compressed data. For this much data I need to evaluate how many cluster
> nodes are required to finish all the aggregation queries in time (within the
> 15-minute partition window). What is the best way to evaluate this?
>
>
>
> Is there any way I can post the aggregated data to both Mongo and HBase
> (the same query result posted to multiple tables, instead of running the
> same query multiple times and inserting into a single table each time)?
>
>
>
> Thanks,
>
> Chandra
>



-- 
Nitin Pawar
