hive-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database
Date Wed, 30 Dec 2015 21:16:54 GMT
HDP should have TEZ already on board by default.

> On 30 Dec 2015, at 21:42, Marcin Tustin <mtustin@handybook.com> wrote:
> 
> I'm afraid I use the HDP distribution so I haven't yet had to compile anything. (Incidentally, this isn't a recommendation of HDP over anything else.)
> 
>> On Wed, Dec 30, 2015 at 3:33 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:
>> Thanks Marcin
>> 
>>  
>> 
>> Trying to build TEZ 0.7 in
>> 
>>  
>> 
>> /usr/lib/apache-tez-0.7.0-src
>> 
>>  
>> 
>> using
>> 
>>  
>> 
>> mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true
>> 
>>  
>> 
>> with mvn version 3.2.5 (as opposed to 3.3), as I read that it can be built OK with 3.2.5 after hitting the same error as below
>> 
>>  
>> 
>> mvn --version
>> 
>> Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-14T17:29:23+00:00)
>> 
>> Maven home: /usr/local/apache-maven/apache-maven-3.2.5
>> 
>> Java version: 1.7.0_25, vendor: Oracle Corporation
>> 
>> Java home: /usr/java/jdk1.7.0_25/jre
>> 
>>  
>> 
>> I get this error
>> 
>>  
>> 
>> [INFO] tez-ui ............................................. FAILURE [  0.411 s]
>> 
>> [DEBUG] -- end configuration --
>> 
>> [INFO] Running 'npm install --color=false' in /usr/lib/apache-tez-0.7.0-src/tez-ui/src/main/webapp
>> 
>> [INFO] /usr/lib/apache-tez-0.7.0-src/tez-ui/src/main/webapp/node/with_new_path.sh: line 3: 23781 Aborted                 "$@"
>> 
>>  
>> 
>>  
>> 
>> [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.16:npm (npm install) on project tez-ui: Failed to run task: 'npm install --color=false' failed. (error code 134) -> [Help 1]
>> 
>> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.16:npm (npm install) on project tez-ui: Failed to run task
>> 
>>  
>> 
>>  
>> 
>> Any ideas? There seems to be little information available on the net.
>> 
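>> One workaround I may try, since the error comes from the frontend-maven-plugin running npm inside the tez-ui module: exclude that module from the build if the web UI is not needed. This is an untested sketch against 0.7.0:
>> 
>> # requires Maven >= 3.2.1 for the '!' project-exclusion syntax
>> mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -pl '!tez-ui'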
>>  
>> 
>>  
>> 
>> Thanks
>> 
>>  
>> 
>> Mich Talebzadeh
>> 
>>  
>> 
>> Sybase ASE 15 Gold Medal Award 2008
>> 
>> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>> 
>> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>> 
>> Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
>> 
>> co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
>> 
>> Publications due shortly:
>> 
>> Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
>> 
>> Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
>> 
>>  
>> 
>> http://talebzadehmich.wordpress.com
>> 
>>  
>> 
>> NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
>> 
>>  
>> 
>> From: Marcin Tustin [mailto:mtustin@handybook.com] 
>> Sent: 30 December 2015 19:27
>> 
>> 
>> To: user@hive.apache.org
>> Subject: Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database
>>  
>> 
>> I'm using TEZ 0.7.0.2.3 with hive 1.2.1.2.3. I can confirm that TEZ is much faster than MR in pretty much all cases. Also, with hive, you'll want to make sure you've performed optimizations like aligning ORC stripe sizes with HDFS block sizes, and concatenated your tables (not so much an optimization as a must for avoiding the small files problem).
>> 
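>> A minimal sketch of both, using the sales table from this thread (untested; note Hive may refuse to concatenate bucketed tables):
>> 
>> -- merge small ORC files in place (Hive's built-in ORC file merge)
>> ALTER TABLE sales CONCATENATE;
>> 
>> -- stripe size only affects files written after the change; 268435456 bytes = 256 MB, to match a 256 MB HDFS block
>> ALTER TABLE sales SET TBLPROPERTIES ('orc.stripe.size'='268435456');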
>>  
>> 
>> On Wed, Dec 30, 2015 at 2:19 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:
>> 
>> Thanks again Jorn.
>> 
>>  
>> 
>>  
>> 
>> Both Hive and Sybase IQ are running on the same host. Yes, for Sybase IQ I have compression enabled. The FACT table in IQ (sales) has LF (read bitmap) indexes on the time_id column. For the dimension table (times) I have time_id defined as primary key. Also Sybase IQ creates FP (fast projection) indexes on every column by default.
>> 
>>  
>> 
>> Anyway I am trying to download and build TEZ. Do we know which version of TEZ works with Hive 1.2.1 please? 0.8 seems to be in alpha.
>> 
>>  
>> 
>> Thanks
>> 
>>  
>> 
>> Mich Talebzadeh
>> 
>>  
>> 
>> 
>>  
>> 
>> http://talebzadehmich.wordpress.com
>> 
>>  
>> 
>> 
>>  
>> 
>> From: Jörn Franke [mailto:jornfranke@gmail.com] 
>> Sent: 30 December 2015 16:29
>> 
>> 
>> To: user@hive.apache.org
>> Subject: Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database
>> 
>>  
>> 
>> 
>> Hmm, I think the TEZ execution engine currently has the most optimizations for Hive. What about your hardware - is it the same? Do you also have compression on Sybase?
>> 
>> Alternatively, you could wait for Hive's interactive analytics (TEZ 0.8 + LLAP).
>> 
>> 
>> On 30 Dec 2015, at 13:47, Mich Talebzadeh <mich@peridale.co.uk> wrote:
>> 
>> Hi Jorn,
>> 
>>  
>> 
>> Thanks for your reply. My Hive version is 1.2.1 on Spark 1.3.1. I have not tried it on TEZ. I tried the query on the MR engine and it did not fare better. I also ran it without the STDDEV function and found that the function did not slow it down.
>> 
>>  
>> 
>> I tried a simple query as follows, built on the sales FACT table (1e9 rows) and the dimension table times (1826 rows)
>> 
>>  
>> 
>> --
>> 
>> -- Get the total amount sold for each calendar month
>> 
>> --
>> 
>> SELECT t.calendar_month_desc, SUM(s.amount_sold)
>> 
>> FROM sales s, times t WHERE s.time_id = t.time_id
>> 
>> GROUP BY t.calendar_month_desc;
>> 
>>  
>> 
>> Now Sybase IQ comes back in around 30 seconds.
>> 
>>  
>> 
>> Started query at Dec 30 2015 08:14:33:399AM
>> 
>> (48 rows affected)
>> 
>> Finished query at Dec 30 2015 08:15:04:640AM
>> 
>>  
>> 
>> Whereas Hive with the following setting and running the same query
>> 
>>  
>> 
>> set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>> 
>> set hive.optimize.bucketmapjoin=true;
>> 
>> set hive.optimize.bucketmapjoin.sortedmerge=true;
>> 
>>  
>> 
>> Comes back in
>> 
>>  
>> 
>> 48 rows selected (1514.687 seconds)
>> 
>>  
>> 
>> I don’t know what else can be done. Obviously this is all schema on read, so I am not sure I can change bucketing on the FACT table based on one query alone!
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> +--------------------------------------------------------------------+--+
>> 
>> |                           createtab_stmt                           |
>> 
>> +--------------------------------------------------------------------+--+
>> 
>> | CREATE TABLE `times`(                                              |
>> 
>> |   `time_id` timestamp,                                             |
>> 
>> |   `day_name` varchar(9),                                           |
>> 
>> |   `day_number_in_week` int,                                        |
>> 
>> |   `day_number_in_month` int,                                       |
>> 
>> |   `calendar_week_number` int,                                      |
>> 
>> |   `fiscal_week_number` int,                                        |
>> 
>> |   `week_ending_day` timestamp,                                     |
>> 
>> |   `week_ending_day_id` bigint,                                     |
>> 
>> |   `calendar_month_number` int,                                     |
>> 
>> |   `fiscal_month_number` int,                                       |
>> 
>> |   `calendar_month_desc` varchar(8),                                |
>> 
>> ----------
>> 
>> |   `days_in_fis_year` bigint,                                       |
>> 
>> |   `end_of_cal_year` timestamp,                                     |
>> 
>> |   `end_of_fis_year` timestamp)                                     |
>> 
>> | CLUSTERED BY (                                                     |
>> 
>> |   time_id)                                                         |
>> 
>> | INTO 256 BUCKETS                                                   |
>> 
>> | ROW FORMAT SERDE                                                   |
>> 
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'                      |
>> 
>> | STORED AS INPUTFORMAT                                              |
>> 
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'                |
>> 
>> | OUTPUTFORMAT                                                       |
>> 
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'               |
>> 
>> | LOCATION                                                           |
>> 
>> |   'hdfs://rhes564:9000/user/hive/warehouse/oraclehadoop.db/times'  |
>> 
>> | TBLPROPERTIES (                                                    |
>> 
>> |   'COLUMN_STATS_ACCURATE'='true',                                 |
>> 
>> |   'numFiles'='1',                                                  |
>> 
>> |   'numRows'='1826',                                                |
>> 
>> |   'orc.bloom.filter.columns'='TIME_ID',                            |
>> 
>> |   'orc.bloom.filter.fpp'='0.05',                                   |
>> 
>> |   'orc.compress'='SNAPPY',                                         |
>> 
>> |   'orc.create.index'='true',                                       |
>> 
>> |   'orc.row.index.stride'='10000',                                 |
>> 
>> |   'orc.stripe.size'='268435456',                                   |
>> 
>> |   'rawDataSize'='0',                                               |
>> 
>> |   'totalSize'='11155',                                             |
>> 
>> |   'transient_lastDdlTime'='1451429900')                            |
>> 
>>  
>> 
>> ;
>> 
>>  
>> 
>>  
>> 
>> http://talebzadehmich.wordpress.com
>> 
>>  
>> 
>> 
>>  
>> 
>> From: Jörn Franke [mailto:jornfranke@gmail.com] 
>> Sent: 30 December 2015 08:28
>> To: user@hive.apache.org
>> Subject: Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database
>> 
>>  
>> 
>> Have you tried it with Hive on TEZ? It contains (currently) more optimizations than Hive on Spark.
>> 
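>> Switching engines for a session is a one-liner (assuming tez is deployed on the cluster):
>> 
>> set hive.execution.engine=tez;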
>> I assume you use the latest Hive version.
>> 
>> Additionally you may want to think about calculating statistics (depending on your configuration you need to trigger it) - I am not sure if Spark can use them.
>> 
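>> A sketch of triggering them by hand (standard HiveQL; I am assuming your sales table here):
>> 
>> ANALYZE TABLE sales COMPUTE STATISTICS;
>> ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;
>> set hive.compute.query.using.stats=true; -- lets simple aggregates be answered from stats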
>> I am not sure if bloom filters on the columns you mention make sense. You may also want to increase stride size (depending on your data).
>> 
>> Currently you bucket by a lot of fields, which may not make sense. You also may want to sort the data by customer Id in the table.
>> 
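>> Roughly, such a table could look like this (hypothetical name and trimmed columns; on Hive 1.x the enforce flags are needed so the insert honours the bucket/sort spec):
>> 
>> CREATE TABLE sales_by_cust (
>>   cust_id bigint,
>>   amount_sold decimal(10,0))
>> CLUSTERED BY (cust_id) SORTED BY (cust_id) INTO 256 BUCKETS
>> STORED AS ORC;
>> 
>> set hive.enforce.bucketing=true;
>> set hive.enforce.sorting=true;
>> INSERT OVERWRITE TABLE sales_by_cust SELECT cust_id, amount_sold FROM sales;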
>> You also seem to have a lot of reducers, which you may want to decrease.
>> 
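>> The reducer count can be tuned per session, e.g. (values illustrative only):
>> 
>> set hive.exec.reducers.bytes.per.reducer=1073741824; -- fewer, larger reducers
>> set mapred.reduce.tasks=64;                          -- or pin the count outright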
>>  
>> 
>> Have you tried without "having stddev_samp"? Is the query exactly the same as in Sybase?
>> 
>> 
>> On 29 Dec 2015, at 11:53, Mich Talebzadeh <mich@peridale.co.uk> wrote:
>> 
>> Hi,
>> 
>>  
>> 
>> I have a fact table in Hive imported from Sybase IQ via SQOOP with 1 billion rows as follows:
>> 
>>  
>> 
>> show create table sales;
>> 
>> +-------------------------------------------------------------------------------+--+
>> 
>> |                                createtab_stmt                                 |
>> 
>> +-------------------------------------------------------------------------------+--+
>> 
>> | CREATE TABLE `sales`(                                                         |
>> 
>> |   `prod_id` bigint,                                                           |
>> 
>> |   `cust_id` bigint,                                                           |
>> 
>> |   `time_id` timestamp,                                                        |
>> 
>> |   `channel_id` bigint,                                                        |
>> 
>> |   `promo_id` bigint,                                                          |
>> 
>> |   `quantity_sold` decimal(10,0),                                              |
>> 
>> |   `amount_sold` decimal(10,0))                                                |
>> 
>> | CLUSTERED BY (                                                                |
>> 
>> |   prod_id,                                                                    |
>> 
>> |   cust_id,                                                                    |
>> 
>> |   time_id,                                                                    |
>> 
>> |   channel_id,                                                                 |
>> 
>> |   promo_id)                                                                   |
>> 
>> | INTO 256 BUCKETS                                                              |
>> 
>> | ROW FORMAT SERDE                                                              |
>> 
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'                                 |
>> 
>> | STORED AS INPUTFORMAT                                                         |
>> 
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'                           |
>> 
>> | OUTPUTFORMAT                                                                  |
>> 
>> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'                          |
>> 
>> | LOCATION                                                                      |
>> 
>> |   'hdfs://rhes564:9000/user/hive/warehouse/oraclehadoop.db/sales'             |
>> 
>> | TBLPROPERTIES (                                                               |
>> 
>> |   'COLUMN_STATS_ACCURATE'='true',                                             |
>> 
>> |   'last_modified_by'='hduser',                                                |
>> 
>> |   'last_modified_time'='1451305626',                                          |
>> 
>> |   'numFiles'='11',                                                            |
>> 
>> |   'numRows'='1000000000',                                                     |
>> 
>> |   'orc.bloom.filter.columns'='PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID',   |
>> 
>> |   'orc.bloom.filter.fpp'='0.05',                                              |
>> 
>> |   'orc.compress'='SNAPPY',                                                    |
>> 
>> |   'orc.create.index'='true',                                                  |
>> 
>> |   'orc.row.index.stride'='10000',                                             |
>> 
>> |   'orc.stripe.size'='268435456',                                              |
>> 
>> |   'rawDataSize'='296000000000',                                               |
>> 
>> |   'totalSize'='2678882153',                                                   |
>> 
>> |   'transient_lastDdlTime'='1451305626')                                       |
>> 
>> +-------------------------------------------------------------------------------+--+
>> 
>>  
>> 
>> I use the following query, run in Hive against the sales table only
>> 
>>  
>> 
>> SELECT
>> 
>>           rs.Customer_ID
>> 
>>         , rs.Number_of_orders
>> 
>>         , rs.Total_customer_amount
>> 
>>         , rs.Average_order
>> 
>>         , rs.Standard_deviation
>> 
>> FROM
>> 
>> (
>> 
>>         SELECT cust_id AS Customer_ID,
>> 
>>         COUNT(amount_sold) AS Number_of_orders,
>> 
>>         SUM(amount_sold) AS Total_customer_amount,
>> 
>>         AVG(amount_sold) AS Average_order,
>> 
>>         stddev_samp(amount_sold) AS Standard_deviation
>> 
>>         FROM sales
>> 
>>         GROUP BY cust_id
>> 
>>         HAVING SUM(amount_sold) > 94000
>> 
>>         AND AVG(amount_sold) < stddev_samp(amount_sold)
>> 
>> ) rs
>> 
>> ORDER BY
>> 
>>           -- Total_customer_amount DESC
>> 
>>           3 DESC
>> 
>>  
>> 
>> Hive comes back in 17 minutes with 5,948 rows
>> 
>>  
>> 
>> bl -f sales.hql > sales.log
>> 
>> Connecting to jdbc:hive2://rhes564:10010/default
>> 
>> Connected to: Apache Hive (version 1.2.1)
>> 
>> Driver: Hive JDBC (version 1.2.1)
>> 
>> Transaction isolation: TRANSACTION_REPEATABLE_READ
>> 
>> Running init script /home/hduser/dba/bin/hive_on_spark_init.hql
>> 
>> No rows affected (0.097 seconds)
>> 
>> No rows affected (0.001 seconds)
>> 
>> No rows affected (0.001 seconds)
>> 
>> No rows affected (0.038 seconds)
>> 
>> INFO  : Warning: Using constant number 3 in order by. If you try to use position alias when hive.groupby.orderby.position.alias is false, the position alias will be ignored.
>> 
>> INFO  :
>> 
>> Query Hive on Spark job[0] stages:
>> 
>> INFO  : 0
>> 
>> INFO  : 1
>> 
>> INFO  : 2
>> 
>> INFO  :
>> 
>> Status: Running (Hive on Spark job[0])
>> 
>> INFO  : Job Progress Format
>> 
>> CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
>> 
>> INFO  : 2015-12-29 09:33:25,815 Stage-0_0: 0/11 Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:28,829 Stage-0_0: 0/11 Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:31,857 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:34,875 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:37,903 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:40,918 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:43,939 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:46,958 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:49,971 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:52,991 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:33:56,007 Stage-0_0: 0(+2)/11     Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>>  
>> 
>> INFO  : 2015-12-29 09:50:03,578 Stage-0_0: 10(+1)/11    Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:06,590 Stage-0_0: 10(+1)/11    Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:09,602 Stage-0_0: 10(+1)/11    Stage-1_0: 0/1009       Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:10,606 Stage-0_0: 11/11 Finished       Stage-1_0: 0(+2)/1009   Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:11,610 Stage-0_0: 11/11 Finished       Stage-1_0: 6(+2)/1009   Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:12,618 Stage-0_0: 11/11 Finished       Stage-1_0: 30(+2)/1009  Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:13,622 Stage-0_0: 11/11 Finished       Stage-1_0: 59(+2)/1009  Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:14,626 Stage-0_0: 11/11 Finished       Stage-1_0: 90(+2)/1009  Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:15,631 Stage-0_0: 11/11 Finished       Stage-1_0: 124(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:16,654 Stage-0_0: 11/11 Finished       Stage-1_0: 160(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:17,659 Stage-0_0: 11/11 Finished       Stage-1_0: 193(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:18,663 Stage-0_0: 11/11 Finished       Stage-1_0: 228(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:19,667 Stage-0_0: 11/11 Finished       Stage-1_0: 262(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:20,672 Stage-0_0: 11/11 Finished       Stage-1_0: 298(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:21,679 Stage-0_0: 11/11 Finished       Stage-1_0: 338(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:22,687 Stage-0_0: 11/11 Finished       Stage-1_0: 376(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:23,691 Stage-0_0: 11/11 Finished       Stage-1_0: 417(+3)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:24,696 Stage-0_0: 11/11 Finished       Stage-1_0: 460(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:25,699 Stage-0_0: 11/11 Finished       Stage-1_0: 502(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:26,707 Stage-0_0: 11/11 Finished       Stage-1_0: 542(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:27,712 Stage-0_0: 11/11 Finished       Stage-1_0: 584(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:28,719 Stage-0_0: 11/11 Finished       Stage-1_0: 624(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:29,730 Stage-0_0: 11/11 Finished       Stage-1_0: 667(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:30,736 Stage-0_0: 11/11 Finished       Stage-1_0: 709(+3)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:31,740 Stage-0_0: 11/11 Finished       Stage-1_0: 754(+3)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:32,743 Stage-0_0: 11/11 Finished       Stage-1_0: 797(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:33,747 Stage-0_0: 11/11 Finished       Stage-1_0: 844(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:34,754 Stage-0_0: 11/11 Finished       Stage-1_0: 888(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:35,759 Stage-0_0: 11/11 Finished       Stage-1_0: 934(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:36,764 Stage-0_0: 11/11 Finished       Stage-1_0: 981(+2)/1009 Stage-2_0: 0/1
>> 
>> INFO  : 2015-12-29 09:50:37,768 Stage-0_0: 11/11 Finished       Stage-1_0: 1009/1009 Finished   Stage-2_0: 0(+1)/1
>> 
>> INFO  : 2015-12-29 09:50:38,771 Stage-0_0: 11/11 Finished       Stage-1_0: 1009/1009 Finished   Stage-2_0: 1/1 Finished
>> 
>> INFO  : Status: Finished successfully in 1036.00 seconds
>> 
>> 5,948 rows selected (1074.817 seconds)
>> 
>>  
>> 
>> So it returns 5948 rows in 17 minutes. In contrast, IQ returns 5947 rows in 23 seconds.
>> 
>>  
>> 
>> Sybase IQ is a columnar database so each column is created as a fast projection index by default. In addition I have created LF (bitmap) indexes on dimension columns (PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID). Now the query only touches CUST_ID.
>> 
>>  
>> 
>> My suspicion is that the standard deviation function stddev_samp() could be the bottleneck.
>> 
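>> One way to check (a sketch, same table as above) would be to time the aggregation with the function stripped out:
>> 
>> SELECT cust_id, COUNT(amount_sold), SUM(amount_sold), AVG(amount_sold)
>> FROM sales
>> GROUP BY cust_id
>> HAVING SUM(amount_sold) > 94000;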
>>  
>> 
>> Thanks
>> 
>>  
>> 
>> Mich Talebzadeh
>> 
>>  
>> 
>> 
>>  
>> 
>> http://talebzadehmich.wordpress.com
>> 
>>  
>> 
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> 
> 
> 
> 
