impala-reviews mailing list archives

From "anujphadke (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4848: Add WIDTH_BUCKET() function
Date Thu, 16 Nov 2017 06:12:55 GMT
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/6023 )

Change subject: IMPALA-4848: Add WIDTH_BUCKET() function
......................................................................


Patch Set 7:

(4 comments)

Yes, I have been discussing these approaches with Taras and Alex.
I have benchmarked them on a large table with 1073741824 rows.
The DoubleVal patch (patch set 3) outperforms the others.

Using just int256_t
444.58s
453.40s

Using int128_t
159.28s
155.25s

Binary search approach (this was done with a float array and needs to be changed to use
DecimalVal)
109.21s
109.20s

DoubleVal (patch set 3)
104.20s
104.20s

Current status -
I will send out a patch very soon that uses int128_t (int256_t in case of overflow) for storing
the intermediate results. I will continue exploring the binary search approach later and will
send out a follow-up patch if we see performance improvements.
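
As a rough illustration of the narrow-then-wide idea described above (this is not the code in the patch), the sketch below computes a bucket number in a narrow integer type and falls back to a wider type only when an intermediate result overflows. It is scaled down to int64_t with an __int128 fallback so it stays self-contained; the patch itself uses int128_t with an int256_t fallback. The function name WidthBucketSketch, its parameters, and the use of the GCC/Clang overflow builtins are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not Impala's implementation. Returns the 1-based
// bucket of `expr` when [min_val, max_val) is split into `num_buckets`
// equal-width buckets. Inputs are unscaled integers sharing one decimal scale.
// Relies on GCC/Clang extensions (__builtin_*_overflow, __int128).
int64_t WidthBucketSketch(int64_t expr, int64_t min_val, int64_t max_val,
                          int64_t num_buckets) {
  assert(min_val < max_val);
  assert(num_buckets > 0 && num_buckets < INT64_MAX);
  if (expr < min_val) return 0;                 // below the range
  if (expr >= max_val) return num_buckets + 1;  // at or above the range

  // Fast path: keep every intermediate result in the narrow type and
  // detect overflow instead of widening up front.
  int64_t dist_from_min, range, product;
  bool overflow = __builtin_sub_overflow(expr, min_val, &dist_from_min) ||
                  __builtin_sub_overflow(max_val, min_val, &range) ||
                  __builtin_mul_overflow(dist_from_min, num_buckets, &product);
  if (!overflow) return product / range + 1;

  // Slow path: redo the arithmetic in the wider type. The product is
  // below 2^64 * 2^63, so it fits in a signed 128-bit integer.
  __int128 wide_dist = static_cast<__int128>(expr) - min_val;
  __int128 wide_range = static_cast<__int128>(max_val) - min_val;
  __int128 wide_product = wide_dist * num_buckets;
  return static_cast<int64_t>(wide_product / wide_range) + 1;
}
```

For example, WidthBucketSketch(5, 0, 10, 5) stays on the fast path, while a range of 8000000000000000000 with 4 buckets forces the wide path because the product no longer fits in int64_t. The benchmark numbers above suggest why the wide type is worth avoiding on the common path.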

http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@429
PS7, Line 429: bucket_width
> This should be called bucket_number to make it more clear
Done


http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@431
PS7, Line 431: width_size
> width_size is a confusing name. This should be called something like "dista
Done


http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@479
PS7, Line 479:     result.val = num_buckets.val;
> I think it's clearer and simpler to write:
Done


http://gerrit.cloudera.org:8080/#/c/6023/7/be/src/exprs/math-functions-ir.cc@516
PS7, Line 516:   int256_t x = ConvertToInt256(buckets.value()) * ConvertToInt256(width_size.value());
> This idea may give a nice performance boost (if it works) because all the h
This patch stores intermediate results in int256_t only when needed and uses int128_t otherwise.
I will do some more benchmarking and testing of the binary search approach and will post a
follow-up patch.
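
For readers following the thread, here is a minimal sketch of what a binary search variant could look like, assuming the bucket boundaries are precomputed once and each row is then located with std::upper_bound. It uses a double array, like the float-array prototype benchmarked above, and does not address the DecimalVal conversion. The names MakeBucketBoundaries and WidthBucketBinarySearch are hypothetical and do not come from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the num_buckets + 1 boundaries of the
// equal-width buckets covering [min_val, max_val).
std::vector<double> MakeBucketBoundaries(double min_val, double max_val,
                                         int64_t num_buckets) {
  assert(min_val < max_val && num_buckets > 0);
  std::vector<double> bounds(num_buckets + 1);
  for (int64_t i = 0; i < num_buckets; ++i) {
    bounds[i] = min_val + (max_val - min_val) * i / num_buckets;
  }
  bounds[num_buckets] = max_val;  // keep the last boundary exact
  return bounds;
}

// Hypothetical helper: the bucket number equals the count of boundaries
// that are <= expr. That is 0 below the range and num_buckets + 1 at or
// above max_val, so no per-row multiplication or division is needed.
int64_t WidthBucketBinarySearch(const std::vector<double>& bounds,
                                double expr) {
  return std::upper_bound(bounds.begin(), bounds.end(), expr) - bounds.begin();
}
```

With boundaries built for (0, 10, 5), a value of 5 lands in bucket 3 and a value of 10 lands in the overflow bucket 6. Whether this beats the int128_t path on real data is exactly what the follow-up benchmarking would need to show.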



-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-Change-Number: 6023
Gerrit-PatchSet: 7
Gerrit-Owner: anujphadke <aphadke@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: anujphadke <aphadke@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 06:12:55 +0000
Gerrit-HasComments: Yes
