hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Carl Steinbach <...@apache.org>
Subject Re: Review Request 49619: sorting of tuple array using multiple fields
Date Sat, 09 Jul 2016 20:53:35 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/#review141413
-----------------------------------------------------------



Some more comments.


ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java (line 56)
<https://reviews.apache.org/r/49619/#comment206912>

    A couple notes:
    
    1. I think the example should actually return a value that is different than the input.
It would also be good to include more than two elements in the input. If screen space is an
issue I recommend only including a single element in each of the structs in the example, which
I think has the added benefit of making the example clearer by not distracting the reader
with irrelevant details.
    
    2. It looks like the default sorting order (ASC) is actually the reverse of what I would
expect it to be, i.e. I expect 'b' to come before 'g'.
    
    3. Related to point (2), I think it's important to ensure that the sorting order of this
UDF is consisent with ORDER BY, e.g. for a table t containing a single row with a single array
of struct field a_struct_array, the queries "SELECT a_struct FROM t LATERAL VIEW explode(a_struct_array)
structTable AS a_struct ORDER BY a_struct.col1 DESC" should return the same results as "SELECT
a_struct FROM t LATERAL VIEW explode(sort_array_by(a_struct_array, 'col1', 'DESC')) structTable
AS a_struct". Note that I probably didn't get the syntax for LATERAL VIEW and explode() correct.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java (line 93)
<https://reviews.apache.org/r/49619/#comment206913>

    Unnecessary string concatenation operators.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java (line 104)
<https://reviews.apache.org/r/49619/#comment206914>

    Unnecessary "+" operator.



ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java (line
1)
<https://reviews.apache.org/r/49619/#comment206915>

    Does this unit test provide any additional coverage or advantages over the q file tests?
Is it necessary to have both?
    
    Note that I am a strong advocate of end-to-end qfile tests over unit tests, which is an
opinion that not everyone holds.


- Carl Steinbach


On July 8, 2016, 12:35 p.m., Simanchal Das wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49619/
> -----------------------------------------------------------
> 
> (Updated July 8, 2016, 12:35 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Carl Steinbach.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem Statement:
> 
> When we are working with complex structure of data like avro.
> Most of the times we are encountering array contains multiple tuples and each tuple have
struct schema.
> Suppose here struct schema is like below:
> {
> 	"name": "employee",
> 	"type": [{
> 		"type": "record",
> 		"name": "Employee",
> 		"namespace": "com.company.Employee",
> 		"fields": [{
> 			"name": "empId",
> 			"type": "int"
> 		}, {
> 			"name": "empName",
> 			"type": "string"
> 		}, {
> 			"name": "age",
> 			"type": "int"
> 		}, {
> 			"name": "salary",
> 			"type": "double"
> 		}]
> 	}]
> }
> 
> Then while running our hive query complex array looks like array of employee objects.
> Example: 
> 	//(array<struct<empId,empName,age,salary>>)
> 	Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> 
> When we are implementing business use cases day to day life we are encountering problems
like sorting a tuple array by specific field[s] like empId,name,salary,etc by ASC or DESC
order.
> Proposal:
> I have developed a udf 'sort_array_by' which will sort a tuple array by one or more fields
in ASC or DESC order provided by user ,default is ascending order .
> Example:
> 	1.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
> 	output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
> 	
> 	2.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
> 	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
> 	3.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age,"ASC");
> 	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 1ab914d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java PRE-CREATION

>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java
PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/udf_sort_array_by.q PRE-CREATION 
>   ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out a811747 
>   ql/src/test/results/clientpositive/udf_sort_array_by.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/49619/diff/
> 
> 
> Testing
> -------
> 
> Junit test cases and query.q files are attached
> 
> 
> Thanks,
> 
> Simanchal Das
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message