hive-dev mailing list archives

From Simanchal Das <simanchal....@outlook.com>
Subject Re: Review Request 49619: sorting of tuple array using multiple fields
Date Fri, 08 Jul 2016 12:37:44 GMT


> On July 7, 2016, 6:45 a.m., Carl Steinbach wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 427
> > <https://reviews.apache.org/r/49619/diff/2/?file=1438308#file1438308line427>
> >
> >     To me "sort_array_field" makes it sound like this function sorts the elements in an array field, as opposed to sorting an array on a particular field, which is what it actually does. I think the purpose of this function would be clearer if the name were changed to 'sort_array_on_field' or 'sort_array_by' (I prefer the latter).

fixed


> On July 7, 2016, 6:45 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/udf_sort_array_field.q, line 1
> > <https://reviews.apache.org/r/49619/diff/2/?file=1438313#file1438313line1>
> >
> >     Is this really necessary?

removed


> On July 7, 2016, 6:45 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/udf_sort_array_field.q, line 9
> > <https://reviews.apache.org/r/49619/diff/2/?file=1438313#file1438313line9>
> >
> >     No need for this. Please remove.

removed


> On July 7, 2016, 6:45 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/udf_sort_array_field.q, line 16
> > <https://reviews.apache.org/r/49619/diff/2/?file=1438313#file1438313line16>
> >
> >     The rows should have different struct values.

Changed the values.


> On July 7, 2016, 6:45 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/udf_sort_array_field.q, line 25
> > <https://reviews.apache.org/r/49619/diff/2/?file=1438313#file1438313line25>
> >
> >     Consider using named_struct() instead of struct(). This will allow you to provide names for the struct fields.

Used named_struct()


- Simanchal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/#review141130
-----------------------------------------------------------


On July 8, 2016, 12:35 p.m., Simanchal Das wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49619/
> -----------------------------------------------------------
> 
> (Updated July 8, 2016, 12:35 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Carl Steinbach.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem Statement:
> 
> When working with complex data structures such as Avro, we frequently encounter arrays that contain multiple tuples, where each tuple has a struct schema.
> Suppose the struct schema is as follows:
> {
> 	"name": "employee",
> 	"type": [{
> 		"type": "record",
> 		"name": "Employee",
> 		"namespace": "com.company.Employee",
> 		"fields": [{
> 			"name": "empId",
> 			"type": "int"
> 		}, {
> 			"name": "empName",
> 			"type": "string"
> 		}, {
> 			"name": "age",
> 			"type": "int"
> 		}, {
> 			"name": "salary",
> 			"type": "double"
> 		}]
> 	}]
> }
> 
> When such data is queried in Hive, the complex column is an array of Employee structs.
> Example: 
> 	// array<struct<empId:int,empName:string,age:int,salary:double>>
> 	Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> 
> In day-to-day business use cases we often need to sort such a tuple array by one or more specific fields (empId, empName, salary, etc.) in ASC or DESC order.
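For illustration, this is roughly what the hand-written alternative looks like in plain Java. It is a sketch, not Hive code: the `array<struct<...>>` column is modeled as a list of a hypothetical `Employee` record, and every ordering needs its own hard-coded comparator. That per-use-case work is what the proposed UDF is meant to replace.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ManualSortSketch {

    // Hypothetical model of one element of
    // array<struct<empId:int,empName:string,age:int,salary:double>>.
    record Employee(int empId, String empName, int age, double salary) {}

    public static void main(String[] args) {
        List<Employee> employees = new ArrayList<>(List.of(
                new Employee(100, "Foo", 20, 20990),
                new Employee(500, "Boo", 30, 50990),
                new Employee(700, "Harry", 25, 40990),
                new Employee(100, "Tom", 35, 70990)));

        // Ascending by salary: a comparator hard-wired to one field.
        employees.sort(Comparator.comparingDouble(Employee::salary));
        employees.forEach(System.out::println);

        // Name descending, then salary descending: each new ordering needs new code.
        employees.sort(Comparator.comparing(Employee::empName)
                .thenComparingDouble(Employee::salary)
                .reversed());
        employees.forEach(System.out::println);
    }
}
```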
> Proposal:
> I have developed a UDF, 'sort_array_by', which sorts a tuple array by one or more user-specified fields in ASC or DESC order; the default is ascending.
> Example:
> 	1. Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
> 	output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
> 	
> 	2. Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
> 	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
> 	3. Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
> 	output: array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
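The behaviour described in the proposal can be sketched in plain Java. This is an illustration under stated assumptions, not the code in GenericUDFSortArrayByField.java: each struct is an `Object[]` laid out according to a schema list, field names are matched case-insensitively, the field names used are the schema names (`empName`, `salary`), and a single optional trailing "ASC"/"DESC" applies to every field. The names `SortArrayBySketch`, `sortArrayBy`, `positionOf`, and `compareValues` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class SortArrayBySketch {

    // Hypothetical helper: sorts struct rows by the named fields. `args` mirrors the
    // proposed call shape: field names, optionally followed by "ASC" or "DESC" (default ASC).
    static List<Object[]> sortArrayBy(List<String> schema, List<Object[]> rows, String... args) {
        List<String> fields = new ArrayList<>(Arrays.asList(args));
        boolean descending = false;
        if (!fields.isEmpty()) {
            String last = fields.get(fields.size() - 1).toUpperCase(Locale.ROOT);
            if (last.equals("ASC") || last.equals("DESC")) {
                descending = last.equals("DESC");
                fields.remove(fields.size() - 1);
            }
        }
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("at least one sort field is required");
        }

        // One comparator per field: later fields only break ties left by earlier ones.
        Comparator<Object[]> order = null;
        for (String field : fields) {
            int pos = positionOf(schema, field);
            Comparator<Object[]> byField = (a, b) -> compareValues(a[pos], b[pos]);
            order = (order == null) ? byField : order.thenComparing(byField);
        }
        if (descending) {
            order = order.reversed(); // one direction applies to all fields
        }

        List<Object[]> sorted = new ArrayList<>(rows); // leave the input untouched
        sorted.sort(order);
        return sorted;
    }

    // Case-insensitive field lookup (an assumption of this sketch).
    private static int positionOf(List<String> schema, String field) {
        for (int i = 0; i < schema.size(); i++) {
            if (schema.get(i).equalsIgnoreCase(field)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unknown field: " + field);
    }

    // Assumes both values are mutually Comparable (Integer, String, Double, ...).
    @SuppressWarnings({"unchecked", "rawtypes"})
    private static int compareValues(Object a, Object b) {
        return ((Comparable) a).compareTo(b);
    }

    public static void main(String[] args) {
        List<String> schema = List.of("empId", "empName", "age", "salary");
        List<Object[]> rows = List.of(
                new Object[] {100, "Foo", 20, 20990.0},
                new Object[] {500, "Boo", 30, 80990.0},
                new Object[] {500, "Boo", 30, 50990.0},
                new Object[] {700, "Harry", 25, 40990.0},
                new Object[] {100, "Tom", 35, 70990.0});

        // Same call shape as example 2: name first, salary as the tie-breaker.
        for (Object[] row : sortArrayBy(schema, rows, "empName", "salary", "ASC")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running `main` prints the rows in the same order as the output of example 2. Chaining with `thenComparing` is what makes the second field act only as a tie-breaker, and because `List.sort` is stable, rows that tie on every requested field keep their input order.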
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 1ab914d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/udf_sort_array_by.q PRE-CREATION 
>   ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out a811747 
>   ql/src/test/results/clientpositive/udf_sort_array_by.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/49619/diff/
> 
> 
> Testing
> -------
> 
> JUnit test cases and .q query files are attached.
> 
> 
> Thanks,
> 
> Simanchal Das
> 
>

