hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Simanchal Das <simanchal....@outlook.com>
Subject Re: Review Request 49619: sorting of tuple array using multiple fields
Date Mon, 11 Jul 2016 08:37:26 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/
-----------------------------------------------------------

(Updated July 11, 2016, 8:37 a.m.)


Review request for hive, Ashutosh Chauhan and Carl Steinbach.


Repository: hive-git


Description
-------

Problem Statement:

When we are working with complex structure of data like avro.
Most of the times we are encountering array contains multiple tuples and each tuple have struct
schema.
Suppose here struct schema is like below:
{
	"name": "employee",
	"type": [{
		"type": "record",
		"name": "Employee",
		"namespace": "com.company.Employee",
		"fields": [{
			"name": "empId",
			"type": "int"
		}, {
			"name": "empName",
			"type": "string"
		}, {
			"name": "age",
			"type": "int"
		}, {
			"name": "salary",
			"type": "double"
		}]
	}]
}

Then while running our hive query complex array looks like array of employee objects.
Example: 
	//(array<struct<empId,empName,age,salary>>)
	Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]

When we are implementing business use cases day to day life we are encountering problems like
sorting a tuple array by specific field[s] like empId,name,salary,etc by ASC or DESC order.
Proposal:
I have developed a udf 'sort_array_by' which will sort a tuple array by one or more fields
in ASC or DESC order provided by user ,default is ascending order .
Example:
	1.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
	output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
	
	2.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]

	3.Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age,"ASC");
	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties 1ab914d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java PRE-CREATION

  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java PRE-CREATION

  ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_sort_array_by.q PRE-CREATION 
  ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 
  ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out a811747 
  ql/src/test/results/clientpositive/udf_sort_array_by.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/49619/diff/


Testing
-------

Junit test cases and query.q files are attached


Thanks,

Simanchal Das


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message