sqoop-dev mailing list archives

From Fero Szabo via Review Board <nore...@reviews.apache.org>
Subject Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema
Date Mon, 12 Feb 2018 13:08:11 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.

Bugs: SQOOP-2976

Repository: sqoop-trunk


Certain databases, such as SQL Server and Postgres, store decimal values padded with 0s if
the user inserts them with fewer digits than the column's scale.

Other databases, however, such as Oracle and HSQLDB, store these numbers without trailing 0s.
When the JDBC driver returns them as BigDecimals, their scale won't match the scale in the
Avro schema.

Take the following SQL commands as an example:
create table salary (id int, amount number (10,5));
insert into salary (id, amount) values (1, 10.5);
insert into salary (id, amount) values (2, 10.50);
select * from salary;
Records in an Oracle database:
1	10.5
2	10.5

Records in SQL Server (using decimal instead of number in the create statement):
1	10.50000
2	10.50000
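The mismatch matters because the Avro specification's `decimal` logical type stores only the unscaled integer; the scale lives in the schema. The stdlib-only sketch below does not call Avro. `readBackWithSchemaScale` is a hypothetical stand-in for what a reader does, and the schema scale of 5 is taken from the `(10,5)` column above. It shows that the two drivers' values are numerically equal, yet writing Oracle's unscaled value against a scale-5 schema would decode to a different number.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalScaleMismatch {

    // Hypothetical stand-in for an Avro reader: it only sees the unscaled integer
    // and applies the scale declared in the schema.
    static BigDecimal readBackWithSchemaScale(BigDecimal written, int schemaScale) {
        BigInteger unscaled = written.unscaledValue();
        return new BigDecimal(unscaled, schemaScale);
    }

    public static void main(String[] args) {
        int schemaScale = 5; // scale of the number(10,5) / decimal(10,5) column

        BigDecimal fromOracle = new BigDecimal("10.5");        // unscaled 105, scale 1
        BigDecimal fromSqlServer = new BigDecimal("10.50000"); // unscaled 1050000, scale 5

        // Numerically equal, but BigDecimal.equals also compares the scale.
        System.out.println(fromOracle.compareTo(fromSqlServer) == 0); // true
        System.out.println(fromOracle.equals(fromSqlServer));         // false

        System.out.println(readBackWithSchemaScale(fromSqlServer, schemaScale)); // 10.50000
        System.out.println(readBackWithSchemaScale(fromOracle, schemaScale));    // 0.00105
    }
}
```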

The fix simply checks the scale of each returned BigDecimal against the scale in the Avro
schema and recreates the object in case of a mismatch. I've introduced a new property to
enable this feature, so existing behavior is not affected.
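A minimal sketch of that check, assuming the schema's scale is already available as an `int` and the new property has been read into a boolean. `fitToSchemaScale` is a hypothetical name, not the method in `AvroUtil`. This variant uses `setScale(int)` without a rounding mode, so it pads freely but refuses to drop non-zero digits.

```java
import java.math.BigDecimal;

public class DecimalScaleFix {

    /**
     * Hypothetical helper: if the feature is enabled and the value's scale differs
     * from the Avro schema's scale, recreate the value with the schema's scale.
     * setScale(int) appends zeros when the scale grows; when it shrinks, it succeeds
     * only if the dropped digits are zeros and otherwise throws ArithmeticException.
     */
    static BigDecimal fitToSchemaScale(BigDecimal value, int schemaScale, boolean enabled) {
        if (!enabled || value == null || value.scale() == schemaScale) {
            return value; // existing behavior, SQL NULL, or nothing to fix
        }
        return value.setScale(schemaScale);
    }

    public static void main(String[] args) {
        BigDecimal fromOracle = new BigDecimal("10.5");
        System.out.println(fitToSchemaScale(fromOracle, 5, true));  // 10.50000
        System.out.println(fitToSchemaScale(fromOracle, 5, false)); // 10.5
    }
}
```

Returning the value unchanged when the flag is off keeps the existing behavior, which is the point of gating the feature behind a property.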

**Concerns:**
- Trimming can happen silently; should we raise an exception instead? Enabling trimming adds
a new feature, but it also introduces the possibility of silently losing scale during import.
The latter could be mitigated by thorough documentation.
- The flag's current name () doesn't really match its behavior; should I change it to something
else (avro.decimal_scale_harmonization.enable)?
- How and where should this new flag be documented?
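The first concern comes down to which rounding policy applies when a value arrives with more fraction digits than the schema's scale. The sketch below contrasts the two options using only `java.math`. The method names are hypothetical, and `RoundingMode.HALF_UP` is just one possible lenient choice.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaleReductionPolicies {

    // Strict: RoundingMode.UNNECESSARY throws ArithmeticException instead of dropping non-zero digits.
    static BigDecimal fitStrict(BigDecimal value, int schemaScale) {
        return value.setScale(schemaScale, RoundingMode.UNNECESSARY);
    }

    // Lenient: silently rounds to the schema's scale (the "silent trimming" case).
    static BigDecimal fitLenient(BigDecimal value, int schemaScale) {
        return value.setScale(schemaScale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal tooPrecise = new BigDecimal("10.123456"); // scale 6, schema scale 5
        System.out.println(fitLenient(tooPrecise, 5)); // 10.12346, last digit rounded away
        try {
            fitStrict(tooPrecise, 5);
        } catch (ArithmeticException e) {
            System.out.println("strict policy rejected the value: " + e.getMessage());
        }
        // Dropping trailing zeros never needs rounding, so the strict policy accepts it.
        System.out.println(fitStrict(new BigDecimal("10.500000"), 5)); // 10.50000
    }
}
```

The strict variant turns silent precision loss into a failed import. The lenient variant keeps the import running, so it would need the thorough documentation mentioned above.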

**Other notable changes:**
- Introduced ArgumentArrayBuilder, which reuses the existing Argument class and provides a
builder for creating command-line arguments in tests.
- Slightly modified BaseSqoopTestCase to fit my needs. *(However, this class needs further
refactoring to enable better reuse. For example, the current implementation can't be used
with SQL Server, because the create and insert statements need the schema name in addition
to the table name. The class also contains duplicated code.)*
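The builder idea can be sketched as follows. This is not the ArgumentArrayBuilder from the patch: it does not use Sqoop's Argument class, and the method names (`withOption`, `withProperty`, `build`), the connect string, and the property name are assumptions made for illustration. `--connect`, `--table` and `--as-avrodatafile` are regular Sqoop import options.

```java
import java.util.ArrayList;
import java.util.List;

public class ArgumentArrayBuilderSketch {
    // Hadoop generic options (-D name=value) are collected separately because Sqoop
    // expects them after the tool name but before any tool-specific arguments.
    private final List<String> properties = new ArrayList<>();
    private final List<String> toolOptions = new ArrayList<>();

    public ArgumentArrayBuilderSketch withProperty(String name, String value) {
        properties.add("-D");
        properties.add(name + "=" + value);
        return this;
    }

    // A tool option without an argument, e.g. --as-avrodatafile.
    public ArgumentArrayBuilderSketch withOption(String name) {
        toolOptions.add("--" + name);
        return this;
    }

    // A tool option with an argument, e.g. --table salary.
    public ArgumentArrayBuilderSketch withOption(String name, String value) {
        toolOptions.add("--" + name);
        toolOptions.add(value);
        return this;
    }

    public String[] build() {
        List<String> args = new ArrayList<>(properties); // generic options first
        args.addAll(toolOptions);
        return args.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] built = new ArgumentArrayBuilderSketch()
            .withOption("connect", "jdbc:hsqldb:mem:example") // placeholder connect string
            .withOption("table", "salary")
            .withOption("as-avrodatafile")
            .withProperty("example.property.name", "true")    // placeholder property name
            .build();
        System.out.println(String.join(" ", built));
    }
}
```

Because properties are kept in their own list, a test can add them in any order and the generated array still places them ahead of the tool options.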


  src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
  src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
  src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
  src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 

Diff: https://reviews.apache.org/r/65607/diff/1/


See the 3 new test classes (HSQLDB, Oracle, SQL Server).


Fero Szabo
