drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5833) Parquet reader fails with assertion error for Decimal9, Decimal18 types
Date Tue, 03 Oct 2017 07:27:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5833:
-------------------------------
    Description: 
The {{TestParquetWriter.testDecimal()}} test recently failed. As it turns out, this test never
ran properly against the "old" Parquet reader. Because the {{store.parquet.use_new_reader}}
option was left at whatever value a previous test had set, the test would sometimes run against
the "new" reader (and succeed) and sometimes against the "old" reader (and fail).
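
To make the run deterministic, the test presumably needs to pin the option before querying; in Drill's test framework this is conventionally done with an {{ALTER SESSION}} statement (a sketch, not the committed fix):

{code}
// Hypothetical: force the "old" reader explicitly so the test is repeatable
test("ALTER SESSION SET `store.parquet.use_new_reader` = false");
{code}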

Once the test was forced to run against the "old" reader, it failed deep in the Parquet record
reader, in {{DrillParquetGroupConverter.getConverterForType()}}.

The code attempts to obtain a Decimal9 writer by calling {{SingleMapWriter.decimal9(String name)}}.
However, that method reads:

{code}
  public Decimal9Writer decimal9(String name) {
    // returns existing writer
    final FieldWriter writer = fields.get(name.toLowerCase());
    assert writer != null;
    return writer;
  }
{code}
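
The failure mode is easy to reproduce in isolation. The following self-contained sketch (simplified stand-ins, not Drill's actual classes) shows why: the one-argument overload only looks up an existing writer, so calling it before anything has created that writer trips the assertion, while the overload taking scale and precision creates the writer on demand.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Drill's SingleMapWriter (illustrative only).
public class WriterLookupDemo {

  static class Decimal9Writer {
    final int scale, precision;
    Decimal9Writer(int scale, int precision) {
      this.scale = scale;
      this.precision = precision;
    }
  }

  static class MapWriter {
    private final Map<String, Decimal9Writer> fields = new HashMap<>();

    // Lookup-only form: asserts the writer already exists.
    Decimal9Writer decimal9(String name) {
      final Decimal9Writer writer = fields.get(name.toLowerCase());
      assert writer != null : "no writer for " + name;
      return writer;
    }

    // Creating form: builds the writer on first use.
    Decimal9Writer decimal9(String name, int scale, int precision) {
      return fields.computeIfAbsent(name.toLowerCase(),
          n -> new Decimal9Writer(scale, precision));
    }
  }

  public static void main(String[] args) {
    MapWriter w = new MapWriter();
    // Correct order: create first (with scale/precision), then look up.
    Decimal9Writer created = w.decimal9("amount", 2, 9);
    Decimal9Writer found = w.decimal9("amount");
    System.out.println(created == found);  // true
    // Calling the lookup form for a field that was never created would
    // trip the assertion (with -ea enabled) -- the bug described above.
  }
}
```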

And, indeed, the assertion is triggered.

As it turns out, the code for Decimal28 shows the proper solution:

{code}
mapWriter.decimal28Sparse(name, metadata.getScale(), metadata.getPrecision())
{code}

That is, pass the scale and precision to the form of the method that actually creates the
writer:

{code}
  public Decimal9Writer decimal9(String name, int scale, int precision) {
{code}

Applying the same pattern for the Parquet Decimal9 and Decimal18 types allows the above
test to get past the asserts. Given this error, the test could never have passed against
the "old" reader, and so the bug in the Parquet reader went undetected.
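
By analogy with the Decimal28 case, the converter fix presumably replaces the one-argument calls with the creating form for both affected types (a sketch; the surrounding {{case}} statements are not shown in this report, and the {{decimal18}} signature is assumed to parallel {{decimal9}}):

{code}
mapWriter.decimal9(name, metadata.getScale(), metadata.getPrecision())
mapWriter.decimal18(name, metadata.getScale(), metadata.getPrecision())
{code}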

It also turns out that the test itself is wrong, reversing the validation and test queries:

{code}
  public void runTestAndValidate(String selection, String validationSelection, String inputTable, String outputFile) throws Exception {
    try {
      deleteTableIfExists(outputFile);
      ...
      // Query reads from the input (JSON) table
      String query = String.format("SELECT %s FROM %s", selection, inputTable);
      String create = "CREATE TABLE " + outputFile + " AS " + query;
      // validate query reads from the output (Parquet) table
      String validateQuery = String.format("SELECT %s FROM " + outputFile, validationSelection);
      test(create);

      testBuilder()
          .unOrdered()
          .sqlQuery(query) // Query under test is input query
          .sqlBaselineQuery(validateQuery) // Baseline query is output query
          .go();
{code}

Given this reversal, when the test fails it is the Parquet data that is wrong, not the baseline.
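
For comparison, the intended orientation would presumably swap the two builder calls, so that the query over the Parquet output is the query under test and the JSON input supplies the baseline (a sketch, not the committed fix):

{code}
testBuilder()
    .unOrdered()
    .sqlQuery(validateQuery)     // query under test reads the Parquet output
    .sqlBaselineQuery(query)     // baseline reads the JSON input
    .go();
{code}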

  was:
The {{TestParquetWriter.testDecimal()}} test recently failed. As it turns out, this test never
ran properly before. Due to some bug, the CTAS that was supposed to write the file instead
wrote an empty file, and the verification results were also empty. For some reason, the results
are empty when the test is run stand-alone, but contain data when run as part of the test
suite.

Once the test runs properly, it fails deep in the Parquet record reader in {{DrillParquetGroupConverter.getConverterForType()}}.

The code attempts to create a Decimal9 writer by calling {{SingleMapWriter.decimal9(String
name)}} to create the writer. However, the code in this method says:

{code}
  public Decimal9Writer decimal9(String name) {
    // returns existing writer
    final FieldWriter writer = fields.get(name.toLowerCase());
    assert writer != null;
    return writer;
  }
{code}

And, indeed, the assertion is triggered.

As it turns out, the code for Decimal28 shows the proper solution:

{code}
mapWriter.decimal28Sparse(name, metadata.getScale(), metadata.getPrecision())
{code}

That is, pass the scale and precision to this form of the method which actually creates the
writer:

{code}
  public Decimal9Writer decimal9(String name, int scale, int precision) {
{code}

Applying the same pattern for the Parquet Decimal9 and Decimal18 types allows the above
test to get past the asserts. Given this error, it is clear that this test could never have
run, and so the error in the Parquet reader was never detected.

It also turns out that the test itself is wrong, reversing the validation and test queries:

{code}
  public void runTestAndValidate(String selection, String validationSelection, String inputTable, String outputFile) throws Exception {
    try {
      deleteTableIfExists(outputFile);
      ...
      // Query reads from the input (JSON) table
      String query = String.format("SELECT %s FROM %s", selection, inputTable);
      String create = "CREATE TABLE " + outputFile + " AS " + query;
      // validate query reads from the output (Parquet) table
      String validateQuery = String.format("SELECT %s FROM " + outputFile, validationSelection);
      test(create);

      testBuilder()
          .unOrdered()
          .sqlQuery(query) // Query under test is input query
          .sqlBaselineQuery(validateQuery) // Baseline query is output query
          .go();
{code}

Given this, it is the Parquet data that is wrong, not the baseline.


> Parquet reader fails with assertion error for Decimal9, Decimal18 types
> -----------------------------------------------------------------------
>
>                 Key: DRILL-5833
>                 URL: https://issues.apache.org/jira/browse/DRILL-5833
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
