carbondata-issues mailing list archives

From "Manish Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CARBONDATA-542) Parsing values for measures and dimensions during data load should adopt a strict check
Date Mon, 19 Dec 2016 06:46:58 GMT
Manish Gupta created CARBONDATA-542:
---------------------------------------

             Summary: Parsing values for measures and dimensions during data load should adopt a strict check
                 Key: CARBONDATA-542
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-542
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Manish Gupta
            Assignee: Manish Gupta
            Priority: Minor
             Fix For: 1.0.0-incubating


Currently, CarbonData treats Short and Int values as long. When they are stored in CarbonData
files, delta compression is applied, which compresses the data based on the min and max values
of the column.

While parsing values of these data types, we use the Double parser and extract the long value
from the result, using the following expression: Double.valueOf(msrValue).longValue()
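
To make the effect of that expression concrete, here is a minimal sketch. The class and method names are hypothetical and exist only for this illustration; only the expression inside lenientParse comes from the description above.

```java
// Hypothetical demo class; not part of the CarbonData code base.
final class LenientMeasureParse {

    // Parses the text as a Double and truncates it to a long,
    // using the same expression quoted in the issue description.
    static long lenientParse(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    public static void main(String[] args) {
        // Larger than Short.MAX_VALUE (32767), yet accepted without error.
        System.out.println(lenientParse("40000"));
        // Larger than Integer.MAX_VALUE (2147483647), yet accepted without error.
        System.out.println(lenientParse("3000000000"));
        // The fractional part is silently dropped.
        System.out.println(lenientParse("12.9"));
    }
}
```

None of these inputs raises an exception, so the declared Short or Int type places no constraint on what is loaded.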

This has the following problems.

1. Measure values beyond the range of Int and Short are parsed successfully. This conflicts
with the behavior when the same column is listed in dictionary_include and becomes a dimension.
At query time, each dimension value is parsed according to its data type for result conversion;
a NumberFormatException is thrown for such a value and null is displayed in the result, whereas
for a measure the loaded value is displayed. This also impacts aggregate queries. That is why
a strict check mechanism is adopted for parsing dimension values.

2. Data inconsistency in the case of measures: for decimal input values, only the part before
the decimal point is considered for Int and Short data types.

3. For measures, if values beyond the data type range are allowed, the compression ratio will
decrease, because the min-max range of the column widens.
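
The third point can be illustrated with a simplified model of min/max based delta encoding. This is only a sketch under stated assumptions: it is not CarbonData's actual codec, the class and method names are hypothetical, and it assumes a non-empty column whose max - min fits in a long.

```java
// Hypothetical sketch of a fixed-width delta model; not CarbonData's encoder.
final class DeltaWidthSketch {

    // Returns the number of bits needed to store (value - min) for every
    // value in the column when all deltas share one fixed width.
    static int bitsPerValue(long[] column) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : column) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long range = max - min; // assumption: does not overflow a long
        // At least one bit per value, even when all values are equal.
        return Math.max(1, 64 - Long.numberOfLeadingZeros(range));
    }

    public static void main(String[] args) {
        // All values fit comfortably in the declared type.
        System.out.println(bitsPerValue(new long[] {0, 50, 100}));
        // One value beyond the Int range widens the delta for the whole column.
        System.out.println(bitsPerValue(new long[] {0, 50, 100, 3000000000L}));
    }
}
```

In this model a single out-of-range value raises the width of every stored delta in the column, which is the sense in which compression gets worse.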

Therefore we will have to adopt a strict behavior for both dimensions and measures.
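
One possible shape for such a strict check is sketched below. It is an assumption for illustration, not the change actually made for this issue: the class name, the method name, the string type tags, and the choice to return null for an invalid value are all hypothetical.

```java
// Hypothetical sketch of strict, type-aware parsing; not the actual CarbonData fix.
final class StrictParseSketch {

    // Parses the text with the parser of the declared type. Returns the value
    // widened to Long, or null when the text is not a valid value of that type
    // (out of range, fractional, or otherwise malformed).
    static Long parseStrict(String value, String dataType) {
        try {
            switch (dataType) {
                case "SHORT":
                    return (long) Short.parseShort(value);
                case "INT":
                    return (long) Integer.parseInt(value);
                case "LONG":
                    return Long.parseLong(value);
                default:
                    throw new IllegalArgumentException("Unsupported type: " + dataType);
            }
        } catch (NumberFormatException e) {
            // Assumption: an invalid value is reported as null to the caller.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("123", "SHORT"));      // valid Short
        System.out.println(parseStrict("40000", "SHORT"));    // beyond the Short range
        System.out.println(parseStrict("12.9", "INT"));       // fractional input
        System.out.println(parseStrict("3000000000", "INT")); // beyond the Int range
    }
}
```

With parsing of this kind, a value is accepted or rejected the same way whether the column is loaded as a measure or as a dimension.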



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
