carbondata-issues mailing list archives

From "Manish Gupta (JIRA)" <>
Subject [jira] [Created] (CARBONDATA-542) Parsing values for measures and dimensions during data load should adopt a strict check
Date Mon, 19 Dec 2016 06:46:58 GMT
Manish Gupta created CARBONDATA-542:

             Summary: Parsing values for measures and dimensions during data load should adopt a strict check
                 Key: CARBONDATA-542
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Manish Gupta
            Assignee: Manish Gupta
            Priority: Minor
             Fix For: 1.0.0-incubating

Currently in carbon we treat Short and Int as Long, and when storing them in carbon data
files we use delta compression, which compresses the data based on the min and max values of the

While parsing values for these data types, we use the Double parser and extract the long
value from the result. The code snippet is:

    Double.valueOf(msrValue).longValue()
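
To make the effect of that expression concrete, here is a small standalone sketch. The class
and method names are hypothetical and are not taken from the CarbonData code base; only the
parsing expression itself comes from this issue.

```java
// Illustrative only: LenientParseDemo and lenientParse are hypothetical names.
final class LenientParseDemo {

    // The parsing expression quoted in this issue.
    static long lenientParse(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    public static void main(String[] args) {
        // Beyond the Short range (max 32767), yet accepted.
        System.out.println(lenientParse("40000"));      // 40000
        // Beyond the Int range (max 2147483647), yet accepted.
        System.out.println(lenientParse("3000000000")); // 3000000000
        // The fractional part is silently dropped.
        System.out.println(lenientParse("12.75"));      // 12

        // A type-specific parser rejects the same out-of-range input.
        try {
            Short.parseShort("40000");
        } catch (NumberFormatException e) {
            System.out.println("rejected by Short.parseShort");
        }
    }
}
```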

This has the following problems.

1. Measure values beyond the range of Int and Short are parsed successfully. This conflicts
with the behavior when the same column is included in dictionary_include and becomes a dimension.
At query time each dimension value is parsed according to its data type for result conversion;
an out-of-range value then throws NumberFormatException and null is displayed in the result,
whereas for a measure the loaded value is displayed. This also impacts aggregate queries. That
is why a strict check mechanism is adopted for parsing dimension values.

2. Data inconsistency in the case of measures: for decimal values, only the part before the
decimal point is kept for the Int and Short data types.

3. For measures, if values beyond the data type range are allowed, compression becomes less effective.
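
As a rough illustration of point 3, the sketch below estimates how many bits a generic
min/max based delta encoding needs per value. It is an assumption-laden model for the sake
of the argument, not CarbonData's actual codec, and the names are hypothetical.

```java
// Illustrative only: a generic model of min/max delta encoding, not CarbonData code.
final class DeltaWidthDemo {

    // Bits needed to store (value - min) for any value in [min, max].
    // Assumes max >= min and that max - min does not overflow a long.
    static int bitsForRange(long min, long max) {
        long span = max - min;
        return span == 0 ? 0 : 64 - Long.numberOfLeadingZeros(span);
    }

    public static void main(String[] args) {
        // Values that stay within the Short range.
        System.out.println(bitsForRange(0L, 30000L));      // 15
        // One out-of-range value widens every stored delta.
        System.out.println(bitsForRange(0L, 3000000000L)); // 32
    }
}
```

In this model a single value outside the declared range more than doubles the width of
every stored delta in the column.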

Therefore we will have to adopt a strict behavior for both dimensions and measures.
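
One possible shape for such a strict check is sketched below. This is only an assumption
about how it could look, not the change that was made for this issue: the class, enum and
method names are hypothetical, and how a rejected value is handled afterwards is left to
the caller.

```java
// Illustrative only: StrictMeasureParser and IntegralType are hypothetical names.
final class StrictMeasureParser {

    enum IntegralType { SHORT, INT, LONG }

    // Parses with the parser of the declared type, so out-of-range and
    // fractional input throws NumberFormatException instead of being accepted.
    static long parseStrict(String value, IntegralType type) {
        switch (type) {
            case SHORT:
                return Short.parseShort(value);
            case INT:
                return Integer.parseInt(value);
            default:
                return Long.parseLong(value);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("123", IntegralType.SHORT)); // 123
        try {
            parseStrict("12.75", IntegralType.INT);
        } catch (NumberFormatException e) {
            System.out.println("rejected: 12.75 is not a valid Int");
        }
    }
}
```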

