carbondata-issues mailing list archives

From "Ravindra Pesala (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CARBONDATA-542) Parsing values for measures and dimensions during data load should adopt a strict check
Date Tue, 17 Jan 2017 19:55:26 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala resolved CARBONDATA-542.
----------------------------------------

> Parsing values for measures and dimensions during data load should adopt a strict check
> ---------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-542
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-542
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Manish Gupta
>            Priority: Minor
>             Fix For: 1.0.0-incubating
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently CarbonData treats Short and Int values as long. When the values are stored in
> CarbonData files, delta compression is applied, which compresses the data based on the
> min and max values of the column.
> While parsing values for these data types, we use the Double parser and extract the long
> value from the result, as in this snippet: Double.valueOf(msrValue).longValue()
> This has the following problems:
> 1. Measure values beyond the range of Int and Short are parsed successfully. This
> conflicts with the behavior when the same column is listed in dictionary_include and
> becomes a dimension. At query time each dimension value is parsed according to its data
> type for result conversion; a NumberFormatException is thrown and null is displayed in
> the result, whereas for a measure the loaded value is displayed. This also affects
> aggregate queries. That is why a strict check is already used when parsing dimension
> values.
> 2. Data inconsistency for measures: for decimal values, only the part before the decimal
> point is kept for Int and Short data types.
> 3. For measures, if values beyond the data type range are allowed, compression becomes
> less effective.
> Therefore we have to adopt strict parsing for both dimensions and measures.
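
The difference between the lenient parse quoted in the issue and a strict per-type parse can be sketched as follows. This is a minimal illustration, not the CarbonData implementation: the class name StrictParseSketch and the helpers parseLenient, parseStrictInt and parseStrictShort are hypothetical, and returning null for a rejected value is an assumption made here for the example. Only Double.valueOf(msrValue).longValue() comes from the issue text.

```java
public class StrictParseSketch {

    // Lenient parse as quoted in the issue: parse as Double, then truncate to long.
    // Out-of-range and fractional inputs are silently accepted.
    static long parseLenient(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    // Hypothetical strict parse for an Int column.
    // Assumption: a value that is not a valid int is reported as null.
    static Integer parseStrictInt(String value) {
        if (value == null) {
            return null;
        }
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null; // out of range, fractional, or not a number
        }
    }

    // Hypothetical strict parse for a Short column, same convention as above.
    static Short parseStrictShort(String value) {
        if (value == null) {
            return null;
        }
        try {
            return Short.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Problem 1: a value beyond the Int range is accepted by the lenient parse.
        System.out.println(parseLenient("2147483648"));   // 2147483648
        System.out.println(parseStrictInt("2147483648")); // null

        // Problem 2: the fractional part is dropped by the lenient parse.
        System.out.println(parseLenient("12.75"));        // 12
        System.out.println(parseStrictInt("12.75"));      // null

        // In-range integral values parse the same way under both approaches.
        System.out.println(parseStrictInt("42"));         // 42
        System.out.println(parseStrictShort("40000"));    // null (beyond Short range)
    }
}
```

With the strict variant, a value rejected for a dimension is rejected for a measure as well, so both paths agree on what counts as a valid Int or Short.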



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
