carbondata-issues mailing list archives

From "xuchuanyin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-1839) Data load failed when using compressed sort temp file
Date Thu, 30 Nov 2017 03:57:00 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin updated CARBONDATA-1839:
-----------------------------------
    Description: 
CarbonData provides an option to optimize the data load process by compressing the intermediate
sort temp files.

The option is `carbon.is.sort.temp.file.compression.enabled` and its default value is `false`.
When disk resources are tight, users can turn on this feature by setting the option to `true`;
the file content is then compressed before it is written to disk.
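
To illustrate how a boolean option with a `false` default behaves, here is a minimal Java sketch that reads the flag from properties text and falls back to `false` when the key is absent. The class `SortTempCompressionFlag` and its `isEnabled` helper are hypothetical and are not part of CarbonData; only the option name and its default value come from the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SortTempCompressionFlag {
    // Option name and default value as stated in the issue description.
    static final String KEY = "carbon.is.sort.temp.file.compression.enabled";
    static final String DEFAULT = "false";

    // Hypothetical helper: parses properties text and returns the flag value.
    static boolean isEnabled(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader should not fail here; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return Boolean.parseBoolean(props.getProperty(KEY, DEFAULT));
    }

    public static void main(String[] args) {
        System.out.println(isEnabled(""));            // key absent: default applies
        System.out.println(isEnabled(KEY + "=true")); // feature turned on
    }
}
```

With the key absent the helper returns `false`; with the key set to `true` it returns `true`.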

However, I have found bugs in the related code: the data load fails after turning
on this feature.

The error messages are shown below:

```
17/11/29 18:04:12 ERROR SortDataRows: SortDataRowPool:test1 
java.lang.ClassCastException: [B cannot be cast to [Ljava.lang.Integer;
    at org.apache.carbondata.core.util.NonDictionaryUtil.getDimension(NonDictionaryUtil.java:93)
    at org.apache.carbondata.processing.sort.sortdata.UnCompressedTempSortFileWriter.writeDataOutputStream(UnCompressedTempSortFileWriter.java:52)
    at org.apache.carbondata.processing.sort.sortdata.CompressedTempSortFileWriter.writeSortTempFile(CompressedTempSortFileWriter.java:65)
    at org.apache.carbondata.processing.sort.sortdata.SortTempFileChunkWriter.writeSortTempFile(SortTempFileChunkWriter.java:72)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.writeSortTempFile(SortDataRows.java:245)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.writeDataTofile(SortDataRows.java:232)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.access$300(SortDataRows.java:45)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter.run(SortDataRows.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
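
In JVM notation, `[B` is the runtime name of `byte[]` and `[Ljava.lang.Integer;` is the runtime name of `Integer[]`, so the trace above reports that a `byte[]` value was cast to `Integer[]`. The following sketch shows that kind of failure in isolation. It does not reproduce the CarbonData code path; the class `ByteArrayCastDemo` and the `describeCast` helper are hypothetical, and the `Object` parameter is only an assumed stand-in for a loosely typed row slot.

```java
public class ByteArrayCastDemo {
    // Hypothetical stand-in for reading one loosely typed slot of a row.
    static String describeCast(Object slot) {
        try {
            // Compiles because slot is an Object, but fails at run time
            // when the slot actually holds a byte[].
            Integer[] dims = (Integer[]) slot;
            return "ok:" + dims.length;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Runtime names of the two array types that appear in the log message.
        System.out.println(byte[].class.getName());
        System.out.println(Integer[].class.getName());

        System.out.println(describeCast(new Integer[] {1, 2}));
        System.out.println(describeCast(new byte[] {1, 2, 3}));
    }
}
```

The exact wording of the exception message varies between JDK versions, so the sketch only reports the exception type.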
```
17/11/29 18:04:13 ERROR SortDataRows: SafeParallelSorterPool:test1 exception occurred while
trying to acquire a semaphore lock: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40
rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active
threads = 0, queued tasks = 0, completed tasks = 1]
17/11/29 18:04:13 ERROR ParallelReadMergeSorterImpl: SafeParallelSorterPool:test1 
org.apache.carbondata.processing.sort.exception.CarbonSortKeyAndGroupByException: 
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:173)
    at org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40
rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active
threads = 0, queued tasks = 0, completed tasks = 1]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:169)
    ... 4 more
```
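
The `RejectedExecutionException` above names a `ThreadPoolExecutor` in the `Terminated` state, which is what the default `AbortPolicy` throws when a task is submitted to a pool that has already shut down. One plausible reading, which is an assumption and not confirmed by the issue, is that the pool was shut down after the earlier write failure and a later row batch was still submitted to it. The sketch below shows only the generic JDK behavior; `TerminatedPoolDemo` and its method are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class TerminatedPoolDemo {
    // Returns true if a task submitted after shutdown is rejected.
    static boolean submitAfterShutdownIsRejected() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.execute(() -> { });   // one task completes normally
        pool.shutdown();           // no new tasks are accepted from here on
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            pool.execute(() -> { }); // rejected by the default AbortPolicy
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdownIsRejected());
    }
}
```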
```
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.carbondata.processing.sort.exception.CarbonSortKeyAndGroupByException:

    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:173)
    at org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:227)
    ... 3 more
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40
rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active
threads = 0, queued tasks = 0, completed tasks = 1]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:169)
    ... 4 more
```

  was:
Carbondata provide an option to optimize data load process by compressing the intermediate
sort temp files.

The option is `carbon.is.sort.temp.file.compression.enabled` and its default value is `false`.
In some disk tense scenario, user can turn on this feature by setting the option `true`, it
will compress the file content before write it to disk.

How ever I have found bugs in the related code and the data load is failed after turn on this
feature.

Error messages are shown as below:

```
17/11/29 18:04:12 ERROR SortDataRows: SortDataRowPool:test1 
java.lang.ClassCastException: [B cannot be cast to [Ljava.lang.Integer;
    at org.apache.carbondata.core.util.NonDictionaryUtil.getDimension(NonDictionaryUtil.java:93)
    at org.apache.carbondata.processing.sort.sortdata.UnCompressedTempSortFileWriter.writeDataOutputStream(UnCompressedTempSortFileWriter.java:52)
    at org.apache.carbondata.processing.sort.sortdata.CompressedTempSortFileWriter.writeSortTempFile(CompressedTempSortFileWriter.java:65)
    at org.apache.carbondata.processing.sort.sortdata.SortTempFileChunkWriter.writeSortTempFile(SortTempFileChunkWriter.java:72)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.writeSortTempFile(SortDataRows.java:245)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.writeDataTofile(SortDataRows.java:232)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.access$300(SortDataRows.java:45)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter.run(SortDataRows.java:426)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
```
17/11/29 18:04:13 ERROR SortDataRows: SafeParallelSorterPool:test1 exception occurred while
trying to acquire a semaphore lock: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40
rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active
threads = 0, queued tasks = 0, completed tasks = 1]
17/11/29 18:04:13 ERROR ParallelReadMergeSorterImpl: SafeParallelSorterPool:test1 
org.apache.carbondata.processing.sort.exception.CarbonSortKeyAndGroupByException: 
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:173)
    at org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40
rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active
threads = 0, queued tasks = 0, completed tasks = 1]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:169)
    ... 4 more
```
```
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.carbondata.processing.sort.exception.CarbonSortKeyAndGroupByException:

    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:173)
    at org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:227)
    ... 3 more
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40
rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active
threads = 0, queued tasks = 0, completed tasks = 1]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:169)
    ... 4 more
```


> Data load failed when using compressed sort temp file
> -----------------------------------------------------
>
>                 Key: CARBONDATA-1839
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1839
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
