apex-dev mailing list archives

From "Deepak Narkhede (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2312) NullPointerException in FileSplitterInput only if the file path is specified for attribute <files> instead of directory path
Date Fri, 21 Oct 2016 06:35:58 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15594249#comment-15594249
] 

Deepak Narkhede commented on APEXMALHAR-2312:
---------------------------------------------

Issue reproduction with instrumentation logs:
============================================

2016-10-21 10:35:35,227 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput:  isIterationCompleted Directory: null File: /user/deepak/CustomerTxnData
2016-10-21 10:35:35,228 ERROR com.datatorrent.lib.io.fs.FileSplitterInput: service
java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
        at java.util.Collections$UnmodifiableMap.get(Collections.java:1339)
        at com.datatorrent.lib.io.fs.FileSplitterInput$TimeBasedDirectoryScanner.isIterationCompleted(FileSplitterInput.java:402)
        at com.datatorrent.lib.io.fs.FileSplitterInput$TimeBasedDirectoryScanner.run(FileSplitterInput.java:358)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2016-10-21 10:35:35,231 ERROR com.datatorrent.stram.engine.StreamingContainer: Operator set [OperatorDeployInfo[id=1,name=recordReader$FileSplitter,type=INPUT,checkpoint={ffffffffffffffff, 0, 0},inputs=[],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=blocksMetadataOutput,streamId=recordReader$BlockMetadata,bufferServer=deepak-HP-ProBook-650-G2]]]] stopped running due to an exception.
java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
        at java.util.Collections$UnmodifiableMap.get(Collections.java:1339)
        at com.datatorrent.lib.io.fs.FileSplitterInput$TimeBasedDirectoryScanner.isIterationCompleted(FileSplitterInput.java:402)
        at com.datatorrent.lib.io.fs.FileSplitterInput$TimeBasedDirectoryScanner.run(FileSplitterInput.java:358)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


get() implementations of ConcurrentHashMap and HashMap:
======================================================

ConcurrentHashMap<> get():
-------------------------
...
     *
     * @throws NullPointerException if the specified key is null
     */
    public V get(Object key) {
        Segment<K,V> s; // manually integrate access methods to reduce overhead
        HashEntry<K,V>[] tab;
        int h = hash(key);
...

HashMap<> get():
---------------

 public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);
...
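
The difference between the two get() implementations can be reproduced outside the operator. The following is a minimal sketch, not Malhar code: the class name NullKeyLookupDemo and the helper probe() are hypothetical, and only standard java.util classes are used. It looks up a null key in each map type and reports whether null came back or a NullPointerException was thrown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; not part of FileSplitterInput.
final class NullKeyLookupDemo {

    // Returns "null" if map.get(null) returns null, or "NPE" if it throws.
    static String probe(Map<String, Long> map) {
        try {
            Long value = map.get(null);
            return String.valueOf(value);
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        // HashMap tolerates a null key and simply finds no entry.
        System.out.println("HashMap: " + probe(new HashMap<String, Long>()));
        // ConcurrentHashMap rejects a null key, as documented in its get() Javadoc.
        System.out.println("ConcurrentHashMap: " + probe(new ConcurrentHashMap<String, Long>()));
    }
}
```

Running main prints "HashMap: null" followed by "ConcurrentHashMap: NPE", which matches the behaviour seen in the stack trace above.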


Testing logs with fix for files/directories/sub-directories:
==========================================

2016-10-21 11:20:38,382 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: Directory path: /user/deepak/files Sub-Directory or File path: /user/deepak/files/CustomerTxnData2
2016-10-21 11:20:38,382 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: Scan started for input /user/deepak/files
2016-10-21 11:20:38,386 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: scan /user/deepak/files
2016-10-21 11:20:33,372 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: discovered /user/deepak/files/CustomerTxnData 1477028632605
2016-10-21 11:20:33,372 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: discovered /user/deepak/files/CustomerTxnData1 1477028642067
2016-10-21 11:20:33,373 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: discovered /user/deepak/files/CustomerTxnData2 1477028645290
2016-10-21 11:20:33,373 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: scan complete 0 3

....

2016-10-21 11:25:50,697 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: Directory path: null Sub-Directory or File path: /user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,697 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: Scan started for input /user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,702 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: scan /user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,704 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: scan complete


> NullPointerException in FileSplitterInput only if the file path is specified for attribute <files> instead of directory path
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2312
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2312
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Deepak Narkhede
>            Assignee: Deepak Narkhede
>            Priority: Minor
>
> Problem Statement:
> ==================
> A NullPointerException is seen in FileSplitterInput only if a file path, instead of a directory path, is specified for the attribute <files>.
> Description:
> ===========
> 1) The TimeBasedDirectoryScanner threads, which are part of the scanservice, scan the directories/files.
> 2) Each thread uses the isIterationCompleted() method [referenceTimes] to check whether the files scanned in the last iteration have been processed by the operator thread.
> 3) Previously this worked because referenceTimes was a HashMap, whose get() returns null even if the last scanned directory path is null.
> 4) Recently referenceTimes was changed to a ConcurrentHashMap, whose get() method does not accept a null key.
> 5) Hence the NullPointerException: if only a file path is provided, the directory path is null, so the key passed to get() is null.
> Solution:
> ========
> Pre-check whether the directory path is null. If it is, which happens when only a file path is provided, treat the last iteration as completed.
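
The pre-check described in the solution could look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the actual patch: the class IterationCheckSketch, the flat Map<String, Long> shape of referenceTimes, and the method signature are all hypothetical, and the real FileSplitterInput code may be structured differently. The only point it demonstrates is returning early on a null directory path before the ConcurrentHashMap lookup.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the scanner's iteration check; not the Malhar class.
final class IterationCheckSketch {

    // Assumed shape: directory path -> reference time, exposed read-only
    // (the stack trace shows an UnmodifiableMap wrapping a ConcurrentHashMap).
    private final Map<String, Long> referenceTimes;

    IterationCheckSketch(ConcurrentHashMap<String, Long> referenceTimes) {
        this.referenceTimes = Collections.unmodifiableMap(referenceTimes);
    }

    // Returns true when the last iteration is considered processed.
    boolean isIterationCompleted(String lastScannedDirectoryPath) {
        // Pre-check: a null directory path means only a file path was given,
        // so there is no directory entry to wait for.
        if (lastScannedDirectoryPath == null) {
            return true;
        }
        // Safe now: the key is non-null, so ConcurrentHashMap.get() cannot throw.
        return referenceTimes.get(lastScannedDirectoryPath) != null;
    }
}
```

With this guard, a call such as isIterationCompleted(null) returns true instead of reaching get() with a null key, which is consistent with the "Directory path: null" test log above completing its scan without an exception.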



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
