apex-dev mailing list archives

From deepak-narkhede <...@git.apache.org>
Subject [GitHub] apex-malhar pull request #463: APEXMALHAR-2312 Fix NullPointerException for ...
Date Mon, 24 Oct 2016 04:23:33 GMT
GitHub user deepak-narkhede reopened a pull request:

    https://github.com/apache/apex-malhar/pull/463

    APEXMALHAR-2312 Fix NullPointerException for FileSplitterInput Operat…

    Problem Statement:
    -------------------------
    A NullPointerException is thrown in FileSplitterInput only when the <files> attribute
is set to a file path instead of a directory path.
    
    Description:
    ---------------
    1) The TimeBasedDirectoryScanner threads, which are part of the scan service, scan the directories/files.
    2) Each thread uses isIterationCompleted() [referenceTimes] to check whether the files
scanned in the last iteration have been processed by the operator thread.
    3) Previously this worked because referenceTimes was a HashMap, whose get() returns null
even when the last scanned directory path (the key) is null.
    4) referenceTimes was recently changed to a ConcurrentHashMap, whose get() does not
accept a null key.
    5) When only a file path is provided, the directory path is null, so the key is null
and get() throws a NullPointerException.
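
    The difference described in points 3) and 4) can be reproduced in isolation. The sketch
below is illustrative only: the class and method names are hypothetical and are not part of
apex-malhar. It relies only on the documented behaviour of java.util.HashMap and
java.util.concurrent.ConcurrentHashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of apex-malhar.
public class NullKeyLookupDemo {

    // Returns true if calling get(null) on the given map throws a NullPointerException.
    public static boolean getNullThrows(Map<String, Long> map) {
        try {
            map.get(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Long> plain = new HashMap<>();
        Map<String, Long> concurrent = new ConcurrentHashMap<>();

        // HashMap permits a null key, so the lookup simply returns null.
        System.out.println("HashMap.get(null) throws: " + getNullThrows(plain));

        // ConcurrentHashMap rejects null keys, so the lookup throws.
        System.out.println("ConcurrentHashMap.get(null) throws: " + getNullThrows(concurrent));
    }
}
```

    This is why a change of map type alone was enough to surface the failure: the lookup
with a null directory path was already happening, but HashMap tolerated it.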
    
    Solution:
    -----------
    Pre-check whether the directory path is null. If it is, which happens when only a file
path is provided, treat the last iteration as completed.
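
    A minimal sketch of such a pre-check follows. It is not the actual patch: the class
name, the recordProcessed() helper, the shape of the referenceTimes map, and the rule used
for non-null paths are all assumptions made for illustration. Only the null guard reflects
the solution described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the scanner; the real class and fields may differ.
public class IterationCheckSketch {

    // Assumed shape: directory path -> reference time recorded by the operator thread.
    private final Map<String, Long> referenceTimes = new ConcurrentHashMap<>();

    // Hypothetical helper that marks a directory as processed.
    public void recordProcessed(String directoryPath, long referenceTime) {
        referenceTimes.put(directoryPath, referenceTime);
    }

    public boolean isIterationCompleted(String lastScannedDirectoryPath) {
        // Pre-check: with only a file path configured there is no directory path,
        // so skip the lookup (ConcurrentHashMap.get(null) would throw) and
        // treat the last iteration as completed.
        if (lastScannedDirectoryPath == null) {
            return true;
        }
        // Assumption for this sketch: an entry means the last scan was processed.
        return referenceTimes.get(lastScannedDirectoryPath) != null;
    }
}
```

    The guard runs before any map access, so the null key never reaches get().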
    
    Testing logs with fix for files/directories/sub-directories:
    ---------------------------------------------------------------------
    2016-10-21 11:20:38,382 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: Directory path: /user/deepak/files Sub-Directory or File path: /user/deepak/files/CustomerTxnData2
    2016-10-21 11:20:38,382 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: Scan started for input /user/deepak/files
    2016-10-21 11:20:38,386 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: scan /user/deepak/files
    2016-10-21 11:20:33,372 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: discovered /user/deepak/files/CustomerTxnData 1477028632605
    2016-10-21 11:20:33,372 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: discovered /user/deepak/files/CustomerTxnData1 1477028642067
    2016-10-21 11:20:33,373 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: discovered /user/deepak/files/CustomerTxnData2 1477028645290
    2016-10-21 11:20:33,373 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: scan complete 0 3
    
    ....
    
    2016-10-21 11:25:50,697 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: Directory path: null Sub-Directory or File path: /user/deepak/files/CustomerTxnData
    2016-10-21 11:25:50,697 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: Scan started for input /user/deepak/files/CustomerTxnData
    2016-10-21 11:25:50,702 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: scan /user/deepak/files/CustomerTxnData
    2016-10-21 11:25:50,704 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: scan complete

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/deepak-narkhede/apex-malhar APEXMALHAR-2312

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/463.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #463
    
----
commit 47f29f39393a4e43c8423153d32d12c9622872b5
Author: deepak-narkhede <mailtodeepakn@gmail.com>
Date:   2016-10-21T06:44:34Z

    APEXMALHAR-2312 Fix NullPointerException for FileSplitterInput Operator if filepath is
specified.
    
    Problem Description:
    -------------------
    1) The TimeBasedDirectoryScanner threads, which are part of the scan service, scan the directories/files.
    2) Each thread uses isIterationCompleted() [referenceTimes] to check whether the files
scanned in the last iteration have been processed by the operator thread.
    3) Previously this worked because referenceTimes was a HashMap, whose get() returns null
even when the last scanned directory path (the key) is null.
    4) referenceTimes was recently changed to a ConcurrentHashMap, whose get() does not
accept a null key.
    5) When only a file path is provided, the directory path is null, so the key is null
and get() throws a NullPointerException.
    
    Solution:
    ---------
    Pre-check whether the directory path is null. If it is, which happens when only a file
path is provided, treat the last iteration as completed.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
