apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2309) TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
Date Mon, 24 Oct 2016 06:29:58 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601113#comment-15601113 ]

ASF GitHub Bot commented on APEXMALHAR-2309:
--------------------------------------------

GitHub user francisf reopened a pull request:

    https://github.com/apache/apex-malhar/pull/464

    APEXMALHAR-2309 Comparing times for newer tuples with existing key

    @bhupeshchawda please review.
    Marks a tuple as unique if the time found for the key in asyncEvents is earlier than the current tuple's time.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/francisf/apex-malhar APEXMALHAR-2309_Deduper_valid_as_duplicates

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #464
    
----
commit c56e5c36c46f90fb0fee7cb6558bf860dbf6e181
Author: francisf <francis.fsfsfs@gmail.com>
Date:   2016-10-21T13:08:39Z

    APEXMALHAR-2309 Comparing times for newer tuples with existing key

----


> TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
> -----------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2309
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Francis Fernandes
>            Assignee: Francis Fernandes
>
> The deduper marks valid tuples outside the expiry window as duplicates.
> Consider the following configuration (number of buckets = 1):
> {code}
>   <property>
>     <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
>     <value>10</value>
>   </property>
> {code}
> The data piped in is:
> {code}
> "10",1474614305000,"Test"
> "11",1474614315000,"Test"
> "10",1474614325000,"Test"
> {code}
> The 3rd tuple is valid since it is outside the expiry window, but it is marked as a duplicate because the first tuple, although expired, is still present in Bucket.flash.
> The issue happens when the expiry duration is less than the checkpointing duration.
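
The scenario in the issue can be replayed with a small standalone sketch. This is not the TimeBasedDedupOperator code or the change in the pull request. The class DedupSketch and its methods are hypothetical, expireBefore is assumed to be in seconds, and expiry is simplified to a per-key time difference rather than the operator's bucket-based bookkeeping. It only illustrates why a key-only lookup flags the third tuple while a check that also compares times does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // Hypothetical stand-in for the expireBefore property (assumed to be seconds).
    static final long EXPIRE_BEFORE_SECONDS = 10;

    /** Key-only check: any key seen before counts as a duplicate, expired or not. */
    static List<Boolean> keyOnly(String[] keys, long[] timesMillis) {
        Map<String, Long> seen = new HashMap<>();
        List<Boolean> duplicate = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            duplicate.add(seen.containsKey(keys[i]));
            seen.put(keys[i], timesMillis[i]);
        }
        return duplicate;
    }

    /** Time-aware check: a stored entry older than the expiry window is ignored. */
    static List<Boolean> timeAware(String[] keys, long[] timesMillis) {
        Map<String, Long> seen = new HashMap<>();
        List<Boolean> duplicate = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            Long previous = seen.get(keys[i]);
            boolean withinWindow = previous != null
                && timesMillis[i] - previous < EXPIRE_BEFORE_SECONDS * 1000;
            duplicate.add(withinWindow);
            seen.put(keys[i], timesMillis[i]);
        }
        return duplicate;
    }

    public static void main(String[] args) {
        // The three tuples from the issue description: key and event time in ms.
        String[] keys = {"10", "11", "10"};
        long[] times = {1474614305000L, 1474614315000L, 1474614325000L};
        System.out.println("key-only:   " + keyOnly(keys, times));
        System.out.println("time-aware: " + timeAware(keys, times));
    }
}
```

The first and third tuples share key "10" but are 20 seconds apart, which is longer than the 10-second window assumed here, so the key-only check reports the third tuple as a duplicate and the time-aware check reports it as unique.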



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
