sqoop-dev mailing list archives

From "Abraham Fine (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-2811) Sqoop2: Extracting sequence files may result in duplicates
Date Fri, 29 Jan 2016 22:16:39 GMT
Abraham Fine created SQOOP-2811:
-----------------------------------

             Summary: Sqoop2: Extracting sequence files may result in duplicates
                 Key: SQOOP-2811
                 URL: https://issues.apache.org/jira/browse/SQOOP-2811
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.99.6
            Reporter: Abraham Fine
            Assignee: Abraham Fine


In the HDFS extractor we use:
```
    if (start > filereader.getPosition()) {
      filereader.sync(start); // sync to start
    }
```

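to jump to the start of the portion of the sequence file that we want to extract. Note that `SequenceFile.Reader.sync(position)` does not seek to `position` itself; it seeks to the next sync marker past `position`.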

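If the sequence file is small, a split may contain no sync marker at all. Its `sync` call then moves past the end of the split, to the same marker that the following split syncs to. Multiple start points therefore land on the same position, and we can end up extracting the same records multiple times.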
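The effect can be illustrated with a small, self-contained Java model. It is not the Hadoop `SequenceFile` API: the file layout (twelve 100-byte records, sync points at offsets 0 and 600), the 300-byte split size, and the names `SyncDuplicateDemo`, `sync`, `readSplit` and `readAll` are all hypothetical. The modeled extraction loop is also an assumption: after syncing, it always emits the record it lands on, then keeps reading until it reaches a sync point at or past the split's end. The `guarded` flag sketches one possible fix, skipping a split whose sync target is already at or past its end, similar in spirit to the `start < end` check that Hadoop's own `SequenceFileRecordReader` performs after syncing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a sequence file; NOT the Hadoop SequenceFile API. */
public class SyncDuplicateDemo {

    // Assumed layout: twelve 100-byte records at offsets 0, 100, ..., 1100.
    static final long RECORD_SIZE = 100;
    static final int RECORD_COUNT = 12;
    static final long FILE_END = RECORD_SIZE * RECORD_COUNT;
    // Offsets of records that directly follow a sync marker (assumed values).
    // Markers are sparse compared with the split size, as in a small file.
    static final long[] SYNC_POINTS = {0, 600};

    static boolean isSyncPoint(long offset) {
        for (long p : SYNC_POINTS) {
            if (p == offset) return true;
        }
        return false;
    }

    // Models sync(start): jump to the first sync point at or after start,
    // or to end of file if there is none.
    static long sync(long start) {
        for (long p : SYNC_POINTS) {
            if (p >= start) return p;
        }
        return FILE_END;
    }

    // Returns the offsets of the records extracted for the split [start, end).
    static List<Long> readSplit(long start, long end, boolean guarded) {
        long pos = 0;
        if (start > pos) {
            pos = sync(start); // may land past 'end' if the split holds no marker
        }
        List<Long> out = new ArrayList<>();
        // Possible fix: records at or past 'end' belong to a later split.
        if (guarded && pos >= end) {
            return out;
        }
        boolean first = true;
        while (pos < FILE_END) {
            // Assumed loop shape: the first record is always emitted; afterwards,
            // stop at the first sync point at or past the end of the split.
            if (!first && pos >= end && isSyncPoint(pos)) {
                break;
            }
            out.add(pos);
            first = false;
            pos += RECORD_SIZE;
        }
        return out;
    }

    // Extracts every split of the given size and concatenates the results.
    static List<Long> readAll(long splitSize, boolean guarded) {
        List<Long> all = new ArrayList<>();
        for (long s = 0; s < FILE_END; s += splitSize) {
            all.addAll(readSplit(s, Math.min(s + splitSize, FILE_END), guarded));
        }
        return all;
    }

    public static void main(String[] args) {
        List<Long> unguarded = readAll(300, false);
        List<Long> guarded = readAll(300, true);
        System.out.println("unguarded: " + unguarded.size() + " records " + unguarded);
        System.out.println("guarded:   " + guarded.size() + " records " + guarded);
    }
}
```

With this layout, the unguarded pass returns 18 records: the splits starting at 300 and 600 both sync to offset 600, so each of them reads offsets 600 through 1100. The guarded pass returns each of the 12 records exactly once.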



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
