accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3967) bulk import loses records when loading pre-split table
Date Sun, 23 Aug 2015 05:07:46 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708303#comment-14708303 ]

Josh Elser commented on ACCUMULO-3967:
--------------------------------------

The problem appears to be in the {{BulkImporter.findOverlappingTablets(..)}} call on the failed
extent. 

{noformat}
2015-08-23 00:42:16,228 [client.BulkImporter] DEBUG: Trying to assign 1 map files that previously
failed on some key extents
2015-08-23 00:42:16,248 [client.BulkImporter] INFO : Overlapping tablets for 4;13;12: [(4;14;13,localhost:65198,14f579bbedc003f)]
2015-08-23 00:42:16,259 [client.BulkImporter] INFO : Overlapping tablets for 4;01<:   [(4;01<,localhost:65198,14f579bbedc003f)]
2015-08-23 00:42:16,274 [client.BulkImporter] INFO : Overlapping tablets for 4;14;13: [(4;15;14,localhost:65075,14f579bbedc003b)]
2015-08-23 00:42:16,291 [client.BulkImporter] INFO : Overlapping tablets for 4;02;01: [(4;03;02,localhost:65198,14f579bbedc003f)]
2015-08-23 00:42:16,307 [client.BulkImporter] INFO : Overlapping tablets for 4;10;09: [(4;11;10,localhost:65075,14f579bbedc003b)]
2015-08-23 00:42:16,325 [client.BulkImporter] INFO : Overlapping tablets for 4;16;15: [(4;17;16,localhost:65075,14f579bbedc003b)]
2015-08-23 00:42:16,340 [client.BulkImporter] INFO : Overlapping tablets for 4;22;21: [(4;23;22,localhost:65198,14f579bbedc003f)]
2015-08-23 00:42:16,355 [client.BulkImporter] INFO : Overlapping tablets for 4;07;06: [(4;08;07,localhost:65075,14f579bbedc003b)]
2015-08-23 00:42:16,367 [client.BulkImporter] INFO : Overlapping tablets for 4;15;14: [(4;16;15,localhost:65198,14f579bbedc003f)]
2015-08-23 00:42:16,378 [client.BulkImporter] INFO : Overlapping tablets for 4;05;04: [(4;06;05,localhost:65075,14f579bbedc003b)]
2015-08-23 00:42:16,389 [client.BulkImporter] INFO : Overlapping tablets for 4;04;03: [(4;05;04,localhost:65075,14f579bbedc003b)]
2015-08-23 00:42:16,400 [client.BulkImporter] INFO : Overlapping tablets for 4;06;05: [(4;07;06,localhost:65198,14f579bbedc003f)]
2015-08-23 00:42:16,412 [client.BulkImporter] INFO : Overlapping tablets for 4;11;10: [(4;12;11,localhost:65075,14f579bbedc003b)]
2015-08-23 00:42:16,424 [client.BulkImporter] INFO : Overlapping tablets for 4;08;07: [(4;09;08,localhost:65075,14f579bbedc003b)]
2015-08-23 00:42:16,436 [client.BulkImporter] INFO : Overlapping tablets for 4;12;11: [(4;13;12,localhost:65198,14f579bbedc003f)]
2015-08-23 00:42:16,448 [client.BulkImporter] INFO : Overlapping tablets for 4;09;08: [(4;10;09,localhost:65198,14f579bbedc003f)]
{noformat}

Note how for all but the first tablet, the tablet immediately following the failed extent is returned instead of the original. This results in some tablets being missed entirely and some files being reimported into tablets that already received them (thankfully, the tabletservers ignore the second request to import the same file -- that part is working as intended).
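To make the off-by-one concrete, here is a plain-Java model of what the log suggests (hypothetical -- this is not Accumulo's actual metadata-lookup code): with tablets indexed by end row, a ceiling lookup on the failed extent's end row finds the original tablet, while a strictly-higher lookup skips to the immediately following one, matching the output above. The class name, map layout, and row values are all illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: tablets keyed by end row, as in the metadata table.
// Extent "4;13;12" (tableId;endRow;prevEndRow) covers rows (12, 13].
public class TabletLookupModel {

  // Correct lookup: the tablet whose end row is >= the sought row.
  static String tabletForRow(TreeMap<String, String> tabletsByEndRow, String row) {
    Map.Entry<String, String> e = tabletsByEndRow.ceilingEntry(row);
    return e == null ? null : e.getValue();
  }

  // Suspected buggy lookup: seeking strictly after the failed extent's end row
  // skips the original tablet and returns the immediately following one.
  static String tabletAfterRow(TreeMap<String, String> tabletsByEndRow, String row) {
    Map.Entry<String, String> e = tabletsByEndRow.higherEntry(row);
    return e == null ? null : e.getValue();
  }

  public static void main(String[] args) {
    TreeMap<String, String> tablets = new TreeMap<>();
    tablets.put("13", "4;13;12");
    tablets.put("14", "4;14;13");
    tablets.put("15", "4;15;14");

    // Retrying the failed extent 4;13;12 (end row "13"):
    System.out.println(tabletForRow(tablets, "13"));   // the original tablet: 4;13;12
    System.out.println(tabletAfterRow(tablets, "13")); // the following tablet, as in the log: 4;14;13
  }
}
```

If the retry path resolves the failed extent this way, every failed extent except the first in the table would be redirected one tablet too far, which is exactly the pattern in the log.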

Here's a diff of some extra logging I put in:

{code:java}
diff --git a/server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java b/server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java
index 283d304..606e9ef 100644
--- a/server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java
+++ b/server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java
@@ -171,6 +171,8 @@ public class BulkImporter {
       Map<Path,List<KeyExtent>> assignmentFailures = assignMapFiles(context, conf, fs, tableId, assignments, paths, numAssignThreads, numThreads);
       assignmentStats.assignmentsFailed(assignmentFailures);
 
+      log.info("Initial set of failures: " + assignmentFailures);
+
       Map<Path,Integer> failureCount = new TreeMap<Path,Integer>();
 
       for (Entry<Path,List<KeyExtent>> entry : assignmentFailures.entrySet())
@@ -205,7 +207,9 @@ public class BulkImporter {
 
             try {
               timer.start(Timers.QUERY_METADATA);
-              tabletsToAssignMapFileTo.addAll(findOverlappingTablets(context, fs, locator, entry.getKey(), ke));
+              List<TabletLocation> overlappingTablets = findOverlappingTablets(context, fs, locator, entry.getKey(), ke);
+              log.info("Overlapping tablets for " + ke + ": " + overlappingTablets);
+              tabletsToAssignMapFileTo.addAll(overlappingTablets);
               timer.stop(Timers.QUERY_METADATA);
               keListIter.remove();
             } catch (Exception ex) {
@@ -423,10 +427,12 @@ public class BulkImporter {
 
   private Map<Path,List<KeyExtent>> assignMapFiles(ClientContext context, Configuration conf, VolumeManager fs, String tableId,
       Map<Path,List<TabletLocation>> assignments, Collection<Path> paths, int numThreads, int numMapThreads) {
+    log.info("Currently assigning: " + assignments);
     timer.start(Timers.EXAMINE_MAP_FILES);
     Map<Path,List<AssignmentInfo>> assignInfo = estimateSizes(context.getConfiguration(), conf, fs, assignments, paths, numMapThreads);
     timer.stop(Timers.EXAMINE_MAP_FILES);
 
+    log.info("Estimated sizes: " + assignInfo);
     Map<Path,List<KeyExtent>> ret;
 
     timer.start(Timers.IMPORT_MAP_FILES);
@@ -521,10 +527,15 @@ public class BulkImporter {
       }
     }
 
+    log.info("Assignments grouped by tablet: " + assignmentsPerTablet);
+
     // group assignments by tabletserver
 
     Map<Path,List<KeyExtent>> assignmentFailures = Collections.synchronizedMap(new TreeMap<Path,List<KeyExtent>>());
 
+    log.info("All locations for imports : " + locations);
+
+    // server -> {extent -> files, extent -> files, ...}, server -> { ... }, ...
     TreeMap<String,Map<KeyExtent,List<PathSize>>> assignmentsPerTabletServer = new TreeMap<String,Map<KeyExtent,List<PathSize>>>();
 
     for (Entry<KeyExtent,List<PathSize>> entry : assignmentsPerTablet.entrySet()) {
{code}
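For completeness, here is a tiny sketch of the tserver-side safeguard noted above (a hypothetical model, not the actual Tablet code): if each tablet remembers which bulk files it has already imported, a duplicate assignment is ignored rather than duplicating data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of why double-assignment does not duplicate data:
// each tablet tracks the bulk files it has imported and ignores repeats.
public class BulkImportDedupModel {
  private final Map<String, Set<String>> importedFilesPerTablet = new HashMap<>();

  // Returns true only the first time a file is imported into a given tablet.
  public boolean importFile(String extent, String file) {
    return importedFilesPerTablet
        .computeIfAbsent(extent, k -> new HashSet<>())
        .add(file);
  }

  public static void main(String[] args) {
    BulkImportDedupModel model = new BulkImportDedupModel();
    System.out.println(model.importFile("4;14;13", "I00jnmxq.rf")); // true: imported
    System.out.println(model.importFile("4;14;13", "I00jnmxq.rf")); // false: ignored
  }
}
```

So the reimport is wasted work, but only the missed tablets actually lose records.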

> bulk import loses records when loading pre-split table
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3967
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3967
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>    Affects Versions: 1.7.0
>         Environment: generic hadoop 2.6.0, zookeeper 3.4.6 on redhat 6.7
> 7 node cluster
>            Reporter: Edward Seidl
>            Priority: Blocker
>             Fix For: 1.7.1, 1.8.0
>
>
> I just noticed that some records I'm loading via importDirectory go missing.  After a lot of digging around trying to reproduce the problem, I discovered that it occurs most frequently when loading a table that I have just recently added splits to.  In the tserver logs I'll see messages like
> 20 16:25:36,805 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 1xw;18;17 because : Not Serving Tablet .  Will retry ...
>
> or
> 20 16:25:44,826 [tserver.TabletServer] INFO : files [hdfs://xxxx:54310/accumulo/tables/1xw/b-00jnmxe/I00jnmxq.rf] not imported to 1xw;03;02: tablet 1xw;03;02 is closed
> These appear after messages about unloading tablets... it seems that tablets are being redistributed at the same time as the bulk import is occurring.
> Steps to reproduce:
> 1) I run a mapreduce job that produces random data in rfiles
> 2) copy the rfiles to an import directory
> 3) create table or deleterows -f
> 4) addsplits
> 5) importdirectory
> I have also performed the above completely within the mapreduce job, with similar results.  The difference with the mapreduce job is that the time between adding splits and the importdirectory is minutes rather than seconds.
> My current test creates 1000000 records, and after the importdirectory returns, a count of rows will be anywhere from ~800000 to 1000000.
> With my original workflow, I found that re-importing the same set of rfiles three times would eventually get all rows loaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
