accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3967) bulk import loses records when loading pre-split table
Date Sat, 22 Aug 2015 23:53:46 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708229#comment-14708229
] 

Josh Elser commented on ACCUMULO-3967:
--------------------------------------

Summary of what happened across the two tservers. 3 of the import requests which had failed
were never retried. These match up with the tablets that never had a file in the metadata
table.

It seems like BulkImporter is not correctly handling re-imports when a Tablet has moved. I'll
have to dig some more into this -- this is a really bad bug (thanks so much for reporting,
Edward!).

{noformat}
# Job create a single rfile to import

2015-08-22 19:14:24,453 [client.ClientServiceHandler] DEBUG: Got request to bulk import files
to table(4): [hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf]
2015-08-22 19:14:24,516 [client.BulkImporter] DEBUG: Map file hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
found to overlap 24 tablets
2015-08-22 19:14:24,526 [client.BulkImporter] DEBUG: Estimated map files sizes in   0.01 secs
2015-08-22 19:14:24,527 [client.BulkImporter] DEBUG: Assigning 1 map files to 12 tablets at
hw10447.local:49817

# Tabletserver asks the tabletserver to bulk load
2015-08-22 19:14:24,528 [client.BulkImporter] DEBUG: Asking hw10447.local:49817 to bulk load
{
  4;02;01={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;03;02={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;05;04={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;06;05={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;14;13={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;15;14={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;17;16={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;18;17={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;20;19={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;21;20={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}
  4;23;22={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4<;23={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:828182)},
}

# Successful imports

2015-08-22 19:14:24,544 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,548 [tablet.DatafileManager] TABLET_HIST: 4;20;19 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,552 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,554 [tablet.DatafileManager] TABLET_HIST: 4;03;02 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,557 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,560 [tablet.DatafileManager] TABLET_HIST: 4;14;13 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,562 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,565 [tablet.DatafileManager] TABLET_HIST: 4<;23 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
828182 0
2015-08-22 19:14:24,568 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,570 [tablet.DatafileManager] TABLET_HIST: 4;15;14 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,573 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,575 [tablet.DatafileManager] TABLET_HIST: 4;05;04 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,578 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,581 [tablet.DatafileManager] TABLET_HIST: 4;06;05 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,583 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,586 [tablet.DatafileManager] TABLET_HIST: 4;18;17 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,588 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,591 [tablet.DatafileManager] TABLET_HIST: 4;17;16 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,593 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,596 [tablet.DatafileManager] TABLET_HIST: 4;21;20 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0

# These are the two tablets which failed to get loaded
2015-08-22 19:14:24,598 [client.BulkImporter] INFO : Could not assign 1 map files to tablet
4;02;01 because : Not Serving Tablet .  Will retry ...
2015-08-22 19:14:24,598 [client.BulkImporter] INFO : Could not assign 1 map files to tablet
4;23;22 because : Not Serving Tablet .  Will retry ...

# Tries again, original two failures are not included

2015-08-22 19:14:24,598 [client.BulkImporter] DEBUG: Assigning 1 map files to 12 tablets at
localhost:64680
2015-08-22 19:14:24,599 [client.BulkImporter] DEBUG: Asking localhost:64680 to bulk load
{
  4;01<={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:724659)},
  4;04;03={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;07;06={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;08;07={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;09;08={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;10;09={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;11;10={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;12;11={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;13;12={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;16;15={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4;19;18={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}
  4;22;21={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
}

# Again, most succeed, but not all

2015-08-22 19:14:24,602 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,605 [tablet.DatafileManager] TABLET_HIST: 4;13;12 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,607 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,610 [tablet.DatafileManager] TABLET_HIST: 4;01< import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
724659 0
2015-08-22 19:14:24,612 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,616 [tablet.DatafileManager] TABLET_HIST: 4;10;09 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,618 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,621 [tablet.DatafileManager] TABLET_HIST: 4;16;15 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,624 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,627 [tablet.DatafileManager] TABLET_HIST: 4;04;03 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,629 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,633 [tablet.DatafileManager] TABLET_HIST: 4;22;21 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,635 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,638 [tablet.DatafileManager] TABLET_HIST: 4;07;06 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,640 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,644 [tablet.DatafileManager] TABLET_HIST: 4;08;07 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,646 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,649 [tablet.DatafileManager] TABLET_HIST: 4;09;08 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0
2015-08-22 19:14:24,652 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0
entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,655 [tablet.DatafileManager] TABLET_HIST: 4;19;18 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0

# Another two tablets failed to import. Total of 4 failures need to be retried
2015-08-22 19:14:24,655 [client.BulkImporter] INFO : Could not assign 1 map files to tablet
4;11;10 because : Not Serving Tablet .  Will retry ...
2015-08-22 19:14:24,656 [client.BulkImporter] INFO : Could not assign 1 map files to tablet
4;12;11 because : Not Serving Tablet .  Will retry ...

# Retrying the failure

2015-08-22 19:14:28,660 [client.BulkImporter] DEBUG: Trying to assign 1 map files that previously
failed on some key extents
2015-08-22 19:14:28,728 [client.BulkImporter] DEBUG: Estimated map files sizes in   0.01 secs
2015-08-22 19:14:28,729 [client.BulkImporter] DEBUG: Assigning 1 map files to 3 tablets at
hw10447.local:49817

# Retrying 3 of the import requests. 3 of the previous 4 failures are not included here, 1
of the 4 is. The other 2 were
# successfully imported the first time.
2015-08-22 19:14:28,729 [client.BulkImporter] DEBUG: Asking hw10447.local:49817 to bulk load
{
  4;03;02={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  4<;23={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:828182)},
  4;12;11={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}
}

# Confirming to us that two of the files we tried to reimport were already done

2015-08-22 19:14:28,733 [tablet.DatafileManager] DEBUG: Ignoring request to re-import a file
already imported: 4;03;02: hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
2015-08-22 19:14:28,736 [tablet.DatafileManager] DEBUG: Ignoring request to re-import a file
already imported: 4<;23: hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf

# We did get one more imported which had previously failed 

2015-08-22 19:14:28,741 [tablet.DatafileManager] TABLET_HIST: 4;12;11 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
776421 0

# Another cycle of failure retrying, again, trying to reimport one that didn't fail the first
time

2015-08-22 19:14:28,741 [client.BulkImporter] DEBUG: Assigning 1 map files to 1 tablets at
localhost:64680
2015-08-22 19:14:28,742 [client.BulkImporter] DEBUG: Asking localhost:64680 to bulk load
{
  4;13;12={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}
}

# Again, telling us that.

2015-08-22 19:14:28,745 [tablet.DatafileManager] DEBUG: Ignoring request to re-import a file
already imported: 4;13;12: hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf

# Final summary

2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: BULK IMPORT TIMING STATISTICS
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Files: [I000005i.rf]
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Examine map files    :       0.08 secs
  1.84%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Query accumulo.metadata :       0.06
secs   1.40%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Import Map Files     :       0.15 secs
  3.42%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Sleep                :       4.00 secs
 93.29%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Misc                 :       0.00 secs
  0.05%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Total                :       4.29 secs
{noformat}

> bulk import loses records when loading pre-split table
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3967
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3967
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>    Affects Versions: 1.7.0
>         Environment: generic hadoop 2.6.0, zookeeper 3.4.6 on redhat 6.7
> 7 node cluster
>            Reporter: Edward Seidl
>            Priority: Blocker
>             Fix For: 1.7.1, 1.8.0
>
>
> I just noticed that some records I'm loading via importDirectory go missing.  After a
lot of digging around trying to reproduce the problem, I discovered that it occurs most frequently
when loading a table that I have just recently added splits to.  In the tserver logs I'll
see messages like 
> 20 16:25:36,805 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 1xw;18;17
because : Not Serving Tablet .  Will retry ...
>  
> or
> 20 16:25:44,826 [tserver.TabletServer] INFO : files [hdfs://xxxx:54310/accumulo/tables/1xw/b-00jnmxe/I00jnmxq.rf]
not imported to 1xw;03;02: tablet 1xw;03;02 is closed
> these appear after messages about unloading tablets...it seems that tablets are being
redistributed at the same time as the bulk import is occuring.
> Steps to reproduce
> 1) I run a mapreduce job that produces random data in rfiles
> 2) copy the rfiles to an import directory
> 3) create table or deleterows -f
> 4) addsplits
> 5) importdirectory
> I have also performed the above completely within the mapreduce job, with similar results.
 The difference with the mapreduce job is that the time between adding splits and the import
directory is minutes rather than seconds.
> my current test creates 1000000 records, and after the importdirectory returns a count
of rows will be anywhere from ~800000 to 1000000.
> With my original workflow, I found that re-importing the same set of rfiles three times
would eventually get all rows loaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message