accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-3967) bulk import loses records when loading pre-split table
Date Sat, 22 Aug 2015 23:34:46 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708216#comment-14708216
] 

Josh Elser edited comment on ACCUMULO-3967 at 8/22/15 11:33 PM:
----------------------------------------------------------------

Hi again, [~etseidl]. I just ran a quick test on my laptop using your mapreduce job. After
a few iterations of trying "T" and "H", I reproduced the issue.

* {{hdfs dfs -mkdir bulkload}}
* {{hdfs dfs -mkdir bulkload/failures}}
* {{tool.sh target/bulkload-loss-0.0.1-SNAPSHOT.jar TestBulkLoad accumulo17 localhost root
secret bulkload T 1000000}}
* {{accumulo shell -u root -p secret -e 'scan -np -t loadtest.T_Test' | fgrep -v WARN | fgrep
-v INFO  | wc -l}}

One time so far, it's given me a count of 875336 instead of 1M. There are no files in failures.
1M map output records in the MR job's counters. After compacting the table, the monitor also
agrees that there are only 875.34K entries.
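As an aside, the filter-and-count pipeline from the shell one-liner above can be exercised offline against a canned sample to confirm it only strips client log noise; the file name and sample rows below are made up for illustration:

```shell
# Hypothetical sample of 'scan -np' output interleaved with client log noise.
cat > /tmp/scan_sample.txt <<'EOF'
2015-08-22 19:14:24,598 [client.SomeClass] WARN : example warning line
00 cf:cq []    value-a
2015-08-22 19:14:24,599 [client.SomeClass] INFO : example info line
01 cf:cq []    value-b
02 cf:cq []    value-c
EOF

# Same filtering as above: drop WARN/INFO lines, count what remains.
fgrep -v WARN /tmp/scan_sample.txt | fgrep -v INFO | wc -l
```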

I did see similar errors from the BulkImporter:

{noformat}
logs/tserver_hw10447.local.debug.log:2015-08-22 19:14:24,598 [client.BulkImporter] INFO :
Could not assign 1 map files to tablet 4;02;01 because : Not Serving Tablet .  Will retry
...
logs/tserver_hw10447.local.debug.log:2015-08-22 19:14:24,598 [client.BulkImporter] INFO :
Could not assign 1 map files to tablet 4;23;22 because : Not Serving Tablet .  Will retry
...
logs/tserver_hw10447.local.debug.log:2015-08-22 19:14:24,655 [client.BulkImporter] INFO :
Could not assign 1 map files to tablet 4;11;10 because : Not Serving Tablet .  Will retry
...
logs/tserver_hw10447.local.debug.log:2015-08-22 19:14:24,656 [client.BulkImporter] INFO :
Could not assign 1 map files to tablet 4;12;11 because : Not Serving Tablet .  Will retry
...
{noformat}

but I didn't get any errors about the tablet being closed (I'm assuming that one is unrelated
to the bug, just a side effect of the master balancing things).

I'll poke around here some tonight to see if I can get some answers as to why.


> bulk import loses records when loading pre-split table
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3967
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3967
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>    Affects Versions: 1.7.0
>         Environment: generic hadoop 2.6.0, zookeeper 3.4.6 on redhat 6.7
> 7 node cluster
>            Reporter: Edward Seidl
>            Priority: Blocker
>             Fix For: 1.7.1, 1.8.0
>
>
> I just noticed that some records I'm loading via importDirectory go missing.  After a
lot of digging around trying to reproduce the problem, I discovered that it occurs most frequently
when loading a table that I have just recently added splits to.  In the tserver logs I'll
see messages like 
> 20 16:25:36,805 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 1xw;18;17
because : Not Serving Tablet .  Will retry ...
>  
> or
> 20 16:25:44,826 [tserver.TabletServer] INFO : files [hdfs://xxxx:54310/accumulo/tables/1xw/b-00jnmxe/I00jnmxq.rf]
not imported to 1xw;03;02: tablet 1xw;03;02 is closed
> these appear after messages about unloading tablets... it seems that tablets are being
redistributed at the same time the bulk import is occurring.
> Steps to reproduce
> 1) I run a mapreduce job that produces random data in rfiles
> 2) copy the rfiles to an import directory
> 3) create table or deleterows -f
> 4) addsplits
> 5) importdirectory
> I have also performed the above completely within the mapreduce job, with similar results.
 The difference with the mapreduce job is that the time between adding splits and the import
directory is minutes rather than seconds.
> my current test creates 1000000 records, and after the importdirectory returns, the count
of rows will be anywhere from ~800000 to 1000000.
> With my original workflow, I found that re-importing the same set of rfiles three times
would eventually get all rows loaded.
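For reference, steps 3-5 of the reproduction above correspond to an Accumulo shell session roughly like the following; the table name, split points, and HDFS paths are placeholders for illustration, not taken from the report, and this of course needs a running cluster:

```
# Hypothetical Accumulo shell session (placeholder table, splits, and paths).
accumulo shell -u root -p secret <<'EOF'
createtable loadtest.T_Test
addsplits -t loadtest.T_Test 01 02 03 04
importdirectory /user/someuser/bulkload /user/someuser/bulkload/failures false
EOF
```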



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
