accumulo-user mailing list archives

From Russ Weeks <rwe...@newbrightidea.com>
Subject DistCp hangs when copying large table export
Date Fri, 19 Aug 2016 17:46:24 GMT
This isn't really an Accumulo problem but I'd be grateful to know if
anybody else has hit and/or solved it. I'm trying to export a table of ~160B
key/value pairs using exporttable and distcp as shown here:
https://accumulo.apache.org/1.7/examples/export

The command I'm using is:

hadoop distcp -m 50 -update -skipcrccheck -f /export/mytable/distcp.txt file:///mnt/backup

distcp.txt contains 718 files.

The distcp job never even makes it into YARN; it looks like the driver is
stuck sorting the file listing for some reason. An example stack trace is:

"main" #1 prio=5 os_prio=0 tid=0x00007f1994015000 nid=0x7dc2 runnable [0x00007f199c735000]
   java.lang.Thread.State: RUNNABLE
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3797)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$Start.match(Pattern.java:3461)
at java.util.regex.Matcher.search(Matcher.java:1248)
at java.util.regex.Matcher.find(Matcher.java:637)
at java.util.regex.Pattern.split(Pattern.java:1209)
at java.lang.String.split(String.java:2380)
at java.lang.String.split(String.java:2422)
at org.apache.hadoop.util.StringUtils.getTrimmedStrings(StringUtils.java:378)
at org.apache.hadoop.conf.Configuration.getTrimmedStrings(Configuration.java:1900)
at org.apache.hadoop.io.serializer.SerializationFactory.<init>(SerializationFactory.java:58)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1176)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1094)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.flush(SequenceFile.java:2946)
at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.run(SequenceFile.java:2890)
at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:2788)
at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2736)
at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2777)
at org.apache.hadoop.tools.util.DistCpUtils.sortListing(DistCpUtils.java:364)
at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:145)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91)
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at org.apache.hadoop.tools.FileBasedCopyListing.doBuildListing(FileBasedCopyListing.java:70)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)

It's been at it for 16 hours. The exact stack trace varies but it's always
within DistCpUtils.sortListing.
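
For anyone who wants to repeat that check on their own thread dumps, here is a rough sketch of how it could be automated. It assumes the dumps are plain jstack-style text with one unwrapped "at ..." frame per line; the function names main_thread_frames and stuck_in are made up for this illustration and are not part of Hadoop or the JDK.

```python
def main_thread_frames(dump):
    """Return the frames (text after 'at ') of the thread named "main"."""
    frames = []
    in_main = False
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith('"'):
            # A line starting with a quoted name opens a new thread section.
            in_main = stripped.startswith('"main"')
            continue
        if in_main and stripped.startswith("at "):
            frames.append(stripped[3:])
    return frames


def stuck_in(dump, method):
    """True if any frame of the main thread is a call to the given method."""
    return any(f.startswith(method + "(") for f in main_thread_frames(dump))
```

Calling stuck_in(text, "org.apache.hadoop.tools.util.DistCpUtils.sortListing") on each dump taken over time would show whether the driver is still inside the listing sort, even though the top frames differ from dump to dump.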

The Hadoop distro is HDP 2.3.4, Hadoop 2.7.1. HDFS is running with Kerberos
and encryption.

Any advice is very welcome!

Thanks,
-Russ
