hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16398) optimize HRegion computeHDFSBlocksDistribution
Date Fri, 09 Dec 2016 09:58:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734854#comment-15734854 ]

Ted Yu commented on HBASE-16398:
--------------------------------

Under https://builds.apache.org/job/PreCommit-HBASE-Build/4851/artifact/patchprocess/patch-unit-hbase-server.txt you can see:
{code}
testScanEmptyToEmpty(org.apache.hadoop.hbase.mapred.TestMultiTableSnapshotInputFormat)  Time elapsed: 6.063 sec  <<< ERROR!
java.io.FileNotFoundException: File hdfs://localhost:46473/user/jenkins/target/test-data/935de6fd-1334-44ea-99d5-2d45d3860308/scantest1_snapshot__b08a9b38-8279-4d30-b57b-368c12e680c0/data/default/scantest1/8d956880c9776c65be72cb1bc549490b/contents does not exist.
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:948)
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:927)
	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:872)
	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:868)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:886)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1694)
	at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1787)
	at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1783)
	at org.apache.hadoop.hbase.util.FSUtils.listLocatedStatus(FSUtils.java:1889)
	at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFilesLocatedStatus(HRegionFileSystem.java:234)
	at org.apache.hadoop.hbase.regionserver.HRegion.computeHDFSBlocksDistribution(HRegion.java:1114)
	at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.getSplits(TableSnapshotInputFormatImpl.java:323)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.getSplits(MultiTableSnapshotInputFormatImpl.java:113)
	at org.apache.hadoop.hbase.mapred.MultiTableSnapshotInputFormat.getSplits(MultiTableSnapshotInputFormat.java:98)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:862)
	at org.apache.hadoop.hbase.mapred.TestMultiTableSnapshotInputFormat.runJob(TestMultiTableSnapshotInputFormat.java:71)
{code}
If the above cannot be reproduced locally, please resubmit the patch.

> optimize HRegion computeHDFSBlocksDistribution
> ----------------------------------------------
>
>                 Key: HBASE-16398
>                 URL: https://issues.apache.org/jira/browse/HBASE-16398
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>    Affects Versions: 2.0.0
>            Reporter: binlijin
>            Assignee: binlijin
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16398.patch, HBASE-16398_v2.patch, HBASE-16398_v3.patch, LocatedBlockStatusComparison.java
>
>
> First, I assume there are no references or links in a region family's directory.
> Without the patch, computeHDFSBlocksDistribution makes 1+2*N RPC calls for a region family, where N is the number of hfiles. The first RPC call is DistributedFileSystem#listStatus, which lists the hfiles. Then, for every hfile, there are two RPC calls: DistributedFileSystem#getFileStatus(path) followed by DistributedFileSystem#getFileBlockLocations(status, start, length).
> With the patch, computeHDFSBlocksDistribution makes 2 RPC calls for a region family: DistributedFileSystem#getFileStatus(path) and DistributedFileSystem#listLocatedStatus(final Path p, final PathFilter filter).
> So if there is at least one hfile, the patch reduces the number of RPC calls.
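
The call counts in the issue description can be checked with a small sketch. It models only the arithmetic (1+2*N calls without the patch, 2 with it) and, like the description, assumes a family directory with no references or links. The class and method names are hypothetical and are not part of HBase or of the patch.

```java
// Hypothetical helper, not HBase code: models the RPC counts stated in the
// issue description for one region family directory.
public class RpcCallCount {

    // Without the patch: one listStatus call, then getFileStatus and
    // getFileBlockLocations for each hfile.
    static long withoutPatch(long hfiles) {
        return 1 + 2 * hfiles;
    }

    // With the patch: one getFileStatus call and one listLocatedStatus call,
    // independent of the number of hfiles.
    static long withPatch(long hfiles) {
        return 2;
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, 1, 10, 100}) {
            System.out.printf("hfiles=%d without=%d with=%d%n",
                    n, withoutPatch(n), withPatch(n));
        }
    }
}
```

For N = 0 the patched path costs one call more (2 versus 1), which is why the description limits its claim to families with at least one hfile. From N = 1 onward the patched count stays at 2 while the unpatched count grows linearly.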



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
