htrace-dev mailing list archives

From "Colin P. McCabe" <>
Subject Re: Trace HBase/HDFS with HTrace
Date Wed, 11 Feb 2015 19:13:03 GMT
Thanks for trying stuff out!  Sorry that this is a little difficult at
the moment.

To really do this right, you would want to be running both Hadoop and
HBase with HTrace 3.1.0.  Unfortunately, no release of Hadoop with
HTrace 3.1.0 exists yet; all existing Hadoop releases use an older
version of the HTrace library.  So you will have to build from source.

If you check out Hadoop's "branch-2" branch (currently, this branch
represents what will be in the 2.7 release, when it is cut), and build
that, you will get the latest.  Then you have to build a version of
HBase against the version of Hadoop you have built.
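The checkout-and-build step might look roughly like this (a sketch;
the clone URL and mvn flags are my assumptions, adapt to your
environment):

```shell
# Clone Hadoop and switch to branch-2 (what will become the 2.7 release).
git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
cd hadoop
git checkout branch-2

# Build and install the 2.7.0-SNAPSHOT artifacts into the local
# ~/.m2 repository so other builds on this machine can resolve them.
mvn install -DskipTests
```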

By default, HBase's Maven build will only build against upstream
release versions of Hadoop.  So just setting
-Dhadoop.version=2.7.0-SNAPSHOT is not enough, since Maven won't know
where to find the jars.  To work around this, you can create your own
local Maven repository.  Here's how.

In hadoop/pom.xml, add these lines to the distributionManagement stanza:

+    <repository>
+      <id>localdump</id>
+      <url>file:///home/cmccabe/localdump/releases</url>
+    </repository>
+    <snapshotRepository>
+      <id>localdump</id>
+      <url>file:///home/cmccabe/localdump/snapshots</url>
+    </snapshotRepository>

Comment out the repository entries that are already in that stanza.

Now run mkdir /home/cmccabe/localdump.

Then, in your hadoop tree, run mvn deploy -DskipTests.
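Combining the two steps above into a quick sketch (using $HOME in
place of the literal /home/cmccabe path; the mvn deploy itself is left
as a comment since it needs the built hadoop tree):

```shell
# Create the local dump repository that the distributionManagement
# stanza points at (mvn deploy would also create it on demand).
mkdir -p "$HOME/localdump/releases" "$HOME/localdump/snapshots"

# Then, from the root of the hadoop source tree, publish the built
# 2.7.0-SNAPSHOT artifacts into it:
#   mvn deploy -DskipTests
```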

You should get a localdump directory that has files kind of like this:


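The original listing did not survive in the archive, but a file-based
Maven repository laid out by mvn deploy follows groupId/artifactId/version
paths; the sketch below is illustrative only, and the exact artifact
names are assumptions:

```shell
# List the deployed jars; the repo layout looks roughly like:
#   snapshots/org/apache/hadoop/hadoop-common/2.7.0-SNAPSHOT/
#     hadoop-common-2.7.0-20150211.191303-1.jar
#     maven-metadata.xml
find /home/cmccabe/localdump -name '*.jar' | sort | head
```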
Now, add the following lines to the repositories section of your HBase pom.xml:

+    <repository>
+      <id>localdump</id>
+      <url>file:///home/cmccabe/localdump</url>
+      <name>Local Dump</name>
+      <snapshots>
+        <enabled>true</enabled>
+      </snapshots>
+      <releases>
+        <enabled>true</enabled>
+      </releases>
+    </repository>

This will allow you to run something like:
mvn test -Dtest=TestMiniClusterLoadSequential -PlocalTests
-DredirectTestOutputToFile=true -Dhadoop.profile=2.0
-Dhadoop.version=2.7.0-SNAPSHOT -Dcdh.hadoop.version=2.7.0-SNAPSHOT

Once we do a new release of Hadoop with HTrace 3.1.0 this will get a lot easier.

Related: does anyone know what the best git branch to build from for
HBase would be for this kind of testing?  I've been meaning to do some
end-to-end testing (it's been on my TODO list for a while).


On Wed, Feb 11, 2015 at 7:55 AM, Chunxu Tang <> wrote:
> Hi all,
> I'm currently using HTrace to trace request-level data flows in HBase
> and HDFS, and I have successfully traced each of them separately.
> After that, I combined HBase and HDFS, and I want to send just a
> PUT/GET request to HBase but trace the whole data flow through both
> HBase and HDFS. My expectation is that when I send a request such as a
> Get to HBase, it will eventually read the blocks from HDFS, so I can
> construct a complete data-flow trace spanning HBase and HDFS. However,
> in practice I only get tracing data from HBase, with no data from HDFS.
> Could you give me any suggestions on how to trace the data flow
> through both HBase and HDFS? Does anyone have similar experience? Do I
> need to modify the source code, and if so, which part(s) should I
> touch? If I need to modify the code, I will try to create a patch for
> that.
> Thank you.
> My Configurations:
> Hadoop version: 2.6.0
> HBase version: 0.99.2
> HTrace version: htrace-master
> OS: Ubuntu 12.04
> Joshua
