From: "Colin P. McCabe"
To: dev@htrace.incubator.apache.org
Date: Mon, 2 Mar 2015 19:23:06 -0800
Subject: Getting started with Apache HTrace development

A few people have asked how to get started with HTrace development. It's a good question, and we don't have a great README up about it, so I thought I would write something.

HTrace is all about tracing distributed systems, so the best way to get started is to plug HTrace into your favorite distributed system and see what cool things happen or what bugs pop up. Since I'm an HDFS developer, that's the distributed system I'm most familiar with, so I will do a quick writeup about how to use HTrace + HDFS. (HBase + HTrace is another very important use case that I would like to write about later, but one step at a time.)

Just a quick note: a lot of this software is relatively new, so there may be bugs or integration pain points that you encounter. There has not yet been a stable release of Hadoop that contains Apache HTrace. There have been releases that contained the pre-Apache version of HTrace, but that's no fun. If we want to do development, we want to be able to run the latest version of the code, so we will have to build it ourselves.

Building HTrace is not too bad. First we install the dependencies:

> cmccabe@keter:~/> apt-get install java javac google-go leveldb-devel

If you have a different Linux distro, this command will vary slightly, of course. On Macs, "brew" is a good option. Next we use Maven to build the source:

> cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
> cmccabe@keter:~/> cd incubator-htrace
> cmccabe@keter:~/> git checkout master
> cmccabe@keter:~/> mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip

OK.
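Before kicking off the Maven build, it can save a round trip to check that the toolchain is actually on the PATH. Here is a small sketch; the helper name and the tool list are this example's own, not part of any HTrace script:

```shell
#!/usr/bin/env bash
# Sanity-check sketch: verify that build prerequisites are on the PATH
# before starting the Maven build. Tool names are illustrative; adjust
# them for your platform.
require_tools() {
  local missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: require_tools git mvn java go || echo "install the missing tools first"
```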
So htrace is built and installed to the local ~/.m2 directory. We should see it under the .m2:

> cmccabe@keter:~/> find ~/.m2 | grep htrace-core
> ...
> /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
> /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
> /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
> ...

The version you built should be 3.2.0-SNAPSHOT. Next, we check out Hadoop:

> cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
> cmccabe@keter:~/> cd hadoop
> cmccabe@keter:~/> git checkout branch-2

So we are basically building a pre-release version of Hadoop 2.7, currently known as branch-2. We will need to modify Hadoop to use 3.2.0-incubating-SNAPSHOT rather than the stable 3.1.0-incubating release which it would ordinarily use in branch-2. I applied this diff to hadoop-project/pom.xml:

> diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
> index 569b292..5b7e466 100644
> --- a/hadoop-project/pom.xml
> +++ b/hadoop-project/pom.xml
> @@ -785,7 +785,7 @@
>        <dependency>
>          <groupId>org.apache.htrace</groupId>
>          <artifactId>htrace-core</artifactId>
> -        <version>3.1.0-incubating</version>
> +        <version>3.2.0-incubating-SNAPSHOT</version>
>        </dependency>
>        <dependency>
>          <groupId>org.jdom</groupId>

Next, I built Hadoop:

> cmccabe@keter:~/> mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true

You should get a package with Hadoop jars named like so:

> ...
> ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
> ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
> ...

This package should also contain an htrace-3.2.0-SNAPSHOT jar. OK, so how can we start seeing some trace spans? The easiest way is to configure LocalFileSpanReceiver.
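Before spending time on the full Hadoop build, it is worth double-checking that the version override in the pom actually took. A quick grep is enough; this helper is just a sketch, and the function name is this example's own:

```shell
#!/usr/bin/env bash
# Sketch: confirm that a pom now declares the snapshot htrace version.
# Relies on the <version> element sitting within two lines of the
# htrace-core <artifactId>, as in the diff above.
pom_has_snapshot_htrace() {
  grep -A 2 '<artifactId>htrace-core</artifactId>' "$1" |
    grep -q '<version>3.2.0-incubating-SNAPSHOT</version>'
}

# Example: pom_has_snapshot_htrace hadoop-project/pom.xml && echo "pom updated"
```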
Add this to your hdfs-site.xml:

> <property>
>   <name>hadoop.htrace.spanreceiver.classes</name>
>   <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
> </property>
> <property>
>   <name>hadoop.htrace.sampler</name>
>   <value>AlwaysSampler</value>
> </property>

When you run the Hadoop daemons, you should see them writing to files named /tmp/${PROCESS_ID} (one for each process). If this doesn't happen, try cranking up your log4j level to TRACE to see why the SpanReceiver could not be created. You should see something like this in the log4j logs:

> 13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
> at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
> at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
> at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
> at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)

Running htraced is easy. You simply run the binary:

> cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear

You should see messages like this:

> 2015-03-02T19:08:33-08:00 D: HTRACED_CONF_DIR=/home/cmccabe/conf
> 2015-03-02T19:08:33-08:00 D: data.store.clear = true
> 2015-03-02T19:08:33-08:00 D: log.level = TRACE
> 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace1/db
> 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace1/db: Invalid argument: /tmp/htrace1/db: does not exist (create_if_missing is false)
> 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace1/db
> 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace1/db.
> 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace2/db
> 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace2/db: Invalid argument: /tmp/htrace2/db: does not exist (create_if_missing is false)
> 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace2/db
> 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace2/db.
> ...

Similar to the Hadoop daemons, htraced can be configured either through an XML file named htraced-conf.xml (found in a location pointed to by HTRACED_CONF_DIR), or by passing -Dkey=value flags on the command line. Let's check out the htrace command:

> cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace serverInfo
> HTraced server version 3.2.0-incubating-SNAPSHOT (5c0a712c7dd4263f5e2a88d4c61a0facab25953f)

"serverInfo" queries the htraced server via REST and gets back a response. For help using the htrace command, we can run:

> cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace --help
> usage: ./htrace-core/src/go/build/htrace [<flags>] <command> [<flags>] [<args> ...]
>
> The Apache HTrace command-line tool. This tool retrieves and modifies settings and other data on a running htraced daemon.
>
> If we find an htraced-conf.xml configuration file in the list of directories specified in HTRACED_CONF_DIR, we will use that configuration; otherwise, the defaults will be used.
>
> Flags:
>   --help       Show help.
>   --Dmy.key="my.value"
>                Set configuration key 'my.key' to 'my.value'. Replace 'my.key' with any key you want to set.
>   --addr=ADDR  Server address.
>   --verbose    Verbose.
>
> Commands:
>   help [<command>]
>     Show help for a command.
>   ...

We can load spans into the htraced daemon from a text file using "./build/htrace loadSpans [file-path]", and dump the span information using "./build/htrace dumpAll". Now, at this point, we would like our htraced client (Hadoop) to send spans directly to htraced, rather than dumping them to a local file.
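While spans are still going to a local file, ordinary line tools are enough for a first look at what got dumped to /tmp/${PROCESS_ID}. Here is a sketch; the helper name is mine, and the "d" (description) JSON field is an assumption about how spans are serialized, so check a line of your own output first:

```shell
#!/usr/bin/env bash
# Sketch: summarize a LocalFileSpanReceiver output file. Assumes one JSON
# span per line, with a "d" field holding the span description.
summarize_spans() {
  local span_file="$1"
  echo "spans recorded: $(wc -l < "$span_file")"
  # Count the distinct span descriptions, most frequent first.
  grep -o '"d":"[^"]*"' "$span_file" | sort | uniq -c | sort -rn
}

# Example: summarize_spans /tmp/12345
```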
To make this work, we will need to put the htrace-htraced jar on the Hadoop CLASSPATH. There is probably a better way to do it by setting HADOOP_CLASSPATH, but this simple script just puts the jar on every part of the Hadoop CLASSPATH I could think of where it might need to be:

> #!/bin/bash
>
> # Copy the installed version of htrace-core to the correct hadoop jar locations
> cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-incubating-SNAPSHOT/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> /home/cmccabe/hadoop-install/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> /home/cmccabe/hadoop-install/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> EOF
>
> # Copy the installed version of htrace-htraced to the correct hadoop jar locations
> cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-htraced/3.2.0-incubating-SNAPSHOT/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> EOF

At this point, I changed hdfs-site.xml so that hadoop.htrace.spanreceiver.classes was set to the htraced span receiver:

> <property>
>   <name>hadoop.htrace.spanreceiver.classes</name>
>   <value>org.apache.htrace.impl.HTracedRESTReceiver</value>
> </property>
> <property>
>   <name>htraced.rest.url</name>
>   <value>http://lumbergh.initech.com:9095/</value>
> </property>

Obviously, set htraced.rest.url to the host on which you are running htraced.
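Incidentally, rather than hard-coding every destination the way the copy script above does, the same effect can be had by finding every lib directory under the install tree. This is just a sketch, not the script the author used, and the function name is this example's own:

```shell
#!/usr/bin/env bash
# Sketch: drop a jar into every lib/ directory under a Hadoop install tree,
# as an alternative to listing each destination path by hand.
copy_jar_to_libs() {
  local jar="$1" install_root="$2"
  find "$install_root" -type d -name lib |
    while IFS= read -r dir; do
      cp "$jar" "$dir/"
    done
}

# Example:
# copy_jar_to_libs ~/.m2/repository/org/apache/htrace/htrace-htraced/3.2.0-incubating-SNAPSHOT/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar \
#                  /home/cmccabe/hadoop-install
```

Note that this also catches the Tomcat WEB-INF/lib directories, which the hand-written script had to list explicitly.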
This setup should work for sending spans to htraced. To see the web UI, point your web browser at http://lumbergh.initech.com:9095/ (or whatever the host name is for you where htraced is running).

I hope this helps some folks out. Hopefully building Hadoop and massaging the classpath is not too bad. This install process will improve in the future, as more projects get stable releases with HTrace. There has also been some discussion of making Docker images, which might help new developers get started.

best,
Colin