hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/DeveloperGuide" by SchubertZhang
Date Fri, 19 Mar 2010 07:54:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/DeveloperGuide" page has been changed by SchubertZhang.
http://wiki.apache.org/hadoop/Hive/DeveloperGuide?action=diff&rev1=24&rev2=25

--------------------------------------------------

  
  Hive currently uses these FileFormat classes to read and write HDFS files:
  
  * !TextInputFormat/HiveIgnoreKeyTextOutputFormat: These two classes read/write data in the plain text file format.
  * !SequenceFileInputFormat/SequenceFileOutputFormat: These two classes read/write data in the Hadoop !SequenceFile format.
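  For example, the storage format can be chosen per table at creation time with the STORED AS clause (a minimal sketch; the table and column names here are hypothetical):
  {{{
CREATE TABLE my_text_table (key STRING, value STRING)
STORED AS TEXTFILE;       -- plain text: TextInputFormat / HiveIgnoreKeyTextOutputFormat

CREATE TABLE my_seq_table (key STRING, value STRING)
STORED AS SEQUENCEFILE;   -- SequenceFileInputFormat / SequenceFileOutputFormat
  }}}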
  
  Hive currently uses these !SerDe classes to serialize and deserialize data:
--------------------------------------------------

  svn add ql/src/test/queries/clientpositive/myname.q ql/src/test/results/clientpositive/myname.q.out
  svn diff > patch.txt
  }}}
  Similarly, to add negative client tests, write a new query input file in ql/src/test/queries/clientnegative and run the same command, this time specifying the testcase name as !TestNegativeCliDriver instead of !TestCliDriver. Note that for negative client tests, the output file created with the overwrite flag can be found in the directory ql/src/test/results/clientnegative.
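  For example, assuming the new negative test is named myname.q, the sequence would mirror the positive-test commands above (a sketch; the -Doverwrite=true flag follows the positive-test workflow and should be checked against your checkout):
  {{{
ant test -Dtestcase=TestNegativeCliDriver -Dqfile=myname.q -Doverwrite=true
svn add ql/src/test/queries/clientnegative/myname.q ql/src/test/results/clientnegative/myname.q.out
svn diff > patch.txt
  }}}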
  
  Debugging Hive
--------------------------------------------------

  === Debugging Hive code ===
  Hive code includes both client-side code (e.g., the compiler, semantic analyzer, and optimizer of HiveQL) and server-side code (e.g., operator/task/SerDe implementations). The client-side code runs on your local machine, so you can easily debug it in Eclipse the same way you debug regular local Java code. The server-side code is distributed and runs on the Hadoop cluster, so debugging server-side Hive code is a bit more complicated. In addition to printing to log files using log4j, you can also attach a debugger to a different JVM under unit test (single-machine mode). Below are the steps for debugging server-side code.
  
  * Compile Hive code with javac.debug=on, under the Hive checkout directory:
  {{{
    > ant -Djavac.debug=on package
  }}}
  If you have already built Hive without javac.debug=on, you can clean the build and then run the above command:
  {{{
    > ant clean  # not necessary if compiling for the first time
    > ant -Djavac.debug=on package
  }}}
  * Run ant test with additional options that tell the JVM running the Hive server-side code to wait for a debugger to attach. First define some convenient macros for debugging; you can put them in your .bashrc or .cshrc:
  {{{
    > export HIVE_DEBUG_PORT=8000
    > export HIVE_DEBUG="-Xdebug -Xrunjdwp:transport=dt_socket,address=${HIVE_DEBUG_PORT},server=y,suspend=y"
  }}}
  In particular, HIVE_DEBUG_PORT is the port number that the JVM listens on and that the debugger will attach to. Then run the unit test as follows:
  {{{
    > HADOOP_OPTS=$HIVE_DEBUG ant test -Dtestcase=TestCliDriver -Dqfile=<mytest>.q
  }}}
  The unit test will run until it shows:
  {{{
     [junit] Listening for transport dt_socket at address: 8000
  }}}
  * Now you can use jdb to attach to port 8000 and debug:
  {{{
    > jdb -attach 8000
  }}}
  Alternatively, if you are running Eclipse and the Hive projects are already imported, you can debug with Eclipse. Under Eclipse Run -> Debug Configurations, find "Remote Java Application" at the bottom of the left panel. There should be a MapRedTask configuration already. If there is no such configuration, you can create one with the following properties:
   * Name: any name, such as MapRedTask
   * Project: the Hive project that you imported
   * Connection Type: Standard (Socket Attach)
   * Connection Properties:
    * Host: localhost
    * Port: 8000
  Then hit the "Debug" button, and Eclipse will attach to the JVM listening on port 8000 and continue running until the end. If you define breakpoints in the source code before hitting the "Debug" button, execution will stop there. The rest is the same as debugging client-side Hive code.
  
  == Pluggable interfaces ==
  === File Formats ===
