hadoop-mapreduce-commits mailing list archives

From cdoug...@apache.org
Subject svn commit: r826384 - in /hadoop/mapreduce/trunk: ./ src/contrib/sqoop/ src/contrib/sqoop/doc/
Date Sun, 18 Oct 2009 09:28:04 GMT
Author: cdouglas
Date: Sun Oct 18 09:28:02 2009
New Revision: 826384

URL: http://svn.apache.org/viewvc?rev=826384&view=rev
Log:
MAPREDUCE-906. Update Sqoop documentation. Contributed by Aaron Kimball

Added:
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/   (with props)
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/.gitignore
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Makefile
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Sqoop-manpage.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/SqoopUserGuide.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/classnames.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/connecting.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-input-format.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-output-format.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/direct.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/full-db-import.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/hive.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting-args.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/intro.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-dbs.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-tables.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/misc-args.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting-args.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/supported-dbs.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/doc/table-import.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/readme.txt
Removed:
    hadoop/mapreduce/trunk/src/contrib/sqoop/readme.html
Modified:
    hadoop/mapreduce/trunk/CHANGES.txt
    hadoop/mapreduce/trunk/src/contrib/sqoop/build.xml

Modified: hadoop/mapreduce/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/CHANGES.txt?rev=826384&r1=826383&r2=826384&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/CHANGES.txt (original)
+++ hadoop/mapreduce/trunk/CHANGES.txt Sun Oct 18 09:28:02 2009
@@ -14,6 +14,8 @@
     MAPREDUCE-1048. Add occupied/reserved slot usage summary on jobtracker UI.
     (Amareshwari Sriramadasu via sharad)
 
+    MAPREDUCE-906. Update Sqoop documentation. (Aaron Kimball via cdouglas)
+
   OPTIMIZATIONS
 
     MAPREDUCE-270. Fix the tasktracker to optionally send an out-of-band

Modified: hadoop/mapreduce/trunk/src/contrib/sqoop/build.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/build.xml?rev=826384&r1=826383&r2=826384&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/build.xml (original)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/build.xml Sun Oct 18 09:28:02 2009
@@ -152,4 +152,12 @@
     <fail if="tests.failed">Tests failed!</fail>
   </target>
 
+  <target name="doc">
+    <exec executable="make" failonerror="true">
+      <arg value="-C" />
+      <arg value="${basedir}/doc" />
+      <arg value="BUILDROOT=${build.dir}" />
+    </exec>
+  </target>
+
 </project>

Propchange: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/
------------------------------------------------------------------------------
--- svn:ignore (added)
+++ svn:ignore Sun Oct 18 09:28:02 2009
@@ -0,0 +1,3 @@
+Sqoop-manpage.xml
+sqoop.1
+Sqoop-web.html

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/.gitignore
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/.gitignore?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/.gitignore (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/.gitignore Sun Oct 18 09:28:02 2009
@@ -0,0 +1,17 @@
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+/Sqoop-manpage.xml
+/sqoop.1
+/Sqoop-web.html

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Makefile
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Makefile?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Makefile (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Makefile Sun Oct 18 09:28:02 2009
@@ -0,0 +1,43 @@
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+BUILDROOT=../../../../build/contrib/sqoop
+BUILD_DIR=$(BUILDROOT)/doc
+
+all: man userguide
+
+man: $(BUILD_DIR)/sqoop.1.gz
+
+userguide: $(BUILD_DIR)/SqoopUserGuide.html
+
+$(BUILD_DIR)/sqoop.1.gz: Sqoop-manpage.txt *formatting*.txt
+	asciidoc -b docbook -d manpage Sqoop-manpage.txt
+	xmlto man Sqoop-manpage.xml
+	gzip sqoop.1
+	rm Sqoop-manpage.xml
+	mkdir -p $(BUILD_DIR)
+	mv sqoop.1.gz $(BUILD_DIR)
+
+$(BUILD_DIR)/SqoopUserGuide.html: SqoopUserGuide.txt *.txt
+	asciidoc SqoopUserGuide.txt
+	mkdir -p $(BUILD_DIR)
+	mv SqoopUserGuide.html $(BUILD_DIR)
+
+clean:
+	-rm $(BUILD_DIR)/sqoop.1.gz
+	-rm $(BUILD_DIR)/SqoopUserGuide.html
+
+.PHONY: all man userguide clean
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Sqoop-manpage.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Sqoop-manpage.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Sqoop-manpage.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/Sqoop-manpage.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,177 @@
+sqoop(1)
+========
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+NAME
+----
+sqoop - SQL-to-Hadoop import tool
+
+SYNOPSIS
+--------
+'sqoop' <options>
+
+DESCRIPTION
+-----------
+Sqoop is a tool designed to help users import data from existing
+relational databases into their Hadoop clusters. Sqoop uses JDBC to
+connect to a database, examine each table's schema, and auto-generate
+the necessary classes to import data into HDFS. It then instantiates
+a MapReduce job to read tables from the database via the DBInputFormat
+(JDBC-based InputFormat). Tables are read into a set of files loaded
+into HDFS. Both SequenceFile and text-based targets are supported. Sqoop
+also supports high-performance imports from select databases including MySQL.
+
+OPTIONS
+-------
+
+The +--connect+ option is always required. To perform an import, one of
++--table+ or +--all-tables+ is required as well. Alternatively, you can
+specify +--generate-only+ or one of the arguments in "Additional commands."
+
+
+Database connection options
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+--connect (jdbc-uri)::
+  Specify JDBC connect string (required)
+
+--driver (class-name)::
+  Manually specify JDBC driver class to use
+
+--username (username)::
+  Set authentication username
+
+--password (password)::
+  Set authentication password
+  (Note: This is very insecure. You should use -P instead.)
+
+-P::
+  Prompt for user password
+
+--direct::
+  Use direct import fast path (mysql only)
+
+Import control options
+~~~~~~~~~~~~~~~~~~~~~~
+
+--all-tables::
+  Import all tables in database
+  (Ignores +--table+, +--columns+, +--order-by+, and +--where+)
+
+--columns (col,col,col...)::
+  Columns to export from table
+
+--split-by (column-name)::
+  Column of the table used to split the table for parallel import
+
+--hadoop-home (dir)::
+  Override $HADOOP_HOME
+
+--hive-home (dir)::
+  Override $HIVE_HOME
+
+--warehouse-dir (dir)::
+  Tables are uploaded to the HDFS path +(dir)/(tablename)/+
+
+--as-sequencefile::
+  Imports data to SequenceFiles
+
+--as-textfile::
+  Imports data as plain text (default)
+
+--hive-import::
+  If set, then import the table into Hive
+
+--table (table-name)::
+  The table to import
+
+--where (clause)::
+  Import only the rows for which _clause_ is true.
+  e.g.: `--where "user_id > 400 AND hidden == 0"`
+
+
+Output line formatting options
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+include::output-formatting.txt[]
+include::output-formatting-args.txt[]
+
+Input line parsing options
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+include::input-formatting.txt[]
+include::input-formatting-args.txt[]
+
+Code generation options
+~~~~~~~~~~~~~~~~~~~~~~~
+
+--bindir (dir)::
+  Output directory for compiled objects
+
+--class-name (name)::
+  Sets the name of the class to generate. By default, classes are
+  named after the table they represent. Using this parameter
+  ignores +--package-name+.
+
+--generate-only::
+  Stop after code generation; do not import
+
+--outdir (dir)::
+  Output directory for generated code
+
+--package-name (package)::
+  Puts auto-generated classes in the named Java package
+
+Additional commands
+~~~~~~~~~~~~~~~~~~~
+
+These commands cause Sqoop to report information and exit;
+no import or code generation is performed.
+
+--debug-sql (statement)::
+  Execute 'statement' in SQL and display the results
+
+--help::
+  Display usage information and exit
+
+--list-databases::
+  List all databases available and exit
+
+--list-tables::
+  List tables in database and exit
+
+
+ENVIRONMENT
+-----------
+
+JAVA_HOME::
+  As part of its import process, Sqoop generates and compiles Java code
+  by invoking the Java compiler *javac*(1). As a result, JAVA_HOME must
+  be set to the location of your JDK (note: This cannot just be a JRE).
+  e.g., +/usr/java/default+. Hadoop (and Sqoop) requires Sun Java 1.6 which
+  can be downloaded from http://java.sun.com.
+
+HADOOP_HOME::
+  The location of the Hadoop jar files. If you installed Hadoop via RPM
+  or DEB, these are in +/usr/lib/hadoop-20+.
+
+HIVE_HOME::
+  If you are performing a Hive import, you must identify the location of
+  Hive's jars and configuration. If you installed Hive via RPM or DEB,
+  these are in +/usr/lib/hive+.
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/SqoopUserGuide.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/SqoopUserGuide.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/SqoopUserGuide.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/SqoopUserGuide.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,63 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+include::intro.txt[]
+
+
+The Sqoop Command Line
+----------------------
+
+To execute Sqoop, run with Hadoop:
+----
+$ bin/hadoop jar contrib/sqoop/hadoop-$(version)-sqoop.jar (arguments)
+----
+
+NOTE: Throughout this document, we will use `sqoop` as shorthand for the
+above, i.e., `$ sqoop (arguments)`
+
+You pass this program options describing the
+import job you want to perform. If you need a hint, running Sqoop with
+`--help` will print out a list of all the command line
+options available. The +sqoop(1)+ manual page will also describe
+Sqoop's available arguments in greater detail. The manual page is built
+in `$HADOOP_HOME/build/contrib/sqoop/doc/sqoop.1.gz`.
+The following subsections will describe the most common modes of operation.
+
+include::connecting.txt[]
+
+include::listing-dbs.txt[]
+
+include::listing-tables.txt[]
+
+include::full-db-import.txt[]
+
+include::table-import.txt[]
+
+include::controlling-output-format.txt[]
+
+include::classnames.txt[]
+
+include::misc-args.txt[]
+
+include::direct.txt[]
+
+include::hive.txt[]
+
+include::supported-dbs.txt[]
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/classnames.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/classnames.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/classnames.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/classnames.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,43 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Generated Class Names
+~~~~~~~~~~~~~~~~~~~~~
+
+By default, classes are named after the table they represent. e.g.,
++sqoop --table foo+ will generate a file named +foo.java+. You can
+override the generated class name with the +--class-name+ argument.
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees \
+  --table employee_names --class-name com.example.EmployeeNames
+----
+_This generates a file named +com/example/EmployeeNames.java+_
+
+If you want to specify a package name for generated classes, but
+still want them to be named after the table they represent, you
+can instead use the argument +--package-name+:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees \
+  --table employee_names --package-name com.example
+----
+_This generates a file named +com/example/employee_names.java+_
+
+
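The naming rules above can be sketched as a small helper. This is hypothetical illustration code (the `generatedPath` helper is not part of Sqoop); it only mirrors the table-name/package-name/class-name mapping the examples describe:

```java
public class ClassNameSketch {
    // Maps a (possibly package-qualified) class or table name to the
    // .java path implied by the naming rules above.
    static String generatedPath(String tableOrClass, String packageName) {
        String fqcn = (packageName == null || packageName.isEmpty())
                ? tableOrClass
                : packageName + "." + tableOrClass;
        return fqcn.replace('.', '/') + ".java";
    }

    public static void main(String[] args) {
        // --table employee_names --package-name com.example
        System.out.println(generatedPath("employee_names", "com.example"));
        // --table foo (no package)
        System.out.println(generatedPath("foo", null));
    }
}
```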

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/connecting.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/connecting.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/connecting.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/connecting.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,85 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Connecting to a Database Server
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Sqoop is designed to import tables from a database into HDFS. As such,
+it requires a _connect string_ that describes how to connect to the
+database. The _connect string_ looks like a URL, and is communicated to
+Sqoop with the +--connect+ argument. This describes the server and
+database to connect to; it may also specify the port. e.g.:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees
+----
+
+This string will connect to a MySQL database named +employees+ on the
+host +database.example.com+. It's important that you *do not* use the URL
++localhost+ if you intend to use Sqoop with a distributed Hadoop
+cluster. The connect string you supply will be used on TaskTracker nodes
+throughout your MapReduce cluster; if they're told to connect to the
+literal name +localhost+, they'll each reach a different
+database (or more likely, no database at all)! Instead, you should use
+the full hostname or IP address of the database host that can be seen
+by all your remote nodes.
+
+You may need to authenticate against the database before you can
+access it. The +--username+ and +--password+ or +-P+ parameters can
+be used to supply a username and a password to the database. e.g.:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees \
+    --username aaron --password 12345
+----
+
+.Password security
+WARNING: The +--password+ parameter is insecure, as other users may
+be able to read your password from the command-line arguments via
+the output of programs such as `ps`. The *+-P+* argument will read
+a password from a console prompt, and is the preferred method of
+entering credentials. Credentials may still be transferred between
+nodes of the MapReduce cluster using insecure means.
+
+Sqoop automatically supports several databases, including MySQL. Connect strings beginning
+with +jdbc:mysql://+ are handled automatically by Sqoop, though you may need
+to install the driver yourself. (A full list of databases with
+built-in support is provided in the "Supported Databases" section, below.)
+
+You can use Sqoop with any other
+JDBC-compliant database as well. First, download the appropriate JDBC
+driver for the database you want to import from, and install the .jar
+file in the +/usr/hadoop/lib+ directory on all machines in your Hadoop
+cluster, or some other directory which is in the classpath
+on all nodes. Each driver jar also has a specific driver class which defines
+the entry-point to the driver. For example, MySQL's Connector/J library has
+a driver class of +com.mysql.jdbc.Driver+. Refer to your database
+vendor-specific documentation to determine the main driver class.
+This class must be provided as an argument to Sqoop with +--driver+.
+
+For example, to connect to a PostgreSQL database, first download the driver from
+link:http://jdbc.postgresql.org[http://jdbc.postgresql.org] and
+install it in your Hadoop lib path.
+Then run Sqoop with something like:
+
+----
+$ sqoop --connect jdbc:postgresql://postgres-server.example.com/employees \
+    --driver org.postgresql.Driver
+----
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-input-format.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-input-format.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-input-format.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-input-format.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,42 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Controlling the Input Format
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+include::input-formatting.txt[]
+
+The following arguments allow you to control the input format of
+records:
+
+include::input-formatting-args.txt[]
+
+If you have already imported data into HDFS in a text-based
+representation and want to change the delimiters being used, you
+should regenerate the class via `sqoop --generate-only`, specifying
+the new delimiters with +--fields-terminated-by+, etc., and the old
+delimiters with +--input-fields-terminated-by+, etc. Then run a
+MapReduce job where your mapper creates an instance of your record
+class, uses its +parse()+ method to read the fields using the old
+delimiters, and emits a new +Text+ output value via the record's
++toString()+ method, which will use the new delimiters. You'll then
+want to regenerate the class another time without the
++--input-fields-terminated-by+ specified so that the new delimiters
+are used for both input and output.
+
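The regenerate-and-transcode workflow above can be sketched in plain Java. The `parse`/`format` helpers below are hypothetical stand-ins for the methods of a Sqoop-generated record class (the real generated code differs); they only show the old-delimiter-in, new-delimiter-out step a mapper would perform:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimiterTranscode {
    // Splits a line on the OLD delimiter, as a record class configured
    // with --input-fields-terminated-by would in its parse() method.
    static List<String> parse(String line, char oldDelim) {
        return Arrays.asList(
            line.split(Pattern.quote(String.valueOf(oldDelim)), -1));
    }

    // Re-joins the fields with the NEW delimiter, as toString() would
    // after regenerating the class with --fields-terminated-by.
    static String format(List<String> fields, char newDelim) {
        return String.join(String.valueOf(newDelim), fields);
    }

    public static void main(String[] args) {
        String oldLine = "1,Aaron,Engineering";          // comma-delimited input
        String newLine = format(parse(oldLine, ','), '\t');
        System.out.println(newLine);                     // tab-delimited output
    }
}
```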

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-output-format.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-output-format.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-output-format.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/controlling-output-format.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,38 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Controlling the Output Format
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+include::output-formatting.txt[]
+
+The following arguments allow you to control the output format of
+records:
+
+include::output-formatting-args.txt[]
+
+For example, we may want to separate records by tab characters, with
+every record surrounded by "double quotes", and internal quote marks
+escaped by a backslash (+\+) character:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees \
+  --table employee_names --fields-terminated-by '\t' \
+  --lines-terminated-by '\n' --enclosed-by '\"' --escaped-by '\\'
+----
+
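The enclosed-by/escaped-by behavior described above can be sketched as follows. This is a hypothetical formatter written for illustration, not Sqoop's actual output code; it shows how a field containing the enclosing character gets escaped:

```java
public class OutputFormatSketch {
    // Wraps a field in the enclosing character and escapes any embedded
    // enclosing or escape characters, mimicking --enclosed-by/--escaped-by.
    static String formatField(String field, char enclose, char escape) {
        StringBuilder sb = new StringBuilder();
        sb.append(enclose);
        for (char c : field.toCharArray()) {
            if (c == enclose || c == escape) {
                sb.append(escape);
            }
            sb.append(c);
        }
        sb.append(enclose);
        return sb.toString();
    }

    public static void main(String[] args) {
        // A field containing a double quote: the quote is backslash-escaped
        System.out.println(formatField("5\" pipe", '"', '\\'));
    }
}
```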

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/direct.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/direct.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/direct.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/direct.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,51 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Direct-mode Imports
+-------------------
+
+While the JDBC-based import method used by Sqoop provides it with the
+ability to read from a variety of databases using a generic driver, it
+is not the most high-performance method available. Sqoop can read from
+certain database systems faster by using their built-in export tools.
+
+For example, Sqoop can read from a local MySQL database by using the +mysqldump+
+tool distributed with MySQL. If you run Sqoop on the same machine where a
+MySQL database is present, you can take advantage of this faster
+import method by running Sqoop with the +--direct+ argument. This
+combined with a connect string that begins with +jdbc:mysql://+ will
+inform Sqoop that it should select the faster access method.
+
+If your delimiters exactly match the delimiters used by +mysqldump+,
+then Sqoop will use a fast-path that copies the data directly from
++mysqldump+'s output into HDFS. Otherwise, Sqoop will parse +mysqldump+'s
+output into fields and transcode them into the user-specified delimiter set.
+This incurs additional processing, so performance may suffer.
+For convenience, the +--mysql-delimiters+
+argument will set all the output delimiters to be consistent with
++mysqldump+'s format.
+
+Sqoop also provides a direct-mode backend for PostgreSQL that uses the
++COPY TO STDOUT+ protocol from +psql+. No specific delimiter set provides
+better performance; Sqoop will forward delimiter control arguments to
++psql+.
+
+The "Supported Databases" section provides a full list of database vendors
+which have direct-mode support from Sqoop.
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/full-db-import.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/full-db-import.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/full-db-import.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/full-db-import.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,92 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Automatic Full-database Import
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you want to import all the tables in a database, you can use the
++--all-tables+ command to do so:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees --all-tables
+----
+
+This will query the database for the available tables, generate an ORM
+class for each table, and run a MapReduce job to import each one.
+Hadoop uses the DBInputFormat to read from a database into a Mapper
+instance. Reading a table into a MapReduce program requires creating a
+class to hold the fields of one row of the table. One of the benefits
+of Sqoop is that it generates this class definition for you, based on
+the table definition in the database.
+
+The generated +.java+ files are, by default, placed in the current
+directory. You can supply a different directory with the +--outdir+
+parameter. These are then compiled into +.class+ and +.jar+ files for use
+by the MapReduce job that it launches. These files are created in a
+temporary directory. You can redirect this target with +--bindir+.
+
+Each table will be imported into a separate directory in HDFS, with
+the same name as the table. For instance, if my Hadoop username is
+aaron, the above command would have generated the following
+directories in HDFS:
+
+----
+/user/aaron/employee_names
+/user/aaron/payroll_checks
+/user/aaron/job_descriptions
+/user/aaron/office_supplies
+----
+
+You can change the base directory under which the tables are loaded
+with the +--warehouse-dir+ parameter. For example:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees --all-tables \
+    --warehouse-dir /common/warehouse
+----
+
+This would create the following directories instead:
+
+----
+/common/warehouse/employee_names
+/common/warehouse/payroll_checks
+/common/warehouse/job_descriptions
+/common/warehouse/office_supplies
+----
+
+By default the data will be read into text files in HDFS. Each of the
+columns will be represented as comma-delimited text. Each row is
+terminated by a newline. See the section on "Controlling the Output
+Format" below for information on how to change these delimiters.
+
+If you want to leverage compression and binary file formats, the
++--as-sequencefile+ argument to Sqoop will import the table
+to a set of SequenceFiles instead. This stores each database record
+as a separate object in a SequenceFile.
+This representation is also likely to give higher performance when used
+as an input to subsequent MapReduce programs, as it does not require
+parsing. For completeness, Sqoop provides an +--as-textfile+ option, which is
+implied by default. An +--as-textfile+ on the command-line will override
+a previous +--as-sequencefile+ argument.
+
+The SequenceFile format will embed the records from the database as
+objects using the code generated by Sqoop. It is important that you
+retain the +.java+ file for this class, as you will need to be able to
+instantiate the same type to read the objects back later, in other
+user-defined applications.
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/hive.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/hive.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/hive.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/hive.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,58 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Importing Data Into Hive
+------------------------
+
+Sqoop's primary function is to upload your data into files in HDFS. If
+you have a Hive metastore associated with your HDFS cluster, Sqoop can
+also import the data into Hive by generating and executing a +CREATE
+TABLE+ statement to define the data's layout in Hive. Importing data
+into Hive is as simple as adding the *+--hive-import+* option to your
+Sqoop command line.
+
+After your data is imported into HDFS, Sqoop will generate a Hive
+script containing a +CREATE TABLE+ operation defining your columns using
+Hive's types, and a +LOAD DATA INPATH+ statement to move the data files
+into Hive's warehouse directory. The script will be executed by
+calling the installed copy of +hive+ on the machine where Sqoop is run.
+If you have multiple Hive installations, or +hive+ is not in your
++$PATH+, use the *+--hive-home+* option to identify the Hive installation
+directory. Sqoop will then use +$HIVE_HOME/bin/hive+.
+
+NOTE: This function is incompatible with +--as-sequencefile+.
+
+Hive's text parser does not support escape or field-enclosing
+characters. Sqoop will print a warning if you use +--escaped-by+,
++--enclosed-by+, or +--optionally-enclosed-by+, since Hive cannot
+parse these. Sqoop will pass the field and record terminators through
+to Hive. If you do not set any delimiters but do use +--hive-import+,
+the field delimiter will be set to +^A+ and the record delimiter will
+be set to +\n+ to be consistent with Hive's defaults.
+
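+As a rough illustration (a Python sketch, not Sqoop or Hive code, with
+made-up values), a record written with these Hive defaults splits on
+the +^A+ (+\x01+) field delimiter:

```python
# Sketch: splitting a record written with Hive's default delimiters,
# the ones --hive-import uses when no delimiters are set:
# ^A (\x01) between fields, \n between records.
HIVE_FIELD_DELIM = "\x01"

def parse_hive_record(line):
    return line.rstrip("\n").split(HIVE_FIELD_DELIM)

fields = parse_hive_record("42\x01alice\x01engineering\n")
# fields == ["42", "alice", "engineering"]
```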
+Hive's Type System
+~~~~~~~~~~~~~~~~~~
+
+Hive users will note that there is not a one-to-one mapping between
+SQL types and Hive types. In general, SQL types that do not have a
+direct mapping (e.g., +DATE+, +TIME+, and +TIMESTAMP+) will be coerced to
++STRING+ in Hive. The +NUMERIC+ and +DECIMAL+ SQL types will be coerced to
++DOUBLE+. In these cases, Sqoop will emit a warning in its log messages
+informing you of the loss of precision.
+
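+These coercions can be pictured as a lookup table. In the Python sketch
+below, only the +DATE+/+TIME+/+TIMESTAMP+ and +NUMERIC+/+DECIMAL+ rows
+follow the text above; the remaining entries and the +STRING+ fallback
+are assumptions, not Sqoop's exact mapping:

```python
# Illustrative SQL-to-Hive type coercion table. Only the DATE/TIME/
# TIMESTAMP -> STRING and NUMERIC/DECIMAL -> DOUBLE rows come from the
# documentation; the rest is a guess at a plausible mapping.
HIVE_TYPE_FOR = {
    "INTEGER": "INT",
    "VARCHAR": "STRING",
    "DATE": "STRING",       # no direct Hive equivalent
    "TIME": "STRING",
    "TIMESTAMP": "STRING",
    "NUMERIC": "DOUBLE",    # may lose precision
    "DECIMAL": "DOUBLE",    # may lose precision
}

def hive_type(sql_type):
    # Fall back to STRING for unknown types (an assumption).
    return HIVE_TYPE_FOR.get(sql_type.upper(), "STRING")
```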

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting-args.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting-args.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting-args.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting-args.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,34 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+--input-fields-terminated-by (char)::
+  Sets the input field separator
+
+--input-lines-terminated-by (char)::
+  Sets the input end-of-line char
+
+--input-optionally-enclosed-by (char)::
+  Sets an input field-enclosing character
+
+--input-enclosed-by (char)::
+  Sets a required input field encloser
+
+--input-escaped-by (char)::
+  Sets the input escape character
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/input-formatting.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,24 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+Record classes generated by Sqoop include both a +toString()+ method
+that formats output records, and a +parse()+ method that interprets
+text based on an input delimiter set. The input delimiters default to
+the same ones chosen for output delimiters, but you can override these
+settings to support converting from one set of delimiters to another.
+
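+The Python sketch below mimics that pairing to show how independent
+input and output delimiter sets allow format conversion. It is a
+simplified analogue of the generated Java classes, not the real
+generated code:

```python
# Python analogue of a generated record class: parse() reads one
# delimiter set, to_string() writes another, so passing records
# through converts between formats. Hypothetical, simplified sketch.
class Record:
    def __init__(self, in_delim=",", out_delim=","):
        self.in_delim = in_delim
        self.out_delim = out_delim
        self.fields = []

    def parse(self, line):
        self.fields = line.rstrip("\n").split(self.in_delim)
        return self

    def to_string(self):
        return self.out_delim.join(self.fields) + "\n"

# Convert a tab-separated record to a comma-separated one:
r = Record(in_delim="\t", out_delim=",")
converted = r.parse("1\tAaron\tEngineering\n").to_string()
# converted == "1,Aaron,Engineering\n"
```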

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/intro.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/intro.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/intro.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/intro.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,34 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Introduction
+------------
+
+Sqoop is a tool designed to help users import data from
+existing relational databases into their Hadoop clusters. Sqoop uses
+JDBC to connect to a database, examine each table's schema, and
+auto-generate the necessary classes to import data into HDFS. It
+then launches a MapReduce job to read tables from the database
+via the DBInputFormat (a JDBC-based InputFormat). Tables are read
+into a set of files loaded into HDFS. Both SequenceFile and
+text-based targets are supported. Sqoop also supports high-performance
+imports from select databases including MySQL.
+
+This document describes how to get started using Sqoop to import
+your data into Hadoop.

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-dbs.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-dbs.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-dbs.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-dbs.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,35 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Listing Available Databases
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Once connected to a database server, you can list the available
+databases with the +--list-databases+ parameter. Note that in this
+case, the connect string does not include a database name, just a
+server address.
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/ --list-databases
+information_schema
+employees
+----
+_This only works with HSQLDB and MySQL. A vendor-agnostic implementation of
+this function has not yet been implemented._
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-tables.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-tables.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-tables.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/listing-tables.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,34 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Listing Available Tables
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Within a database, you can list the tables available for import with
+the +--list-tables+ command. The following example shows four tables available
+within the "employees" example database:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees --list-tables
+employee_names
+payroll_checks
+job_descriptions
+office_supplies
+----
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/misc-args.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/misc-args.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/misc-args.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/misc-args.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,32 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Miscellaneous Additional Arguments
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to generate the Java classes to represent tables without
+actually performing an import, supply a connect string and
+(optionally) credentials as above, as well as +--all-tables+ or
++--table+, but also use the +--generate-only+ argument. This will
+generate the classes and cease further operation.
+
+You can override the +$HADOOP_HOME+ environment variable within Sqoop
+with the +--hadoop-home+ argument. You can override the +$HIVE_HOME+
+environment variable with +--hive-home+.
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting-args.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting-args.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting-args.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting-args.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,39 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+--fields-terminated-by (char)::
+  Sets the field separator character
+
+--lines-terminated-by (char)::
+  Sets the end-of-line character
+
+--optionally-enclosed-by (char)::
+  Sets a field-enclosing character which may be used if a
+  value contains delimiter characters.
+
+--enclosed-by (char)::
+  Sets a field-enclosing character which will be used for all fields.
+
+--escaped-by (char)::
+  Sets the escape character
+
+--mysql-delimiters::
+  Uses MySQL's default delimiter set:
++
+fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/output-formatting.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,44 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+The delimiters used to separate fields and records can be specified
+on the command line, as can a quoting character and an escape character
+(for quoting delimiters inside a value). Data imported with
++--as-textfile+ will be formatted according to these parameters. Classes
+generated by Sqoop will encode this information, so using +toString()+
+from a data record stored +--as-sequencefile+ will reproduce your
+specified formatting.
+
+The +(char)+ argument for each argument in this section can be specified
+either as a normal character (e.g., +--fields-terminated-by ,+) or via
+an escape sequence. Arguments of the form +\0xhhh+ will be interpreted
+as a hexadecimal representation of a character with hex number _hhh_.
+Arguments of the form +\0ooo+ will be treated as an octal representation
+of a character represented by octal number _ooo_. The special escapes
++\n+, +\r+, +\"+, +\b+, +\t+, and +\\+ act as they do inside Java strings. +\0+ will be
+treated as NUL. This will insert NUL characters between fields or lines
+(if used for +--fields-terminated-by+ or +--lines-terminated-by+), or will
+disable enclosing/escaping if used for one of the +--enclosed-by+,
++--optionally-enclosed-by+, or +--escaped-by+ arguments.
+
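+A Python sketch of the escape-sequence rules just described (this is a
+reading of the documented behavior, not Sqoop's own argument parser):

```python
# Sketch: interpreting a (char) delimiter argument per the rules above.
# Not Sqoop's parser; just a reading of the documented behavior.
def interpret_delim(arg):
    specials = {r"\n": "\n", r"\r": "\r", r"\"": "\"", r"\b": "\b",
                r"\t": "\t", r"\\": "\\", r"\0": "\0"}
    if arg in specials:                 # Java-style special escapes, \0 = NUL
        return specials[arg]
    if arg.startswith(r"\0x"):          # \0xhhh: hexadecimal char code
        return chr(int(arg[3:], 16))
    if arg.startswith(r"\0"):           # \0ooo: octal char code
        return chr(int(arg[2:], 8))
    return arg                          # plain literal character

# interpret_delim(r"\0x2c") == ","  (hex 2c is a comma)
```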
+The default delimiters are +,+ for fields, +\n+ for records, no quote
+character, and no escape character. Note that this can lead to
+ambiguous or unparsable records if you import database records containing
+commas or newlines in the field data. For unambiguous parsing, both an
+enclosing character and an escape character must be enabled, e.g., via
++--mysql-delimiters+.
+
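+To see why, here is a short Python demonstration using the standard
++csv+ module as a stand-in for a delimited-text parser (the sample data
+is made up):

```python
import csv
import io

# Without enclosing, a comma inside a value is indistinguishable
# from a field separator: one intended field becomes two.
ambiguous = next(csv.reader(io.StringIO("Smith, John,42\n")))
# ambiguous == ["Smith", " John", "42"]  -- three fields, not two

# With an enclosing character (here ', as in --mysql-delimiters),
# the same value parses unambiguously.
enclosed = next(csv.reader(io.StringIO("'Smith, John',42\n"), quotechar="'"))
# enclosed == ["Smith, John", "42"]
```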

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/supported-dbs.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/supported-dbs.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/supported-dbs.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/supported-dbs.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,55 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Supported Databases
+-------------------
+
+Sqoop uses JDBC to connect to databases. JDBC is a compatibility layer
+that allows a program to access many different databases through a common
+API. Slight differences in the SQL language spoken by each database, however,
+may mean that Sqoop can't use every database out of the box, or that some
+databases may be used in an inefficient manner.
+
+When you provide a connect string to Sqoop, it inspects the protocol scheme to
+determine appropriate vendor-specific logic to use. If Sqoop knows about
+a given database, it will work automatically. If not, you may need to
+specify the driver class to load via +--driver+. This will use a generic
+code path which will use standard SQL to access the database. Sqoop provides
+some databases with faster, non-JDBC-based access mechanisms. These can be
+enabled by specifying the +--direct+ parameter.
+
+Sqoop includes vendor-specific code paths for the following databases:
+
+[grid="all"]
+`-----------`--------`--------------------`---------------------
+Database    version  +--direct+ support?  connect string matches
+----------------------------------------------------------------
+HSQLDB      1.8.0+   No                   +jdbc:hsqldb:*//+
+MySQL       5.0+     Yes                  +jdbc:mysql://+
+Oracle      10.2.0+  No                   +jdbc:oracle:*//+
+PostgreSQL  8.3+     Yes                  +jdbc:postgresql://+
+----------------------------------------------------------------
+
+Sqoop may work with older versions of the databases listed, but we have
+only tested it with the versions specified above.
+
+Even if Sqoop supports a database internally, you may still need to
+install the database vendor's JDBC driver in your +$HADOOP_HOME/lib+
+path.
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/doc/table-import.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/doc/table-import.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/doc/table-import.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/doc/table-import.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,68 @@
+
+////
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+////
+
+
+Importing Individual Tables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In addition to full-database imports, Sqoop will allow you to import
+individual tables. Instead of using +--all-tables+, specify the name of
+a particular table with the +--table+ argument:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees \
+    --table employee_names
+----
+
+You can further specify a subset of the columns in a table by using
+the +--columns+ argument. This takes a list of column names, delimited
+by commas, with no spaces in between, e.g.:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees \
+    --table employee_names --columns employee_id,first_name,last_name,dept_id
+----
+
+Sqoop will use a MapReduce job to read sections of the table in
+parallel. For the MapReduce tasks to divide the table space, the
+results returned by the database must be orderable. Sqoop will
+automatically detect the primary key for a table and use that to order
+the results. If no primary key is available, or (less likely) you want
+to order the results along a different column, you can specify the
+column name with +--split-by+.
+
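+The division of work can be sketched as follows. This is a Python
+illustration of the idea only; real Sqoop queries the database for the
+split column's bounds rather than taking them as arguments:

```python
# Sketch: dividing an ordering column's value range among parallel
# map tasks, in the spirit of --split-by. Bounds are given directly
# here; Sqoop would obtain them with a MIN/MAX query.
def split_ranges(lo, hi, num_tasks):
    """Return half-open (start, end) ranges covering [lo, hi)."""
    step = (hi - lo + num_tasks - 1) // num_tasks   # ceiling division
    return [(s, min(s + step, hi)) for s in range(lo, hi, step)]

# split_ranges(0, 100, 4) == [(0, 25), (25, 50), (50, 75), (75, 100)]
```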
+.Row ordering
+IMPORTANT:  To guarantee correctness of your input, you must select an
+ordering column for which each row has a unique value. If duplicate
+values appear in the ordering column, the results of the import are
+undefined, and Sqoop will not be able to detect the error.
+
+Finally, you can control which rows of a table are imported via the
++--where+ argument. With this argument, you may specify a clause to be
+appended to the SQL statement used to select rows from the table,
+e.g.:
+
+----
+$ sqoop --connect jdbc:mysql://database.example.com/employees \
+  --table employee_names --where "employee_id > 40 AND active = 1"
+----
+
+The +--columns+, +--split-by+, and +--where+ arguments are incompatible with
++--all-tables+. If you require special handling for some of the tables,
+then you must manually run a separate import job for each table.
+

Added: hadoop/mapreduce/trunk/src/contrib/sqoop/readme.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/sqoop/readme.txt?rev=826384&view=auto
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/sqoop/readme.txt (added)
+++ hadoop/mapreduce/trunk/src/contrib/sqoop/readme.txt Sun Oct 18 09:28:02 2009
@@ -0,0 +1,15 @@
+Sqoop documentation is in the doc/ directory in asciidoc format.
+
+Run 'ant doc' to build the documentation. It will be created in
+$HADOOP_HOME/build/contrib/sqoop/doc.
+
+There will be a manpage (sqoop.1.gz) and a User Guide formatted in HTML.
+
+This process requires the following programs:
+  asciidoc
+  gzip
+  make
+  python 2.5+
+  xmlto
+
+For more information about asciidoc, see http://www.methods.co.nz/asciidoc/


