atlas-dev mailing list archives

From Shwetha Shivalingamurthy <sshivalingamur...@hortonworks.com>
Subject Re: Atlas and sqoop hook
Date Tue, 21 Jun 2016 08:33:38 GMT
Make sure you have the latest Sqoop, built from Sqoop trunk.

Use --verbose in the sqoop command to get debug logs. From the logs, you can
verify whether the Sqoop Atlas hook is invoked. Also make sure that the Sqoop
hook picks up the right Atlas conf.
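
For example, a rough sketch - the log file path is arbitrary, /etc/sqoop/conf
is assumed from an HDP-style layout, and the grep pattern is only a heuristic
for spotting hook activity:

  # run the import with debug logging and keep the output
  sqoop-import --verbose --connect jdbc:mysql://mysqlhost/test \
      --table sqoop_test --split-by id --hive-import \
      --hive-table sqoop_test19 --username margusja -P \
      2>&1 | tee /tmp/sqoop-import.log
  # if the Atlas hook was invoked, its classes / Kafka producer should show up
  grep -iE 'atlas|kafka|publish' /tmp/sqoop-import.log
  # confirm the Atlas client config is in the conf dir Sqoop actually reads
  ls -l /etc/sqoop/conf/atlas-application.properties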

Regards,
Shwetha






On 20/06/16, 11:58 PM, "Margus Roo" <margus@roo.ee> wrote:

>Hi
>
>I downloaded and compiled Atlas 0.7.
>The Hive hook is working - "create table [tablename] as select * from [src
>tablename]" works and data lineage is generated in Atlas.
>Next I tried the Sqoop hook and followed
>http://atlas.incubator.apache.org/Bridge-Sqoop.html
>
>Command:
>sqoop-import --connect jdbc:mysql://mysqlhost/test --table sqoop_test
>--split-by id --hive-import -hive-table sqoop_test19 --username margusja
>--P
>creates a new table in Hive; the new table also appears in Atlas, but no data
>lineage is generated.
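>
>For reference, the Bridge-Sqoop setup linked above comes down to pointing
>sqoop.job.data.publish.class at org.apache.atlas.sqoop.hook.SqoopHook and
>making atlas-application.properties plus the Atlas Sqoop hook jars visible to
>Sqoop. A rough way to verify that, with paths assumed for an HDP 2.4 layout:
>
>  # publish class that hands Sqoop job metadata to the Atlas hook
>  grep -A1 sqoop.job.data.publish.class /etc/sqoop/conf/sqoop-site.xml
>  # Atlas client config the hook should read
>  ls -l /etc/sqoop/conf/atlas-application.properties
>  # Atlas Sqoop hook jars on Sqoop's classpath
>  ls /usr/hdp/current/sqoop-client/lib/ | grep -i atlas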
>
>In the tutorial at
>http://hortonworks.com/hadoop-tutorial/cross-component-lineage-apache-atlas/
>I can see that extra config parameters are loaded (in the picture
>https://raw.githubusercontent.com/hortonworks/tutorials/atlas-ranger-tp/assets/cross-component-lineage-with-atlas/8-sqoop-import-finish.png)
>and the Kafka producer creating output, but with my command:
>sqoop-import --connect jdbc:mysql://mysqlhost/test --table sqoop_test
>--split-by id --hive-import -hive-table sqoop_test19 --username margusja
>--P
>there is no extra output, only:
>
>Warning: /usr/hdp/2.4.0.0-169/accumulo does not exist! Accumulo imports
>will fail.
>Please set $ACCUMULO_HOME to the root of your Accumulo installation.
>16/06/20 21:25:47 INFO sqoop.Sqoop: Running Sqoop version:
>1.4.6.2.4.0.0-169
>16/06/20 21:25:47 WARN tool.BaseSqoopTool: Setting your password on the
>command-line is insecure. Consider using -P instead.
>16/06/20 21:25:47 INFO tool.BaseSqoopTool: Using Hive-specific
>delimiters for output. You can override
>16/06/20 21:25:47 INFO tool.BaseSqoopTool: delimiters with
>--fields-terminated-by, etc.
>16/06/20 21:25:47 INFO manager.MySQLManager: Preparing to use a MySQL
>streaming resultset.
>16/06/20 21:25:47 INFO tool.CodeGenTool: Beginning code generation
>16/06/20 21:25:47 INFO manager.SqlManager: Executing SQL statement:
>SELECT t.* FROM `sqoop_test` AS t LIMIT 1
>16/06/20 21:25:47 INFO manager.SqlManager: Executing SQL statement:
>SELECT t.* FROM `sqoop_test` AS t LIMIT 1
>16/06/20 21:25:47 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
>/usr/hdp/2.4.0.0-169/hadoop-mapreduce
>Note: 
>/tmp/sqoop-root/compile/49b525e14ebd68542d86b68dc399bd84/sqoop_test.java
>uses or overrides a deprecated API.
>Note: Recompile with -Xlint:deprecation for details.
>16/06/20 21:25:48 INFO orm.CompilationManager: Writing jar file:
>/tmp/sqoop-root/compile/49b525e14ebd68542d86b68dc399bd84/sqoop_test.jar
>16/06/20 21:25:48 WARN manager.MySQLManager: It looks like you are
>importing from mysql.
>16/06/20 21:25:48 WARN manager.MySQLManager: This transfer can be
>faster! Use the --direct
>16/06/20 21:25:48 WARN manager.MySQLManager: option to exercise a
>MySQL-specific fast path.
>16/06/20 21:25:48 INFO manager.MySQLManager: Setting zero DATETIME
>behavior to convertToNull (mysql)
>16/06/20 21:25:48 INFO mapreduce.ImportJobBase: Beginning import of
>sqoop_test
>SLF4J: Class path contains multiple SLF4J bindings.
>SLF4J: Found binding in
>[jar:file:/usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/sl
>f4j/impl/StaticLoggerBinder.class]
>SLF4J: Found binding in
>[jar:file:/usr/hdp/2.4.0.0-169/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/
>slf4j/impl/StaticLoggerBinder.class]
>SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>explanation.
>SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>16/06/20 21:25:50 INFO impl.TimelineClientImpl: Timeline service
>address: http://bigdata21.webmedia.int:8188/ws/v1/timeline/
>16/06/20 21:25:50 INFO client.RMProxy: Connecting to ResourceManager at
>bigdata21.webmedia.int/192.168.81.110:8050
>16/06/20 21:25:52 INFO db.DBInputFormat: Using read commited transaction
>isolation
>16/06/20 21:25:52 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
>SELECT MIN(`id`), MAX(`id`) FROM `sqoop_test`
>16/06/20 21:25:52 INFO mapreduce.JobSubmitter: number of splits:2
>16/06/20 21:25:52 INFO mapreduce.JobSubmitter: Submitting tokens for
>job: job_1460979043517_0118
>16/06/20 21:25:53 INFO impl.YarnClientImpl: Submitted application
>application_1460979043517_0118
>16/06/20 21:25:53 INFO mapreduce.Job: The url to track the job:
>http://bigdata21.webmedia.int:8088/proxy/application_1460979043517_0118/
>16/06/20 21:25:53 INFO mapreduce.Job: Running job: job_1460979043517_0118
>16/06/20 21:25:58 INFO mapreduce.Job: Job job_1460979043517_0118 running
>in uber mode : false
>16/06/20 21:25:58 INFO mapreduce.Job:  map 0% reduce 0%
>16/06/20 21:26:02 INFO mapreduce.Job:  map 50% reduce 0%
>16/06/20 21:26:03 INFO mapreduce.Job:  map 100% reduce 0%
>16/06/20 21:26:03 INFO mapreduce.Job: Job job_1460979043517_0118
>completed successfully
>16/06/20 21:26:03 INFO mapreduce.Job: Counters: 30
>         File System Counters
>                 FILE: Number of bytes read=0
>                 FILE: Number of bytes written=310818
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=197
>                 HDFS: Number of bytes written=20
>                 HDFS: Number of read operations=8
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=4
>         Job Counters
>                 Launched map tasks=2
>                 Other local map tasks=2
>                 Total time spent by all maps in occupied slots (ms)=4353
>                 Total time spent by all reduces in occupied slots (ms)=0
>                 Total time spent by all map tasks (ms)=4353
>                 Total vcore-seconds taken by all map tasks=4353
>                 Total megabyte-seconds taken by all map tasks=2785920
>         Map-Reduce Framework
>                 Map input records=2
>                 Map output records=2
>                 Input split bytes=197
>                 Spilled Records=0
>                 Failed Shuffles=0
>                 Merged Map outputs=0
>                 GC time elapsed (ms)=70
>                 CPU time spent (ms)=1780
>                 Physical memory (bytes) snapshot=355676160
>                 Virtual memory (bytes) snapshot=4937265152
>                 Total committed heap usage (bytes)=154140672
>         File Input Format Counters
>                 Bytes Read=0
>         File Output Format Counters
>                 Bytes Written=20
>16/06/20 21:26:03 INFO mapreduce.ImportJobBase: Transferred 20 bytes in
>13.8509 seconds (1.444 bytes/sec)
>16/06/20 21:26:03 INFO mapreduce.ImportJobBase: Retrieved 2 records.
>16/06/20 21:26:03 INFO manager.SqlManager: Executing SQL statement:
>SELECT t.* FROM `sqoop_test` AS t LIMIT 1
>16/06/20 21:26:03 INFO hive.HiveImport: Loading uploaded data into Hive
>
>Logging initialized using configuration in
>jar:file:/usr/hdp/2.4.0.0-169/hive/lib/hive-common-1.2.1000.2.4.0.0-169.ja
>r!/hive-log4j.properties
>OK
>Time taken: 2.035 seconds
>Loading data to table default.sqoop_test19
>Table default.sqoop_test19 stats: [numFiles=4, totalSize=40]
>OK
>Time taken: 1.043 seconds
>
>I suspect that atlas-application.properties or sqoop-site.xml is not being
>read during the sqoop import command. How can I debug this?
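>
>A related check, assuming Atlas uses its Kafka notification backend with the
>default ATLAS_HOOK topic: if the hook fires at all, every import should
>publish a message there. For example (the kafka-console-consumer path and the
>ZooKeeper address are assumptions for this HDP 2.4 cluster):
>
>  /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
>      --zookeeper bigdata21.webmedia.int:2181 \
>      --topic ATLAS_HOOK --from-beginning
>  # no messages after an import => the hook (or its config) is never loaded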
>
>-- 
>Margus (margusja) Roo
>http://margus.roo.ee
>skype: margusja
>+372 51 48 780
>
>

