hive-user mailing list archives

From "Kennedy, Sean C." <sean.kenn...@merck.com>
Subject hbase importtsv
Date Thu, 01 May 2014 22:34:23 GMT
I ran the following command to import an excel.csv file into HBase. Everything looked OK; however,
when I ran a scan on the table in HBase, I did not see as many rows as were in the excel.csv file.

Any help is appreciated.



/hd/hadoop/bin/hadoop jar /hbase/hbase-0.94.15/hbase-0.94.15.jar importtsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,ROOT,NODE,VALUE,X_PATH,IMG,NODE_URL,LFLAG,SORT_ORDER,SITE \
  V_MES_INPUT_TREE /ma/segwhdfs/hpp/hbase/MES/csv/MES_INPUT_TREE


The CSV file had over 200,000 rows; however, my HBase scan returned only 3,500 or so rows.

Output from scan 'MES_INPUT_TREE'

3855 row(s) in 5.6090 seconds
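
HBase stores one row per row key, so lines that share the same HBASE_ROW_KEY value end up as a single row in a scan. A minimal sketch for checking how many distinct keys a CSV contains follows. It assumes the row key is the first field and that a plain split on the separator is good enough (no quoted fields); the function name and the sample lines are hypothetical, not taken from the actual file.

```python
def count_row_keys(lines, separator=","):
    """Count non-empty lines and distinct first-column values (the row keys)."""
    total = 0
    keys = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        total += 1
        # The first field is assumed to map to HBASE_ROW_KEY.
        keys.add(line.split(separator, 1)[0])
    return total, len(keys)


# Hypothetical sample: three lines, two of which share the key "k1".
sample = ["k1,a,b\n", "k2,c,d\n", "k1,e,f\n"]
print(count_row_keys(sample))  # (3, 2)
```

To try this on a local copy of the file, pass an open file object, for example `count_row_keys(open("excel.csv"))`. A distinct-key count well below the line count would point to repeated row keys.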


Output from job:

14/05/01 17:58:53 INFO mapred.JobClient: Job complete: job_201405011721_0001
14/05/01 17:58:53 INFO mapred.JobClient: Counters: 20
14/05/01 17:58:53 INFO mapred.JobClient:   Job Counters
14/05/01 17:58:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1208423
14/05/01 17:58:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/01 17:58:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/05/01 17:58:53 INFO mapred.JobClient:     Rack-local map tasks=1
14/05/01 17:58:53 INFO mapred.JobClient:     Launched map tasks=4
14/05/01 17:58:53 INFO mapred.JobClient:     Data-local map tasks=3
14/05/01 17:58:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1427
14/05/01 17:58:53 INFO mapred.JobClient:   ImportTsv
14/05/01 17:58:53 INFO mapred.JobClient:     Bad Lines=3
14/05/01 17:58:53 INFO mapred.JobClient:   File Output Format Counters
14/05/01 17:58:53 INFO mapred.JobClient:     Bytes Written=0
14/05/01 17:58:53 INFO mapred.JobClient:   FileSystemCounters
14/05/01 17:58:53 INFO mapred.JobClient:     HDFS_BYTES_READ=5243015
14/05/01 17:58:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=80374
14/05/01 17:58:53 INFO mapred.JobClient:   File Input Format Counters
14/05/01 17:58:53 INFO mapred.JobClient:     Bytes Read=5242880
14/05/01 17:58:53 INFO mapred.JobClient:   Map-Reduce Framework
14/05/01 17:58:53 INFO mapred.JobClient:     Map input records=22494
14/05/01 17:58:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=112275456
14/05/01 17:58:53 INFO mapred.JobClient:     Spilled Records=0
14/05/01 17:58:53 INFO mapred.JobClient:     CPU time spent (ms)=2430
14/05/01 17:58:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=145752064
14/05/01 17:58:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=769548288
14/05/01 17:58:53 INFO mapred.JobClient:     Map output records=22491
14/05/01 17:58:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=135
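
The counters above can be cross-checked with a little arithmetic. The sketch below uses only numbers copied from this job output; the helper name `reconcile` is hypothetical. It checks that map input records minus bad lines equals map output records, and converts Bytes Read to MiB.

```python
# Counter values copied from the job output above.
counters = {
    "Map input records": 22494,
    "Map output records": 22491,
    "Bad Lines": 3,
    "Bytes Read": 5242880,
}


def reconcile(c):
    """Return simple consistency checks derived from the job counters."""
    accepted = c["Map input records"] - c["Bad Lines"]
    return {
        "accepted_lines": accepted,
        "matches_map_output": accepted == c["Map output records"],
        "bytes_read_mib": c["Bytes Read"] / (1024 * 1024),
    }


summary = reconcile(counters)
print(summary)
```

By these numbers the job read 22,494 lines in exactly 5 MiB of input and emitted 22,491 of them, so it appears to have seen far fewer than 200,000 lines. The counters alone do not explain the further drop to 3,855 rows in the scan.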
