hbase-user mailing list archives

From "Gan, Xiyun" <ganxi...@gmail.com>
Subject ImportTsv usage
Date Thu, 07 Apr 2011 02:54:20 GMT
I need to use the bulk load functionality in HBase. I have read the
documentation on the HBase wiki, but the ImportTsv tool does not meet my
needs, so I added some code to the map() function in ImportTsv.java.
The original map() function writes only one key/value pair to the
context. My modified version writes two key/value pairs to the context;
the rest of the code is unchanged from the original.
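To make the described change concrete, here is a minimal sketch. It assumes a simplified stand-in for the MapReduce context. RecordingContext, mapOnePair, mapTwoPairs, the tab-separated input and the "-extra" row suffix are all hypothetical. This is not the Hadoop or HBase API and not the actual ImportTsv code; it only models "one pair per input line" versus "two pairs per input line".

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPairMapperSketch {

    // Hypothetical stand-in for the Hadoop Mapper context (NOT the real API):
    // it simply records every (rowKey, value) pair that map() writes.
    static final class RecordingContext {
        final List<String[]> emitted = new ArrayList<>();

        void write(String rowKey, String value) {
            emitted.add(new String[] {rowKey, value});
        }
    }

    // Original behaviour: one key/value pair per tab-separated input line.
    static void mapOnePair(String line, RecordingContext context) {
        String[] fields = line.split("\t");
        context.write(fields[0], fields[1]);
    }

    // Modified behaviour: two key/value pairs per input line.
    // The second pair's key and value are hypothetical, for illustration only.
    static void mapTwoPairs(String line, RecordingContext context) {
        String[] fields = line.split("\t");
        context.write(fields[0], fields[1]);            // same pair as the original
        context.write(fields[0] + "-extra", fields[2]); // hypothetical extra pair
    }

    public static void main(String[] args) {
        List<String> input = List.of("row1\ta\tb", "row2\tc\td");
        RecordingContext one = new RecordingContext();
        RecordingContext two = new RecordingContext();
        for (String line : input) {
            mapOnePair(line, one);
            mapTwoPairs(line, two);
        }
        // Number of map output records produced by each variant.
        System.out.println(one.emitted.size() + " " + two.emitted.size());
    }
}
```

Running main prints `2 4`: for the same input, the two-pair variant writes twice as many map output records. That is why a roughly 2x increase in job time would be the naive expectation, since twice as much map output has to be sorted and shuffled.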
I compiled my code and ran it with hadoop jar. I expected the job to take
about twice as long as the original, but it takes nearly ten times as long
as the version that emits only one key/value pair. I checked my code and
did not find any problem. If the map() function emits only one of the two
key/value pairs I wrote (either one), the running time is normal.
What could be the cause? Am I missing some tip about bulk loading?

Best wishes
Gan, Xiyun
