Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/HBaseBulkLoad" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad?action=diff&rev1=9&rev2=10

--------------------------------------------------

  The second command populates it (using the sampling query previously defined).  Usage of ORDER BY guarantees that a single file will be produced in directory {{{/tmp/hb_range_keys}}}.  The filename is unknown, but it is necessary to reference the file by name later, so run a command such as the following to copy it to a specific name:
  
  {{{
- dfs -cp /tmp/hb_range_keys/* /tmp/hb_range_key_list
+ dfs -cp /tmp/hb_range_keys/* /tmp/hb_range_key_list;
  }}}
  
  = Prepare Staging Location =
  
  The sort is going to produce a lot of data, so make sure you have sufficient space in your HDFS cluster, and choose the location where the files will be staged.  We'll use {{{/tmp/hbsort}}} in this example.
+ 
+ The directory does not actually need to exist (it will be automatically created in the next step), but if it does exist, it should be empty.
+ 
+ {{{
+ dfs -rmr /tmp/hbsort;
+ dfs -mkdir /tmp/hbsort;
+ }}}
  
  = Sort Data =