hbase-user mailing list archives

From Bijieshan <bijies...@huawei.com>
Subject RE: Why so many unexpected files like partitions_xxxx are created?
Date Tue, 17 Dec 2013 02:53:20 GMT
>  I think I should delete these files immediately after I have finished bulk loading data
into HBase since they are useless at that time, right?

Yes, I think so. They are useless once the bulk load task has finished.

Jieshan.
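
The cleanup suggested above amounts to removing every partitions_* entry from the job's working directory, e.g. with `hdfs dfs -rm /user/root/partitions_*`. As a sketch of the same glob-and-delete logic using only JDK classes on a local filesystem (a stand-in for the Hadoop FileSystem API, not HDFS-ready code as-is):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class PartitionCleanupSketch {
    // Deletes every partitions_* entry directly under dir and returns the count.
    // Local-filesystem analogue of cleaning the bulk-load working directory.
    static int deletePartitionFiles(Path dir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "partitions_*")) {
            for (Path p : ds) {
                Files.delete(p);
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cleanup-demo");
        Files.createFile(dir.resolve("partitions_fd74866b"));
        Files.createFile(dir.resolve("partitions_fe133460"));
        Files.createFile(dir.resolve("other.txt"));
        // Only the two partitions_* files are removed; other.txt is untouched.
        System.out.println(deletePartitionFiles(dir)); // prints 2
    }
}
```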
-----Original Message-----
From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com] 
Sent: Tuesday, December 17, 2013 9:34 AM
To: user@hbase.apache.org
Subject: Re: Why so many unexpected files like partitions_xxxx are created?

Indeed these files are produced by
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles in the directory
returned by job.getWorkingDirectory(), and I think I should delete these files
immediately after I have finished bulk loading data into HBase since they are
useless at that time, right?




2013/12/16 Bijieshan <bijieshan@huawei.com>

> The reduce partition information is stored in this partitions_XXXX file.
> See the code below:
>
> HFileOutputFormat#configureIncrementalLoad:
>         .....................
>     Path partitionsPath = new Path(job.getWorkingDirectory(),
>                                    "partitions_" + UUID.randomUUID());
>     LOG.info("Writing partition information to " + partitionsPath);
>
>     FileSystem fs = partitionsPath.getFileSystem(conf);
>     writePartitions(conf, partitionsPath, startKeys);
>         .....................
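
For illustration, the naming scheme in the excerpt can be reproduced with plain JDK classes; `workDir` below is a hypothetical stand-in for what `job.getWorkingDirectory()` returns:

```java
import java.util.UUID;

public class PartitionPathDemo {
    // Mirrors the "partitions_" + UUID.randomUUID() naming from the excerpt.
    static String partitionPath(String workDir) {
        return workDir + "/partitions_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Each call yields a fresh, unique name, which is why repeated bulk
        // loads accumulate many partitions_* files in the working directory.
        System.out.println(partitionPath("/user/root"));
    }
}
```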
>
> Hoping it helps.
>
> Jieshan
> -----Original Message-----
> From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
> Sent: Monday, December 16, 2013 6:48 PM
> To: user@hbase.apache.org
> Subject: Why so many unexpected files like partitions_xxxx are created?
>
> I imported data into HBase via bulk load, but after that I found many 
> unexpected files had been created in the HDFS directory /user/root/, 
> and they look like these:
>
> /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> ... ...
> ... ...
>
>
> It seems that they are HFiles, but I don't know why they were created here.
>
> I bulk loaded data into HBase in the following way:
>
> Firstly, I wrote a MapReduce program which has only map tasks. The map
> tasks read some text data and emit it as RowKey and KeyValue pairs.
> The following is my program:
>
>     @Override
>     protected void map(NullWritable NULL, GtpcV1SignalWritable signal, Context ctx)
>             throws InterruptedException, IOException {
>         String strRowkey = xxx;
>         byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
>
>         rowkey.set(rowkeyBytes);
>
>         part1.init(signal);
>         part2.init(signal);
>
>         KeyValue kv = new KeyValue(rowkeyBytes, Family_A, Qualifier_Q,
>                 part1.serialize());
>         ctx.write(rowkey, kv);
>
>         kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
>                 part2.serialize());
>         ctx.write(rowkey, kv);
>     }
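
One detail worth noting about a map-only job like the one above: the HFiles that HFileOutputFormat produces must be written in row-key order, and the partitions_* file is what TotalOrderPartitioner reads so that each reducer receives one contiguous, sorted key range. That ordering is unsigned lexicographic over the key bytes; a JDK-only sketch of the comparison (a stand-in for org.apache.hadoop.hbase.util.Bytes.compareTo, not the HBase code itself):

```java
import java.nio.charset.StandardCharsets;

public class RowKeyOrderDemo {
    // Unsigned lexicographic comparison over raw bytes, the ordering HBase
    // uses for row keys: negative if a < b, zero if equal, positive if a > b.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] k1 = "row-001".getBytes(StandardCharsets.UTF_8);
        byte[] k2 = "row-002".getBytes(StandardCharsets.UTF_8);
        System.out.println(compare(k1, k2) < 0); // prints true: k1 sorts first
    }
}
```

Because the comparison treats bytes as unsigned, a key starting with 0x80 sorts after one starting with 0x01, which matters when row keys contain non-ASCII bytes.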
>
>
> After the MR program finished, there were several HFiles generated in 
> the output directory I specified.
>
> Then I began to load these HFiles into HBase using the following command:
>        hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> HFiles-Dir  MyTable
>
> Finally, I could see that the data was indeed loaded into the table 
> in HBase.
>
>
> But I could also see many unexpected files generated in the HDFS 
> directory /user/root/, just as I mentioned at the beginning of this 
> mail, and I did not specify any files to be produced in this directory.
>
> What happened? Can anyone tell me what these files are and what produced them?
>
> Thanks
>