hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Fwd: bulk load doubts
Date Tue, 21 Jul 2015 14:29:15 GMT
For #1, with HDFS replication set to 3, HFile replication is handled by
HDFS. There shouldn't be any HFile loss once the bulk load completes.

For #3, multiple HFiles may be generated per region.

bq. If multiple, does LoadIncrementalHFiles merge these HFiles into one

There is no merging of HFiles in bulk load.

For #4, frequent compactions are likely given the small size of bulk loaded
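
To put rough numbers on #4, here is a back-of-the-envelope sketch in Python. The 5 second interval and ~20 MB size come from the question. The compaction threshold of 3 is an assumption (believed to be the default for hbase.hstore.compactionThreshold; check your own configuration), and the function names are made up for illustration. It also assumes the worst case where every load adds one HFile to the same store, which can happen if consecutive loads cover the same key range.

```python
# Rough arithmetic for the bulk load cadence described in the question.
# Nothing here talks to HBase; it only counts files and megabytes.

LOAD_INTERVAL_SEC = 5   # from the question: one bulk load every 5 seconds
LOAD_SIZE_MB = 20       # from the question: ~20 MB per load

# Assumption: minimum number of store files before a minor compaction is
# considered (hbase.hstore.compactionThreshold). Verify on your cluster.
COMPACTION_THRESHOLD = 3


def hfiles_per_hour(interval_sec, hfiles_per_load=1):
    """Number of new HFiles landing in one store per hour (worst case)."""
    return (3600 // interval_sec) * hfiles_per_load


def seconds_until_threshold(interval_sec, threshold, hfiles_per_load=1):
    """Seconds of loading before a store holds `threshold` new HFiles."""
    loads_needed = -(-threshold // hfiles_per_load)  # ceiling division
    return loads_needed * interval_sec


if __name__ == "__main__":
    files = hfiles_per_hour(LOAD_INTERVAL_SEC)
    print("new HFiles per hour:", files)
    print("MB loaded per hour:", files * LOAD_SIZE_MB)
    print("seconds until compaction threshold:",
          seconds_until_threshold(LOAD_INTERVAL_SEC, COMPACTION_THRESHOLD))
```

Under these assumptions a store would be eligible for compaction again roughly every 15 seconds. Batching into fewer, larger loads raises that interval proportionally.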


On Tue, Jul 21, 2015 at 7:20 AM, Shushant Arora <shushantarora09@gmail.com> wrote:

> 1. Do bulk loaded HFiles not get replicated? Does it mean that if a
> RegionServer goes down, all HFiles that were bulk loaded to that server are
> lost, irrespective of HDFS replication being set to 3? If yes, why are bulk
> loaded HFiles not replicated?
> 2. Is there any issue with using a timestamp prefix as the table's key when
> writing via bulk load?
> 3. Will a bulk load MR job using HFileOutputFormat2 as the output format
> create a single HFile per region, or can it create multiple HFiles per region?
> If multiple, does LoadIncrementalHFiles merge these HFiles into one while
> loading them into the same region, or does it just do a simple copy?
> 4. Is there any performance issue if I run a bulk load every 5 seconds, each
> containing ~20 MB of data? Does it create frequent compactions, and does that
> lead to performance issues?
