hbase-user mailing list archives

From edward choi <mp2...@gmail.com>
Subject Inserting many small files into HBase
Date Mon, 21 Mar 2011 01:39:14 GMT

I'm planning to crawl thousands of news RSS feeds via MapReduce and save
each news article directly into HBase.

My concern is that Hadoop does not work well with a large number of small
files. If I insert every single news article (each of which is obviously
small) into HBase, without storing it separately in HDFS, I might end up
with millions of files that are only several kilobytes in size.

Or does HBase somehow automatically append each news article to a single
file, so that it would end up with only a few large files?
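To make the two scenarios in the question concrete, here is a toy model that counts how many files each layout produces. It is a sketch under stated assumptions, not HBase code: the article count (one million), article size (4 KB) and flush threshold (64 MB) are hypothetical numbers chosen for illustration, and BufferedStore is a made-up class. It only mimics the general idea of buffering small writes in memory and flushing them as one larger file, which is loosely analogous to HBase's MemStore being flushed to HFiles.

```python
"""Toy model: one file per article vs. buffered writes flushed as large files.

All numbers below are hypothetical; this is not HBase's implementation.
"""
from dataclasses import dataclass, field
from typing import List


def files_one_per_article(num_articles: int) -> int:
    # Storing every article as its own file yields one file per article.
    return num_articles


@dataclass
class BufferedStore:
    """Hypothetical store: buffers writes, flushes once a size threshold is hit."""
    flush_threshold_bytes: int
    buffered_bytes: int = 0
    file_sizes: List[int] = field(default_factory=list)

    def put(self, record_size: int) -> None:
        # Accumulate the record in the in-memory buffer.
        self.buffered_bytes += record_size
        if self.buffered_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self) -> None:
        # Write everything buffered so far out as a single file.
        if self.buffered_bytes > 0:
            self.file_sizes.append(self.buffered_bytes)
            self.buffered_bytes = 0


def files_buffered(num_articles: int, article_bytes: int, threshold: int) -> int:
    store = BufferedStore(flush_threshold_bytes=threshold)
    for _ in range(num_articles):
        store.put(article_bytes)
    store.flush()  # flush whatever is left over at the end
    return len(store.file_sizes)


if __name__ == "__main__":
    articles = 1_000_000          # hypothetical: one million articles
    size = 4 * 1024               # hypothetical: 4 KB per article
    threshold = 64 * 1024 * 1024  # hypothetical: 64 MB flush threshold
    print(files_one_per_article(articles))               # prints 1000000
    print(files_buffered(articles, size, threshold))     # prints 62
```

With these assumed numbers, 16384 articles fit in each 64 MB flush, so one million articles produce 61 full files plus one partial file. How many files HBase actually creates also depends on its real flush settings, region splits and compactions, which this sketch does not model.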

