hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Hbase scaning for couple Terabytes data
Date Thu, 12 May 2016 04:01:36 GMT
TableInputFormatBase is abstract.

Most likely you would use TableInputFormat for the scan.

See javadoc of getSplits():

   * Calculates the splits that will serve as input for the map tasks. The
   * number of splits matches the number of regions in a table.
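
To make the "one split per region" rule concrete, here is a minimal, stdlib-only sketch. It is a simplified model, not HBase's actual getSplits() code: the class RegionSplitSketch, the Split type, and both helper methods are hypothetical names invented for illustration, and the 10 GiB region size used in main() is an assumed figure (check hbase.hregion.max.filesize on your cluster).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionSplitSketch {

    /** Hypothetical stand-in for an input split: the row-key range [startRow, stopRow). */
    static final class Split {
        final String startRow;
        final String stopRow; // empty string means "to the end of the table"

        Split(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /**
     * Simplified model of the rule quoted above: given the sorted start keys of a
     * table's regions, emit exactly one split per region. Each split ends where
     * the next region begins; the last one is open-ended.
     */
    static List<Split> splitsFor(List<String> regionStartKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            String stop = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new Split(start, stop));
        }
        return splits;
    }

    /** Rough region-count estimate: table size divided by region size, rounded up. */
    static long estimateRegions(long tableBytes, long regionBytes) {
        return (tableBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        // Three regions (assumed boundaries) give three splits, hence three map tasks.
        List<Split> splits = splitsFor(Arrays.asList("", "row-3000", "row-6000"));
        System.out.println(splits.size() + " regions -> " + splits.size() + " map tasks");

        // Assumed sizes: 3 TiB of data in regions of at most 10 GiB each.
        long regions = estimateRegions(3L << 40, 10L << 30);
        System.out.println("estimated map tasks for a full scan: " + regions);
    }
}
```

Under this model the mapper count for a full-table scan grows with the region count, so a table holding several terabytes would be scanned by a few hundred map tasks rather than by a single reader.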


On Wed, May 11, 2016 at 6:05 PM, Yi Jiang <yi.jiang@ubisoft.com> wrote:

> Hi guys,
> Recently we have been debating whether to use HBase as the destination for
> our data pipeline job.
> Basically, we want to save our logs into HBase, and our pipeline can
> generate 2-4 terabytes of data every day. Our IT department thinks it is not
> a good idea to scan so much data in HBase, because it will cause performance
> and memory issues. They have asked us to keep only 15 minutes' worth of data
> in HBase for real-time analysis.
> For now, I am using a Hive external table over HBase. What I am wondering is:
> for a MapReduce job, what kind of mapper is used to scan the data from
> HBase? Is it TableInputFormatBase? And how many mappers will Hive use
> to scan HBase? Is it efficient or not? Will it cause performance
> issues if we have a couple of terabytes of data or more?
> I am also trying to index some columns that we might query on, but I
> am not sure whether it is a good idea to keep so much historical data in
> HBase for querying.
> Thank you,
> Jacky
