hbase-user mailing list archives

From Jaime Solano <jdjsol...@gmail.com>
Subject Re: Streaming data to htable
Date Fri, 13 Feb 2015 16:37:11 GMT
Hi Andrey,

We're facing a similar situation, where we plan to load a lot of data into
HBase directly. We've considered writing the HFiles without MapReduce. Is this
something you've done in the past? Is there any sample code we could use as
a guide? On another note, what would you consider "big enough" to switch
from regular Puts to writing HFiles?

Thanks!

On Fri Feb 13 2015 at 10:58:28 Andrey Stepachev <octo47@gmail.com> wrote:

> Hi hongbin,
>
> It seems to depend on how much data you ingest.
> If the volume is big enough, I'd look at creating HFiles
> directly without MapReduce (for example using
> HFileOutputFormat without MapReduce, or using
> an HFile writer directly).
> The created files can then be imported directly into HBase
> with LoadIncrementalHFiles#doBulkLoad. You also need to make sure
> your regions don't split too quickly: a bulk load can only load
> HFiles whose KeyValues fall within one or two adjacent
> regions. (It's better if splits are disabled and handled externally.)
>
> But make sure you actually need
> that kind of micromanagement rather than just sticking
> with regular Puts. HBase can sustain quite a high
> rate of incoming data before you need to worry about it.
>
> Cheers.
>
> On Fri, Feb 13, 2015 at 6:20 AM, hongbin ma <mahongbin@apache.org> wrote:
>
> > hi,
> >
> > I'm trying to use an HTable to store data that arrives in a streaming
> > fashion. Incoming data is guaranteed to have a larger KEY than ANY
> > existing key in the table, and the data will be READONLY.
> >
> > The data is streaming in at a very high rate, so I don't want to issue a
> > PUT operation for each entry, because that is obviously poor in
> > performance. I'm thinking about pooling the data entries and flushing
> > them to HBase every five minutes, and AFAIK there are a few options:
> >
> > 1. Pool the data entries and, every five minutes, run an MR job to convert
> > them to HFile format. This approach avoids the overhead of single Puts,
> > but I'm afraid the MR job might be too costly (waiting in the job queue)
> > to keep pace.
> >
> > 2. Use HTableInterface.put(List<Put>); the batched version should be
> > faster, but I'm not quite sure by how much.
> >
> > 3.?
> >
> > Can anyone give me some advice on this?
> > Thanks!
> >
> > hongbin
> >
>
>
>
> --
> Andrey.
>
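
For anyone looking for sample code: below is a minimal sketch of the approach
Andrey describes above, writing an HFile directly (no MapReduce) and handing it
to LoadIncrementalHFiles#doBulkLoad. It assumes an HBase 0.98/1.0-era client;
the table name "mytable", the column family "f", the qualifier "q", the staging
path, and the row-key format are placeholders, not anything from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;

public class DirectHFileLoad {

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);

    // doBulkLoad expects <staging dir>/<column family>/<hfile>, so the file
    // goes into a subdirectory named after the family ("f" is made up here).
    Path stagingDir = new Path("/tmp/hfile-staging");
    Path hfilePath = new Path(new Path(stagingDir, "f"), "part-00000");

    byte[] family = Bytes.toBytes("f");
    byte[] qualifier = Bytes.toBytes("q");

    HFileContext ctx = new HFileContextBuilder().withBlockSize(64 * 1024).build();
    HFile.Writer writer = HFile.getWriterFactory(conf, new CacheConfig(conf))
        .withPath(fs, hfilePath)
        .withFileContext(ctx)
        .create();
    try {
      // Cells must be appended in strictly increasing key order, which
      // matches a stream whose keys are always larger than any existing key.
      for (int i = 0; i < 1000; i++) {
        byte[] row = Bytes.toBytes(String.format("row-%010d", i));
        writer.append(new KeyValue(row, family, qualifier,
            System.currentTimeMillis(), Bytes.toBytes("value-" + i)));
      }
    } finally {
      writer.close();
    }

    // Hand the finished files to the region servers; the load moves them
    // into the table's region directories rather than copying them.
    HTable table = new HTable(conf, "mytable");
    try {
      new LoadIncrementalHFiles(conf).doBulkLoad(stagingDir, table);
    } finally {
      table.close();
    }
  }
}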

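And for comparison, a sketch of option 2 from hongbin's original message,
buffering incoming records and flushing them with HTableInterface.put(List<Put>)
so that each flush is one batched round trip instead of one RPC per Put. Again,
the table/family/qualifier names, the flush threshold, and the loop standing in
for the stream are made up.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedStreamWriter {
  private static final int FLUSH_SIZE = 10000; // made-up threshold

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HConnection connection = HConnectionManager.createConnection(conf);
    HTableInterface table = connection.getTable("mytable");
    try {
      List<Put> buffer = new ArrayList<Put>(FLUSH_SIZE);
      for (int i = 0; i < 100000; i++) { // stand-in for the incoming stream
        Put put = new Put(Bytes.toBytes(String.format("row-%010d", i)));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
        buffer.add(put);
        if (buffer.size() >= FLUSH_SIZE) {
          table.put(buffer); // one batched call per flush, not one per record
          buffer.clear();
        }
      }
      if (!buffer.isEmpty()) {
        table.put(buffer); // flush the remainder
      }
    } finally {
      table.close();
      connection.close();
    }
  }
}

It's also worth noting that the client can do similar batching for you if you
disable auto-flush on the table (setAutoFlush(false)) and size the client-side
write buffer appropriately, rather than managing the List<Put> by hand.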