hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: How spark writes to HBASE
Date Mon, 22 Jan 2018 16:57:12 GMT
For case 1, the HFile would be loaded into the region (via a staging directory).

Please see:
http://hbase.apache.org/book.html#arch.bulk.load
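
To make the difference between the two cases in this thread concrete, here is a toy model in Python. It is only a sketch and does not use the HBase API: the region start keys, the host names, and the helper functions are all hypothetical. It assumes that a region is picked by finding the key range that contains the row key, that a client API put is sent to whichever host serves that region, and that bulk load output is sorted and split so each file falls within a single region.

```python
import bisect
from collections import defaultdict

# Hypothetical region layout (not read from a real cluster): sorted region
# start keys and the host serving each region. The first region starts at b"".
REGION_START_KEYS = [b"", b"g", b"n", b"t"]
REGION_HOSTS = ["machine4", "machine6", "machine8", "machine9"]


def region_index(row_key: bytes) -> int:
    """Index of the region whose [start, next_start) range holds row_key."""
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


def put_crosses_network(executor_host: str, row_key: bytes) -> bool:
    """Case 2: a put goes to whichever host serves the row's region."""
    return REGION_HOSTS[region_index(row_key)] != executor_host


def group_for_bulk_load(row_keys):
    """Case 1: sort rows and split them so each output file maps to one region."""
    files = defaultdict(list)
    for key in sorted(row_keys):
        files[region_index(key)].append(key)
    return dict(files)
```

In this model, `put_crosses_network("machine2", b"host42")` is true because the row falls in the region served by machine6, which matches the machine2 to machine6 scenario described in the original question below.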

On Mon, Jan 22, 2018 at 8:52 AM, vignesh <vignesh093@gmail.com> wrote:

> If it is a bulk load, I use the Spark HBase connector provided by Hortonworks.
> For time series writes, I use the normal HBase client APIs.
>
> So does that mean that in case 2 (client API write) the write to the memstore
> happens over the network? And in case 1 (bulk load), will the HFile be moved
> to the region that is supposed to hold it, or will it be written locally and
> kept as one copy, with the second replica going to that particular region?
>
> On Jan 22, 2018 22:16, "Ted Yu" <yuzhihong@gmail.com> wrote:
>
> Which connector do you use to perform the write?
>
> bq. Or spark will wisely launch an executor on that machine
>
> I don't think that is the case. Multiple writes may be performed which
> would end up on different region servers. Spark won't provide the affinity
> described above.
>
> On Mon, Jan 22, 2018 at 7:19 AM, vignesh <vignesh093@gmail.com> wrote:
>
> > Hi,
> >
> > I have a Spark job which reads some time series data and pushes it to
> > HBase using the HBase client API. I am executing this Spark job on a 10
> > node cluster. Say that when Spark kicks off, it first picks
> > machine1, machine2, machine3 as its executors. Now the job inserts
> > a row into HBase. Below is my understanding of what it does.
> >
> > Based on the row key, a particular region (from META) is
> > chosen, and the row is pushed to that RegionServer's memstore and
> > WAL; once the memstore is full, it is flushed to disk. Now
> > assume a particular row is being processed by an executor on
> > machine2, and the RegionServer that handles the region to which the
> > put is to be made is on machine6. Will the data be transferred from
> > machine2 to machine6 over the network and then stored in the
> > memstore of machine6? Or will Spark wisely launch an executor on that
> > machine during the write (if dynamic allocation is turned on) and
> > push to it?
> >
> >
> > --
> > I.VIGNESH
> >
>
