hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: How spark writes to HBASE
Date Mon, 22 Jan 2018 16:46:37 GMT
Which connector do you use to perform the write?

bq. Or spark will wisely launch an executor on that machine

I don't think that is the case. Multiple writes may be performed which
would end up on different region servers. Spark won't provide the affinity
described above.
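
To illustrate the point, here is a minimal sketch in plain Python. It is a toy model, not the HBase client API: the region layout, the host names and the helper functions `locate_region_server` and `route_puts` are all hypothetical. It only shows that the destination of a put is decided by the row key's position among the region start keys, so a batch written by one executor can fan out to several region servers, none of which need be the executor's own host.

```python
import bisect

# Hypothetical region layout: (region start key, hosting machine).
# Start keys are sorted; the empty string marks the first region.
REGIONS = [
    ("", "machine4"),
    ("g", "machine6"),
    ("n", "machine9"),
    ("t", "machine6"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate_region_server(row_key):
    """Return the host of the region whose key range contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]


def route_puts(executor_host, row_keys):
    """Group row keys by destination host and count the remote ones."""
    by_host = {}
    remote = 0
    for key in row_keys:
        host = locate_region_server(key)
        by_host.setdefault(host, []).append(key)
        if host != executor_host:
            remote += 1  # this put has to cross the network
    return by_host, remote


if __name__ == "__main__":
    # One executor on machine2 writing three rows.
    destinations, remote_count = route_puts("machine2", ["alpha", "host1", "zeta"])
    for host, keys in sorted(destinations.items()):
        print(host, keys)
    print("remote puts:", remote_count)
```

In this toy layout every put issued from machine2 is remote, and the three rows are split across two different hosts, so there is no single machine where an executor could be placed to make the whole batch local.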

On Mon, Jan 22, 2018 at 7:19 AM, vignesh <vignesh093@gmail.com> wrote:

> Hi,
> I have a Spark job which reads some time-series data and pushes it to
> HBase using the HBase client API. I am executing this Spark job on a
> 10-node cluster. Say that when Spark first kicks off, it picks
> machine1, machine2 and machine3 as its executors. Now the job inserts
> a row into HBase. Below is my understanding of what it does.
> Based on the row key, a particular region (from META) would be
> chosen and that row would be pushed to that RegionServer's memstore and
> WAL, and once the memstore is full it would be flushed to disk. Now
> assume a particular row is being processed by an executor on
> machine2 and the RegionServer which handles the region to which the
> put is to be made is on machine6. Will the data be transferred from
> machine2 to machine6 over the network and then stored in the
> memstore of machine6? Or spark will wisely launch an executor on that
> machine during the write (if dynamic allocation is turned on) and
> push to it?
> --
