hbase-user mailing list archives

From Dave Birdsall <dave.birds...@esgyn.com>
Subject RE: How spark writes to HBASE
Date Mon, 22 Jan 2018 16:53:49 GMT
Some engines do this. Apache Trafodion, for example, hash-partitions the results to be
inserted into an HBase table so that the puts are done locally.
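
A rough sketch of the general idea, in plain Python. This is not Trafodion's actual implementation and does not use the HBase API: the host names, the one-bucket-per-host layout, and the helpers bucket_for and group_by_host are all hypothetical, for illustration only. Rows are hashed to a bucket, and each batch is handed to a worker on the host that owns that bucket, so the put itself does not cross the network.

```python
from collections import defaultdict
import zlib

# Hypothetical layout: one hash bucket per region, each hosted by one server.
BUCKET_HOSTS = ["machine4", "machine6", "machine9"]

def bucket_for(row_key: bytes) -> int:
    """Map a row key to a bucket index.

    crc32 is used because it is deterministic across processes,
    unlike Python's built-in hash() on bytes.
    """
    return zlib.crc32(row_key) % len(BUCKET_HOSTS)

def group_by_host(rows):
    """Group (row_key, value) pairs by the host that owns their bucket."""
    batches = defaultdict(list)
    for row_key, value in rows:
        host = BUCKET_HOSTS[bucket_for(row_key)]
        batches[host].append((row_key, value))
    return dict(batches)

if __name__ == "__main__":
    rows = [(b"sensor-%d" % i, i) for i in range(6)]
    for host, batch in group_by_host(rows).items():
        # A worker running on `host` would issue these puts locally.
        print(host, [key for key, _ in batch])
```

The catch is that the engine has to control both the partitioning and where the workers run, which is what a stock Spark job does not do for you.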

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Monday, January 22, 2018 8:47 AM
To: user@hbase.apache.org
Subject: Re: How spark writes to HBASE

Which connector do you use to perform the write?

bq. Or spark will wisely launch an executor on that machine

I don't think that is the case. Multiple writes may be performed, and they would end up on
different region servers. Spark won't provide the affinity described above.
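
To illustrate why the executor's location does not influence where a row lands, here is a minimal sketch in plain Python (not the HBase client API). It assumes a hypothetical three-region table; the start keys, the hosts machine4 and machine9, and the helper names are made up, while machine2 and machine6 come from the question. The target server is a function of the row key alone, so any executor not running on that host pays a network hop.

```python
from bisect import bisect_right

# Hypothetical region layout: (start_key, region_server), sorted by start key.
# The first region starts at the empty key, so every row key falls in a region.
REGIONS = [
    (b"", "machine4"),
    (b"g", "machine6"),
    (b"p", "machine9"),
]
START_KEYS = [start for start, _ in REGIONS]

def locate_region_server(row_key: bytes) -> str:
    """Return the server hosting the region whose key range contains row_key."""
    # Find the last region whose start key is <= row_key (start key inclusive).
    idx = bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]

def put_is_remote(executor_host: str, row_key: bytes) -> bool:
    """True if a put issued from executor_host must travel over the network."""
    return locate_region_server(row_key) != executor_host

if __name__ == "__main__":
    print(locate_region_server(b"host1"))       # machine6
    print(put_is_remote("machine2", b"host1"))  # True
```

A single partition usually holds keys from many regions, so even an executor that happens to sit on machine6 would still send most of its puts elsewhere.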

On Mon, Jan 22, 2018 at 7:19 AM, vignesh <vignesh093@gmail.com> wrote:

> Hi,
> I have a Spark job which reads some timeseries data and pushes it to
> HBase using the HBase client API. I am executing this Spark job on a
> 10-node cluster. Say that when Spark first kicks off, it picks
> machine1, machine2 and machine3 as its executors. Now the job inserts
> a row into HBase. Below is my understanding of what happens.
> Based on the row key, a particular region (from META) is chosen, and
> the row is pushed to that RegionServer's memstore and WAL; once the
> memstore is full, it is flushed to disk. Now assume a particular row
> is being processed by an executor on machine2, and the RegionServer
> that handles the region to which the put is to be made is on
> machine6. Will the data be transferred from machine2 to machine6 over
> the network and then stored in the memstore of machine6? Or spark will
> wisely launch an executor on that machine during the write (if dynamic
> allocation is turned on) and push to it?
> --