Date: Wed, 14 Jun 2017 10:00:00 +0000 (UTC)
From: "anishek (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Comment Edited] (HIVE-16865) Handle replication bootstrap of large databases

    [ https://issues.apache.org/jira/browse/HIVE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048991#comment-16048991 ]

anishek edited comment on HIVE-16865 at 6/14/17 9:59 AM:
---------------------------------------------------------

h4. Bootstrap Replication Dump

* db metadata is fetched one database at a time
* function metadata is fetched one function at a time
* get all table names - this should not be overwhelming
* tables are also requested one at a time - all partition definitions for a table are loaded in one go, and we don't expect a table to have more than 5-10 partition definition columns
* the partition objects themselves are fetched in batches via the PartitionIterable
* the only problem seems to be when writing data to _files, where we load all the FileStatus objects in memory per partition (for partitioned tables) or per table (otherwise). This might lead to OOM cases. Decision: this is not a problem, since split computation does the same and we have not seen this issue there.
* we create a ReplCopyTask that creates the _files for all tables / partitions etc. during analysis time and then hands over to the execution engine; this will keep a lot of objects in memory given the scale targets above. *Possibly:*
** move the dump into a task of its own which manages its own thread pools to analyze/dump tables during the execution phase; this blurs the demarcation between the analysis and execution phases within Hive.
** another mode might be to provide lazy incremental tasks from the analysis phase to the execution phase, so that both phases run simultaneously rather than one completing before the other starts; this requires a significant code change and currently seems to be needed only for replication.
** we might have to do the same for the _incremental replication dump_ too, as the _*from*_ and _*to*_ event ids might span millions of events, all of them being inserts. The creation of _files is handled differently there, in that we write the files along with the metadata; we should be able to do the same for bootstrap replication instead of creating a ReplCopyTask, which would mean the ReplCopyTask is effectively used only during load time. The only problem with this approach is that, since the process is single threaded, we would dump data sequentially and it might take a long time, unless we do some threading in ReplicationSemanticAnalyzer to dump tables in parallel (there is no dependency between tables when dumping them); a similar approach might be required for partitions within a table (see the sketch after this list).
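A rough sketch of the kind of threading mentioned above, purely for illustration: the pool size, the dumpAllTables entry point and the dumpTable helper are assumptions, not existing Hive code; the only point is that independent tables can be dumped by independent workers.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: tables have no dependency on each other during dump,
// so each one can be written out by a separate worker thread.
public class ParallelBootstrapDumpSketch {

  private final ExecutorService pool = Executors.newFixedThreadPool(10); // assumed pool size

  public void dumpAllTables(List<String> tableNames) throws Exception {
    List<Future<?>> results = new ArrayList<>();
    for (final String table : tableNames) {
      results.add(pool.submit(new Runnable() {
        @Override
        public void run() {
          dumpTable(table); // assumed helper: writes metadata + _files for one table
        }
      }));
    }
    for (Future<?> r : results) {
      r.get(); // surface any failure from the worker threads
    }
    pool.shutdown();
  }

  private void dumpTable(String table) {
    // placeholder for the per-table metadata and data dump
  }
}
{code}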
h4. Bootstrap Replication Load

* list all the table metadata files per db. For massive databases, per the scale targets above, we would load on the order of a million FileStatus objects in memory. This seems to be a significantly higher number of objects than is probably loaded during split computation, and hence needs a closer look. Most probably move to {code}org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive){code} so that the listing is consumed incrementally instead of being materialized up front (see the sketch after this list).
* a task will be created for each type of operation; in the bootstrap case that is one task per table / partition / function / database, so we will run into the last problem described under _*Bootstrap Replication Dump*_.
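A minimal sketch of what that incremental listing could look like; FileSystem#listFiles and RemoteIterator are the real Hadoop APIs, while the process callback and the dump-root path are assumptions for illustration.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Sketch: stream the dumped metadata files instead of materializing all
// FileStatus objects up front; only one LocatedFileStatus is held at a time.
public class IncrementalListingSketch {

  public void listIncrementally(Configuration conf, Path dbDumpRoot) throws IOException {
    FileSystem fs = dbDumpRoot.getFileSystem(conf);
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(dbDumpRoot, true /* recursive */);
    while (it.hasNext()) {
      process(it.next());
    }
  }

  private void process(LocatedFileStatus status) {
    // placeholder: create the corresponding load task for this file
  }
}
{code}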
h4. Additional thoughts

* Since there can be multiple metastore instances, from the point of view of integration with Beacon for replication, would it be better to have a dedicated metastore instance for replication-related workloads (at least for bootstrap)? Since the execution of tasks takes place on the metastore instance, the customer might be better served by having one metastore for replication and others handling the normal workload. This can be achieved, I think, based on how the URLs are configured on the HS2 client / orchestration engine of replication.
* When calling distcp in ReplCopyTask, can we log the source path and destination path? Otherwise, if there are problems during the copy, we won't know the actual paths involved (a minimal logging sketch follows this list).
* On the replica warehouse, replication tasks will run alongside the normal execution of other Hive tasks (assuming there are multiple dbs on the replica). How do we constrain resource allocation for replication vs. normal tasks, and how do we manage this so that replication does not lag behind significantly?
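Just to illustrate the kind of logging meant above; the SLF4J logger and the wrapper around the distcp call are assumptions, and only the idea of recording the concrete source and destination paths matters.

{code}
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: record the concrete source and destination of every copy so
// that a failed or partial copy can be traced back to the actual paths.
public class ReplCopyLoggingSketch {

  private static final Logger LOG = LoggerFactory.getLogger(ReplCopyLoggingSketch.class);

  public void copy(Path srcPath, Path destPath) {
    LOG.info("Replication copy starting: {} -> {}", srcPath, destPath);
    boolean ok = runDistCp(srcPath, destPath); // assumed wrapper around the distcp invocation
    if (!ok) {
      LOG.error("Replication copy failed: {} -> {}", srcPath, destPath);
    }
  }

  private boolean runDistCp(Path src, Path dest) {
    // placeholder for the actual distcp call
    return true;
  }
}
{code}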
> Handle replication bootstrap of large databases
> -----------------------------------------------
>
>                 Key: HIVE-16865
>                 URL: https://issues.apache.org/jira/browse/HIVE-16865
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: anishek
>            Assignee: anishek
>             Fix For: 3.0.0
>
>
> For larger databases, make sure that we can handle replication bootstrap.
> * Assume a large database can have close to a million tables, or a few tables with a few hundred thousand partitions.
> * For function replication: if a primary warehouse has a large number of custom functions defined such that the same binary file incorporates most of these functions, then there might be a problem loading all these functions on the replica warehouse, since the jar on the primary is copied over once per function and each function ends up with its own local copy of the jar; loading all these jars might lead to excessive memory usage.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)