hadoop-user mailing list archives

From Atul Rajan <atul.raja...@icloud.com>
Subject DR for Data Lake
Date Thu, 27 Jul 2017 04:11:01 GMT
Hello all,

We are planning to implement a data lake for our financial data. How can we achieve disaster
recovery for our data lake?

Initially, all the data marts will be pushed to the data lake, but we also want a plan for data
recovery. Please suggest some ideas.

Thanks and Regards
Atul Rajan


Sent from my iPhone

On 12-Jan-2017, at 4:43 AM, Akash Mishra <akash.mishra20@gmail.com> wrote:

You are getting an NPE in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName,
which is not in the Hadoop codebase. I can see you are using another scheduler implementation,
com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair, so you can check
SourceFile:204 for more details.

My guess is that you need to set some name parameter that is requested only at DEBUG level.
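
If it helps to isolate the problem, one quick check is to temporarily point yarn-site.xml back at the stock Hadoop FairScheduler and see whether the RM then starts cleanly at DEBUG level. This is just a sketch for a test run, not a recommendation to drop Pepperdata:

```xml
<!-- yarn-site.xml: temporarily revert to the built-in Fair Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

If the NPE disappears with the stock scheduler, the missing name parameter is something the Pepperdata wrapper expects, and their support would be the place to ask.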

Thanks, 



> On Wed, Jan 11, 2017 at 10:59 PM, Stephen Sprague <spragues@gmail.com> wrote:
> ok. I would attach, but I think there might be an aversion to attachments, so I'll
> paste inline. Hopefully it's not too confusing.
> 
> $ cat fair-scheduler.xml
> 
> <?xml version="1.0"?>
> 
> <!--
>   This is a sample configuration file for the Fair Scheduler. For details
>   on the options, please refer to the fair scheduler documentation at
>   http://hadoop.apache.org/core/docs/r0.21.0/fair_scheduler.html.
> 
>   To create your own configuration, copy this file to conf/fair-scheduler.xml
>   and add the following property in mapred-site.xml to point Hadoop to the
>   file, replacing [HADOOP_HOME] with the path to your installation directory:
>     <property>
>       <name>mapred.fairscheduler.allocation.file</name>
>       <value>[HADOOP_HOME]/conf/fair-scheduler.xml</value>
>     </property>
> 
>   Note that all the parameters in the configuration file below are optional,
>   including the parameters inside <pool> and <user> elements. It is only
>   necessary to set the ones you want to differ from the defaults.
> -->
> 
> <!-- https://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html -->
> 
> <allocations>
> 
>   <!-- NOTE. ** Preemption IS NOT turned on! ** -->
> 
>   <!-- Preemption timeout for jobs below their fair share, in seconds.
>     If a job is below half its fair share for this amount of time, it
>     is allowed to kill tasks from other jobs to go up to its fair share.
>     Requires mapred.fairscheduler.preemption to be true in mapred-site.xml. -->
>   <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
> 
>   <!-- Default min share preemption timeout for pools where it is not
>     explicitly configured, in seconds. Requires mapred.fairscheduler.preemption
>     to be set to true in your mapred-site.xml. -->
>   <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
> 
>   <!-- Default running job limit for pools where it is not explicitly set. -->
>   <queueMaxJobsDefault>20</queueMaxJobsDefault>
> 
>   <!-- Default running job limit for users where it is not explicitly set. -->
>   <userMaxJobsDefault>10</userMaxJobsDefault>
> 
> 
> <!--  QUEUES:
>          dwr.interactive   : 10 at once
>          dwr.batch_sql     : 15 at once
>          dwr.batch_hdfs    : 5 at once   (distcp, sqoop, hfs -put, anything besides 'sql')
>          dwr.qa            : 3 at once
>          dwr.truck_lane    : 1 at once
> 
>          cad.interactive   : 5 at once
>          cad.batch         : 10 at once
> 
>          comms.interactive : 5 at once
>          comms.batch       : 3 at once
> 
>          default           : 2 at once   (to discourage its use)
> -->
> 
> 
> <!-- queue placement -->
> 
>   <queuePlacementPolicy>
>     <rule name="specified" />
>     <rule name="default" />
>   </queuePlacementPolicy>
> 
> 
> <!-- footprint -->
>  <queue name='footprint'>
>     <schedulingPolicy>fair</schedulingPolicy>   <!-- can be fifo too -->
> 
>     <maxRunningApps>4</maxRunningApps>
>     <aclSubmitApps>*</aclSubmitApps>
> 
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
>     <userMaxJobsDefault>50</userMaxJobsDefault>
> 
>     <maxMaps>200</maxMaps>
>     <maxReduces>200</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>500000 mb, 175 vcores</maxResources>
> 
>     <queue name="dev">
>        <maxMaps>200</maxMaps>
>        <maxReduces>200</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>500000 mb, 175 vcores</maxResources>
>     </queue>
> 
>     <queue name="stage">
>        <maxMaps>200</maxMaps>
>        <maxReduces>200</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>500000 mb, 175 vcores</maxResources>
>     </queue>
>   </queue>
> 
> <!-- comms -->
>  <queue name='comms'>
>     <schedulingPolicy>fair</schedulingPolicy>   <!-- can be fifo too -->
> 
>     <queue name="interactive">
>        <maxRunningApps>5</maxRunningApps>
>        <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
>     <queue name="batch">
>        <maxRunningApps>10</maxRunningApps>
>        <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
>   </queue>
> 
> <!-- cad -->
>  <queue name='cad'>
>     <schedulingPolicy>fair</schedulingPolicy>   <!-- can be fifo too -->
> 
>     <queue name="interactive">
>        <maxRunningApps>5</maxRunningApps>
>        <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
> 
>     <queue name="batch">
>        <maxRunningApps>10</maxRunningApps>
>        <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
>   </queue>
> 
> 
> 
> <!-- dwr -->
>   <queue name="dwr">
> 
>     <schedulingPolicy>fair</schedulingPolicy>   <!-- can be fifo too -->
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
>     <userMaxJobsDefault>50</userMaxJobsDefault>
> 
>     <maxMaps>200</maxMaps>
>     <maxReduces>200</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>500000 mb, 175 vcores</maxResources>
> 
> <!-- INTERACTIVE. 5 at once -->
>     <queue name="interactive">
>         <weight>2.0</weight>
>         <maxRunningApps>5</maxRunningApps>
> 
>        <maxMaps>200</maxMaps>
>        <maxReduces>200</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>500000 mb, 175 vcores</maxResources>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>         <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
> 
> <!-- per user, but given everything is dwr (for now) it's not helpful -->
>         <userMaxAppsDefault>5</userMaxAppsDefault>
>         <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
> 
> <!-- BATCH. 15 at once -->
>     <queue name="batch_sql">
>         <weight>1.5</weight>
>         <maxRunningApps>15</maxRunningApps>
> 
>        <maxMaps>200</maxMaps>
>        <maxReduces>200</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>500000 mb, 175 vcores</maxResources>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>         <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
> 
>         <userMaxAppsDefault>50</userMaxAppsDefault>
>         <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
> 
> <!-- sqoop, distcp, hdfs-put type jobs here. 3 at once -->
>     <queue name="batch_hdfs">
>         <weight>1.0</weight>
>         <maxRunningApps>3</maxRunningApps>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>         <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>         <userMaxAppsDefault>50</userMaxAppsDefault>
>         <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
> 
> <!-- QA. 3 at once -->
>     <queue name="qa">
>         <weight>1.0</weight>
>         <maxRunningApps>100</maxRunningApps>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>         <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>         <aclSubmitApps>*</aclSubmitApps>
>         <userMaxAppsDefault>50</userMaxAppsDefault>
> 
>     </queue>
> 
> <!-- big, unruly jobs -->
>     <queue name="truck_lane">
>         <weight>0.75</weight>
>         <maxRunningApps>1</maxRunningApps>
>         <minMaps>5</minMaps>
>         <minReduces>5</minReduces>
> 
> <!-- let's try without static values and see how the "weight" works -->
>         <maxMaps>192</maxMaps>
>         <maxReduces>192</maxReduces>
>         <minResources>20000 mb, 10 vcores</minResources>
>         <maxResources>500000 mb, 200 vcores</maxResources>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
> <!--
>         <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>         <aclSubmitApps>*</aclSubmitApps>
>         <userMaxAppsDefault>50</userMaxAppsDefault>
> -->
>     </queue>
>   </queue>
> 
> <!-- DEFAULT. 2 at once -->
>   <queue name="default">
>        <maxRunningApps>2</maxRunningApps>
> 
>        <maxMaps>40</maxMaps>
>        <maxReduces>40</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>20000 mb, 10 vcores</maxResources>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
>       <userMaxAppsDefault>5</userMaxAppsDefault>
>       <aclSubmitApps>*</aclSubmitApps>
>   </queue>
> 
> 
> </allocations>
> 
> 
> 
> <!-- some other stuff
> 
>     <minResources>10000 mb, 0 vcores</minResources>
>     <maxResources>90000 mb, 0 vcores</maxResources>
> 
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
> 
> -->
> 
> <!-- enabling
>    * Bringing the queues in effect:
>    Once the required parameters are defined in the fair-scheduler.xml file, run the following
>    command to bring the changes into effect.
>    yarn rmadmin -refreshQueues
> -->
> 
> <!-- verifying
>   Once the command runs properly, verify that the queues are set up using one of two options:
> 
>   1) hadoop queue -list
>   or
>   2) Open the YARN ResourceManager web UI at http://<ResourceManager-hostname>:8088
>   and click Scheduler.
> 
> -->
> 
> 
> <!-- notes
>    [fail_user@phd11-nn ~]$ id
>    uid=507(fail_user) gid=507(failgroup) groups=507(failgroup)
>    [fail_user@phd11-nn ~]$ hadoop queue -showacls
> -->
> 
> 
> <!-- submit
>    To submit an application, use the parameter -Dmapred.job.queue.name=<queue-name>
>    or -Dmapred.job.queuename=<queue-name>
> -->
> 
> 
> 
> 
> 
> *** yarn-site.xml
> 
> 
> 
> $ cat yarn-site.xml
> 
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> 
> <configuration>
> <!--Autogenerated yarn params from puppet yaml hash yarn_site_parameters__xml -->
>   <property>
>     <name>yarn.resourcemanager.hostname</name>
>     <value>FOO.sv2.trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
>     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.local-dirs</name>
>     <value>/storage0/hadoop/yarn/local,/storage1/hadoop/yarn/local,/storage2/hadoop/yarn/local,/storage3/hadoop/yarn/local,/storage4/hadoop/yarn/local,/storage5/hadoop/yarn/local</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair</value>
>   </property>
>   <property>
>     <name>yarn.application.classpath</name>
>     <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$TEZ_HOME/*,$TEZ_HOME/lib/*</value>
>   </property>
>   <property>
>     <name>pepperdata.license.key.specification</name>
>     <value>data://removed</value>
>   </property>
>   <property>
>     <name>pepperdata.license.key.comments</name>
>     <value>License Type: PRODUCTION Expiration Date (UTC): 2017/02/01 Company Name:
Trulia, LLC Cluster Name: trulia-production Number of Nodes: 150 Contact Person Name: Deep
Varma Contact Person Email: dvarma@trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.hostname</name>
>     <value>FOO.sv2.trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.webapp.address</name>
>     <value>FOO.sv2.trulia.com:8188</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.http-cross-origin.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.ttl-enable</name>
>     <value>false</value>
>   </property>
> 
> <!--
>   <property>
>     <name>yarn.timeline-service.store-class</name>
>     <value>org.apache.hadoop.yarn.server.timeline.RollingLevelDbTimelineStore</value>
>   </property>
> -->
>   <property>
>     <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.user-as-default-queue</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.preemption</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.sizebasedweight</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>2048</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>8192</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
>     <value>98.5</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation.retain-seconds</name>
>     <value>604800</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation-enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.log-dirs</name>
>     <value>${yarn.log.dir}/userlogs</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.remote-app-log-dir</name>
>     <value>/app-logs</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.delete.debug-delay-sec</name>
>     <value>600</value>
>   </property>
>   <property>
>     <name>yarn.log.server.url</name>
>     <value>http://FOO.sv2.trulia.com:19888/jobhistory/logs</value>
>   </property>
> 
> </configuration>
> 
> 
>> On Wed, Jan 11, 2017 at 2:27 PM, Akash Mishra <akash.mishra20@gmail.com> wrote:
>> Please post your fair-scheduler.xml file and yarn-site.xml 
>> 
>>> On Wed, Jan 11, 2017 at 9:14 PM, Stephen Sprague <spragues@gmail.com> wrote:
>>> hey guys,
>>> I'm running the RM with the above options (version 2.6.1) and get an NPE upon
>>> startup.
>>> 
>>> {code}
>>> 17/01/11 12:44:45 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
>>> java.lang.NullPointerException
>>>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName(SourceFile:204)
>>>         at org.apache.hadoop.service.CompositeService.addService(CompositeService.java:73)
>>>         at org.apache.hadoop.service.CompositeService.addIfService(CompositeService.java:88)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:490)
>>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:993)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
>>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1214)
>>> 17/01/11 12:44:45 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:
>>> {code}
>>> 
>>> The fair-scheduler.xml file is fine and works with INFO-level logging, so I'm pretty
>>> sure there's nothing "wrong" with it. With DEBUG-level logging it makes this Java call and barfs.
>>> 
>>> Any ideas how to fix this?
>>> 
>>> thanks,
>>> Stephen.
>> 
>> 
>> 
>> -- 
>> Regards,
>> Akash Mishra.
>> 
>> "It's not our abilities that make us, but our decisions."--Albus Dumbledore
> 



-- 
Regards,
Akash Mishra.

"It's not our abilities that make us, but our decisions."--Albus Dumbledore