cloudstack-dev mailing list archives

From Darren Shepherd <darren.s.sheph...@gmail.com>
Subject Re: Latest Master DB issue
Date Wed, 09 Oct 2013 08:59:03 GMT
Kelven,

So the issue is the combination of my code with VmwareContextPool.
With the ManagedContext framework, what I've done is replace every
Runnable and TimerTask with ManagedContextRunnable and
ManagedContextTimerTask.  Those classes run the onEnter and onLeave
logic, which sets up CallContext as a result.  I purposely changed
every reference so that everything was consistent.  I didn't want
developers to have to consider when or when not to use
ManagedContext, so just always use it.  As a result, even though
your code has nothing to do with the DB, the ManagedContextListener for
CallContext does.
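
To illustrate the wrapping described above, here is a minimal sketch.
It is not the actual CloudStack code: ContextListener, WrappedRunnable
and ManagedContextSketch are hypothetical names, and the real listener
would do a DB lookup where this one only records an event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface, loosely modeled on the onEnter/onLeave
// hooks described above; not the actual CloudStack API.
interface ContextListener {
    void onEnter();
    void onLeave();
}

// Hypothetical wrapper: calls every listener's onEnter before the task body
// and every onLeave afterwards, even if the task body throws.
abstract class WrappedRunnable implements Runnable {
    private final List<ContextListener> listeners;

    WrappedRunnable(List<ContextListener> listeners) {
        this.listeners = listeners;
    }

    // The task body supplied by the subclass.
    protected abstract void runInContext();

    @Override
    public final void run() {
        for (ContextListener l : listeners) {
            l.onEnter(); // a DB-backed listener would query the database here
        }
        try {
            runInContext();
        } finally {
            // Leave in reverse order of entry.
            for (int i = listeners.size() - 1; i >= 0; i--) {
                listeners.get(i).onLeave();
            }
        }
    }
}

class ManagedContextSketch {
    // Runs one wrapped task and returns the order in which things happened.
    static List<String> demo() {
        final List<String> events = new ArrayList<>();
        List<ContextListener> listeners = new ArrayList<>();
        listeners.add(new ContextListener() {
            @Override public void onEnter() { events.add("enter"); }
            @Override public void onLeave() { events.add("leave"); }
        });
        new WrappedRunnable(listeners) {
            @Override protected void runInContext() { events.add("task"); }
        }.run();
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that the task body itself never touches the
listener's resources, yet every wrapped task still pays for whatever
the listeners do on entry.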

So I'm sure you're thinking that Resources shouldn't call the DB.  The
ManagedContext framework only does things when deployed in a managed
JVM.  The only managed JVM is the mgmt server.  If it's AWSAPI, Usage,
or an Agent JVM, then the framework does nothing.

So there are three possible solutions I see:

1) Change VmwareContextPool to be initialized from the @PostConstruct
or start() method.
2) Revert the change to VmwareContextPool to use TimerTask and not
ManagedContextTimerTask.
3) Merge spring modularization.

The simplest stopgap would be option 2.
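
For reference, option 1 could look roughly like the sketch below.  It
assumes a hypothetical ContextPoolSketch class, not the real
VmwareContextPool: the constructor only stores configuration, and the
Timer is created in start(), which a container would call from a
@PostConstruct or lifecycle hook after the DB upgrade has finished.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical pool; class and method names are illustrative only.
class ContextPoolSketch {
    private final long intervalMillis;
    private Timer timer; // null until start() is called

    // The constructor only records configuration. It spawns no thread,
    // so loading or constructing the class has no side effects.
    ContextPoolSketch(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Intended to be called by the container once initialization is done.
    synchronized void start() {
        if (timer != null) {
            return; // already started
        }
        timer = new Timer("pool-housekeeping", true); // daemon thread
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                cleanupIdleContexts();
            }
        }, intervalMillis, intervalMillis);
    }

    // Cancels the background timer; start() may be called again later.
    synchronized void stop() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }
    }

    synchronized boolean isStarted() {
        return timer != null;
    }

    private void cleanupIdleContexts() {
        // Placeholder for the pool's own housekeeping.
    }
}
```

With this shape, a static field holding the pool is harmless, because
nothing runs in the background until someone explicitly calls start().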

Darren

On Tue, Oct 8, 2013 at 5:35 PM, Kelven Yang <kelven.yang@citrix.com> wrote:
> The problem, it seems to me, is whether or not a background job that
> touches the database respects the bootstrap initialization order. As for
> VmwareContextPool itself, its background job does something fully within
> its own territory (no database, no reference outside), and the vmware-base
> package was originally designed to run on its own without assuming
> any container that offers unified lifecycle management. I don't think this
> type of background job has anything to do with the failure in this
> particular case.
>
> However, I do agree that we need to clean up and unify a few things inside
> CloudStack, especially life-cycle management and all background jobs
> whose execution path touches component life-cycle, auto-wiring, AOP,
> etc.
>
> To get by until the spring modularization merge, we just need
> to figure out which background job triggers all this and get it
> fixed. It used to work before, even if it was fragile, so I don't think
> fixing the problem is impossible. Is anyone working on this issue?
>
> Kelven
>
>
>
>
> On 10/8/13 2:35 PM, "Darren Shepherd" <darren.s.shepherd@gmail.com> wrote:
>
>>Some more info about this.  What specifically is happening is that the
>>VmwareContextPool call is creating a Timer during the constructor of
>>the class which is being constructed in a static block from
>>VmwareContextFactory.  So when the VmwareContextFactory class is
>>loaded by the class loader, the background thread is created.  Which
>>is way, way before the Database upgrade happens.  This will still be
>>fixed if we merge the spring modularization, but this vmware code
>>should change regardless.  Background threads should only be launched
>>from a @PostConstruct or ComponentLifecycle.start() method.  They
>>should not be started when a class is constructed or loaded.
>>
>>Darren
>>
>>
>>On Tue, Oct 8, 2013 at 2:22 PM, Darren Shepherd
>><darren.s.shepherd@gmail.com> wrote:
>>> Hey, I halfway introduced this issue in a really long and roundabout
>>> way.  I don't think there's a good simple fix unless we merge the
>>> spring-modularization branch.  I'm going to look further into it.  But
>>> here's the background of why we are seeing this.
>>>
>>> I introduced "Managed Context" framework that will wrap all the
>>> background threads and manage the thread locals.  This was the union
>>> of CallContext, ServerContext, and AsyncJob*Context into one simple
>>> framework.  The problem with ACS though is that A LOT of background
>>> threads are spawned at all different random times of the
>>> initialization.  So what is happening is that during the
>>> initialization of some bean it's kicking off a background thread that
>>> tries to access the database before the database upgrade has run.  Now
>>> the CallContext has a strange suicidal behaviour (this was already
>>> there, I didn't change this), if it can't find account 1, it does a
>>> System.exit(1).  So since this one background thread is failing, the
>>> whole JVM shuts down.  Before, CallContext only existed on some
>>> threads, but with the addition of the Managed Context framework, it is
>>> now on almost all threads.
>>>
>>> Now in the spring-modularization branch there is a very strict and
>>> (mostly) deterministic initialization order.  The database upgrade
>>> class will be initialized and run before any other bean in CloudStack
>>> is even initiated.  So this works around all these DB problems.  The
>>> current spring setup in master is very, very fragile.  As I said
>>> before, it is really difficult to ensure certain aspects are
>>> initialized before others, and since we moved to (which I don't really
>>> agree with) doing DB schema upgrades purely on startup of the mgmt
>>> server, we now have to be extra careful about initialization order.
>>>
>>> Darren
>>>
>>> On Tue, Oct 8, 2013 at 11:34 AM, Rayees Namathponnan
>>> <rayees.namathponnan@citrix.com> wrote:
>>>> Here is the defect created for this issue:
>>>>
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-4825
>>>>
>>>>
>>>> Regards,
>>>> Rayees
>>>>
>>>> -----Original Message-----
>>>> From: Francois Gaudreault [mailto:fgaudreault@cloudops.com]
>>>> Sent: Tuesday, October 08, 2013 11:15 AM
>>>> To: dev@cloudstack.apache.org
>>>> Subject: Re: Latest Master DB issue
>>>>
>>>> I guess in my case, it's fine. It was a fresh install...
>>>>
>>>> Francois
>>>>
>>>> On 10/8/2013, 2:08 PM, Darren Shepherd wrote:
>>>>> Deploying the db from maven will drop all the tables.  Not sure if
>>>>> this is a fresh install or not.  For master, running mvn will be your
>>>>> best bet.  Otherwise you can look at running
>>>>> com.cloud.upgrade.DatabaseCreator manually if you're adventurous.
>>>>>
>>>>> Darren
>>>>>
>>>>> On Tue, Oct 8, 2013 at 10:55 AM, Francois Gaudreault
>>>>> <fgaudreault@cloudops.com> wrote:
>>>>>> Thanks Alena for the explanation.
>>>>>>
>>>>>> What is the better path to fix this on our setup? Should I wait for a
>>>>>> fix in master or should I manually run the deploydb with mvn? I guess
>>>>>> the second option won't work since I used RPM?
>>>>>>
>>>>>> Francois
>>>>>>
>>>>>>
>>>>>> On 10/8/2013, 1:47 PM, Alena Prokharchyk wrote:
>>>>>>> OK, this is what is going on: the DB upgrade procedure is different
>>>>>>> on a developer's setup and when deployed using
>>>>>>> cloudstack-setup-databases.
>>>>>>>
>>>>>>> On a developer's setup:
>>>>>>>
>>>>>>> 1) you deploy the code
>>>>>>> 2) Deploy the DB using 'mvn -P developer -pl developer -Ddeploydb'.
>>>>>>> As a part of this step, the DataBaseUpgradeChecker:
>>>>>>>
>>>>>>> * first deploys the base DB version - 4.0.0
>>>>>>> * then checks the current version of the code, and performs the db
>>>>>>> upgrade if needed. So on master, the version table looks like this
>>>>>>> after the db is deployed:
>>>>>>>
>>>>>>> mysql> select * from version;
>>>>>>> +----+---------+---------------------+----------+
>>>>>>> | id | version | updated             | step     |
>>>>>>> +----+---------+---------------------+----------+
>>>>>>> |  1 | 4.0.0   | 2013-10-08 10:34:47 | Complete |
>>>>>>> |  2 | 4.1.0   | 2013-10-08 17:35:22 | Complete |
>>>>>>> |  3 | 4.2.0   | 2013-10-08 17:35:22 | Complete |
>>>>>>> |  4 | 4.3.0   | 2013-10-08 17:35:22 | Complete |
>>>>>>> +----+---------+---------------------+----------+
>>>>>>> 4 rows in set (0.00 sec)
>>>>>>>
>>>>>>>
>>>>>>> 3) Start management server.
>>>>>>>
>>>>>>>
>>>>>>> When deployed from rpm:
>>>>>>>
>>>>>>> 1) you deploy the code
>>>>>>> 2) run cloudstack-setup-databases. As the result of this step, the
>>>>>>> 4.0.0 base version of the DB is deployed. That's why you see only
>>>>>>> the 4.0.0 record in the DB.
>>>>>>> 3) Start management server. DataBaseUpgradeChecker is invoked
>>>>>>> as a part of it, and performs the db upgrade to the version of the
>>>>>>> current code. Only after that do all the managers get invoked and
>>>>>>> the system caller context gets initialized.
>>>>>>>
>>>>>>> Looks like the load order for step 3) got broken recently, and the
>>>>>>> system context gets initialized before the db upgrade is finished.
>>>>>>> So we either need to fix the order, or invoke DataBaseUpgradeChecker
>>>>>>> as a part of cloudstack-setup-databases so that at the point when the
>>>>>>> management server starts up, the DB is already upgraded to the latest
>>>>>>> version.
>>>>>>>
>>>>>>> -Alena.
>>>>>>>
>>>>>>>
>>>>>>> On 10/8/13 10:25 AM, "Francois Gaudreault"
>>>>>>> <fgaudreault@cloudops.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hmm... I just checked the DB version and it's 4.0??? It should be
>>>>>>>> 4.3.0 no?
>>>>>>>>
>>>>>>>> mysql> select * from version;
>>>>>>>> +----+---------+---------------------+----------+
>>>>>>>> | id | version | updated             | step     |
>>>>>>>> +----+---------+---------------------+----------+
>>>>>>>> |  1 | 4.0.0   | 2013-10-08 10:58:49 | Complete |
>>>>>>>> +----+---------+---------------------+----------+
>>>>>>>> 1 row in set (0.00 sec)
>>>>>>>>
>>>>>>>> I installed cloudstack-management-4.3.0:
>>>>>>>> [root@eng-testing-cstack_master ~]# rpm -qf
>>>>>>>> /usr/bin/cloudstack-setup-databases
>>>>>>>> cloudstack-management-4.3.0-SNAPSHOT.el6.x86_64
>>>>>>>>
>>>>>>>> Francois
>>>>>>>>
>>>>>>>> On 10/8/2013, 1:04 PM, Francois Gaudreault wrote:
>>>>>>>>> It's a fresh master RPM install.
>>>>>>>>>
>>>>>>>>> Francois
>>>>>>>>>
>>>>>>>>> On 10/8/2013, 12:40 PM, Alena Prokharchyk wrote:
>>>>>>>>>> It is not a small issue. The is_default field was added to the
>>>>>>>>>> table as a part of the 41-42 db upgrade. Looks like the code
>>>>>>>>>> tries to retrieve the system user before the db upgrade is
>>>>>>>>>> completed.
>>>>>>>>>>
>>>>>>>>>> DB upgrade is a major part of the system integrity check; no
>>>>>>>>>> queries to the DB should be made before it's completed.
>>>>>>>>>> Francois, did you start seeing this problem just recently?
>>>>>>>>>>
>>>>>>>>>> -Alena.
>>>>>>>>>>
>>>>>>>>>> On 10/8/13 8:04 AM, "Francois Gaudreault"
>>>>>>>>>> <fgaudreault@cloudops.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I compiled Master this morning, and there is a small DB issue.
>>>>>>>>>>> One field is missing in the account table (field default). CS
>>>>>>>>>>> will not start because of that.
>>>>>>>>>>>
>>>>>>>>>>> 2013-10-08 11:01:42,623 FATAL [o.a.c.c.CallContext]
>>>>>>>>>>> (Timer-2:null) Exiting the system because we're unable to
>>>>>>>>>>> register the system call context.
>>>>>>>>>>> com.cloud.utils.exception.CloudRuntimeException: DB Exception on:
>>>>>>>>>>> com.mysql.jdbc.JDBC4PreparedStatement@4c1aa2e9: SELECT
>>>>>>>>>>> account.id, account.account_name, account.type,
>>>>>>>>>>> account.domain_id, account.state, account.removed,
>>>>>>>>>>> account.cleanup_needed, account.network_domain, account.uuid,
>>>>>>>>>>> account.default_zone_id, account.default FROM account WHERE
>>>>>>>>>>> account.id = 1  AND account.removed IS NULL
>>>>>>>>>>>        at com.cloud.utils.db.GenericDaoBase.findById(GenericDaoBase.java:986)
>>>>>>>>>>>        at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>>>>>>>>>        at com.cloud.utils.db.GenericDaoBase.lockRow(GenericDaoBase.java:963)
>>>>>>>>>>>        at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>>>>>>>>>        at com.cloud.utils.db.GenericDaoBase.findById(GenericDaoBase.java:926)
>>>>>>>>>>>        at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>>>>>>>>>        at com.cloud.dao.EntityManagerImpl.findById(EntityManagerImpl.java:45)
>>>>>>>>>>>        at org.apache.cloudstack.context.CallContext.register(CallContext.java:166)
>>>>>>>>>>>        at org.apache.cloudstack.context.CallContext.registerSystemCallContextOnceOnly(CallContext.java:141)
>>>>>>>>>>>        at org.apache.cloudstack.context.CallContextListener.onEnterContext(CallContextListener.java:36)
>>>>>>>>>>>        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:83)
>>>>>>>>>>>        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>>>>>>>>>        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>>>>>>>>>        at org.apache.cloudstack.managed.context.ManagedContextTimerTask.run(ManagedContextTimerTask.java:27)
>>>>>>>>>>>        at java.util.TimerThread.mainLoop(Timer.java:534)
>>>>>>>>>>>        at java.util.TimerThread.run(Timer.java:484)
>>>>>>>>>>> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
>>>>>>>>>>> Unknown column 'account.default' in 'field list'
>>>>>>>>>>>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>>>>>>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>>>>>>>>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>>>>>>>        at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>>>>>>>>>>>        at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>>>>>>>>>>>        at com.mysql.jdbc.Util.getInstance(Util.java:386)
>>>>>>>>>>>        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1053)
>>>>>>>>>>>        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4074)
>>>>>>>>>>>        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4006)
>>>>>>>>>>>        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
>>>>>>>>>>>        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
>>>>>>>>>>>        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
>>>>>>>>>>>        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
>>>>>>>>>>>        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2318)
>>>>>>>>>>>        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>>>>>>>>>>>        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>>>>>>>>>>>        at com.cloud.utils.db.GenericDaoBase.findById(GenericDaoBase.java:983)
>>>>>>>>>>>        ... 27 more
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Francois Gaudreault
>>>>>>>>>>> Architecte de Solution Cloud | Cloud Solutions Architect
>>>>>>>>>>> fgaudreault@cloudops.com
>>>>>>>>>>> 514-629-6775
>>>>>>>>>>> - - -
>>>>>>>>>>> CloudOps
>>>>>>>>>>> 420 rue Guy
>>>>>>>>>>> Montréal QC  H3J 1S6
>>>>>>>>>>> www.cloudops.com
>>>>>>>>>>> @CloudOps_
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>
