cloudstack-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Diego Spinola Castro <spinolacas...@gmail.com>
Subject Re: NetworkUsage
Date Wed, 26 Mar 2014 12:41:20 GMT
Hi guys, just bumping up this issue.


2014-03-18 15:01 GMT-03:00 Alena Prokharchyk <Alena.Prokharchyk@citrix.com>:

>  Ok, backend reports it correctly. So here is the sequence for the
> problem:
>
>
>    1. Backend - VR - report the networking data to the CS MS
>    2. Networking data never gets inserted to the cloud_usage DB.
>
>
>  Kishan, can you please help to figure out why Usage server doesn't
> insert this data to cloud_usage DB?
>
>  Thanks,
> Alena.
>
>   From: Diego Spinola Castro <spinolacastro@gmail.com>
> Date: Tuesday, March 18, 2014 at 10:29 AM
>
> To: Alena Prokharchyk <alena.prokharchyk@citrix.com>
> Cc: "users@cloudstack.apache.org" <users@cloudstack.apache.org>, Kishan
> Kavala <Kishan.Kavala@citrix.com>
> Subject: Re: NetworkUsage
>
>   Take a look at: http://www.fpaste.org/86434/51637081/
>
>
> 2014-03-18 14:15 GMT-03:00 Alena Prokharchyk <Alena.Prokharchyk@citrix.com
> >:
>
>>  Diego,
>>
>>  Check for these messages in the CS log:
>>
>>
>> [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-3-VM","bytesSen
>> t":0,"bytesReceived":0,"result":true,"details":"success","wait":0}}] }
>>
>>
>>  And see if you are seeing non-0 bytesReceived/bytesSent reported from
>> the backend for any day after 2013-11-08
>>
>>  -Alena.
>>
>>   From: Diego Spinola Castro <spinolacastro@gmail.com>
>>  Date: Tuesday, March 18, 2014 at 10:12 AM
>>
>> To: Alena Prokharchyk <alena.prokharchyk@citrix.com>
>> Cc: "users@cloudstack.apache.org" <users@cloudstack.apache.org>, Kishan
>> Kavala <Kishan.Kavala@citrix.com>
>> Subject: Re: NetworkUsage
>>
>>   Hi Alena, there's no records after 2013-11-08 (when i did the upgrade
>> to CS 4.2).
>>
>>
>>  If there are no records for Feb2 in the DB, can you check if CS
>> reported any usage during this day? How do i do that?
>>
>>
>> 2014-03-18 13:45 GMT-03:00 Alena Prokharchyk <
>> Alena.Prokharchyk@citrix.com>:
>>
>>>  Ok, so having 0 for some router records is actually fine as either the
>>> Vrs are stopped, or there are no vms. Good you've confirmed that's not the
>>> problem.
>>>
>>>  Now we have to debug something different - API not returning certain
>>> types of usage. For that, you have to:
>>>
>>>
>>>    1. go to cloud_usage DB
>>>    2. Select * from cloudusage where usage_type in (4,5)
>>>
>>> And see if there are any records for the Feb 2 are present. If there
>>> are, then its an API problem - when data is present in the DB, but API
>>> doesn't return it.
>>>
>>>  If there are no records for Feb2 in the DB, can you check if CS
>>> reported any usage during this day?
>>>
>>>  Thanks,
>>> Alena.
>>>
>>>   From: Diego Spinola Castro <spinolacastro@gmail.com>
>>>  Date: Tuesday, March 18, 2014 at 8:46 AM
>>> To: Alena Prokharchyk <alena.prokharchyk@citrix.com>
>>> Cc: "users@cloudstack.apache.org" <users@cloudstack.apache.org>, Kishan
>>> Kavala <Kishan.Kavala@citrix.com>
>>> Subject: Re: NetworkUsage
>>>
>>>   HI Alena, thanks for the answer. Let me explain better my case.
>>> Regarding those accounts , i have a lot of them without any instance
>>> running, there's also some vrouters of that list which are even stopped.
>>> I'm trying get the traffic count from usage api, i don't know why but
>>> all usage type 4 is zero, there's the commands i'm running:
>>>
>>>  Trying get UsageType=4
>>> curl "
>>> http://localhost:8096/client/api?command=listUsageRecords&startdate=2014-02-01&enddate=2014-02-01&type=4
>>> "
>>>  Neither Type 4 nor 5 are returning.
>>>
>>>  This used to work before upgrade to 4.2.
>>> BTW i have only one management server.
>>>
>>>
>>>
>>>
>>> 2014-03-17 20:54 GMT-03:00 Alena Prokharchyk <
>>> Alena.Prokharchyk@citrix.com>:
>>>
>>>> Diego, spent some more time on debugging your issue. Might not be
>>>> related
>>>> to transaction refactoring as it went in after 4.2. And records for some
>>>> accounts actually got non-0 network stat:
>>>>
>>>> 0 network stats:
>>>>
>>>> mysql> select count(*) from user_statistics where net_bytes_received=0
>>>> or
>>>> net_bytes_sent=0 order by account_id;
>>>> +----------+
>>>> | count(*) |
>>>> +----------+
>>>> |       26 |
>>>> +----------+
>>>> 1 row in set (0.00 sec)
>>>>
>>>> Non-0 network stats
>>>>
>>>> mysql> select count(*) from user_statistics where net_bytes_received!=0
>>>> or
>>>> net_bytes_sent!=0 order by account_id;
>>>> +----------+
>>>> | count(*) |
>>>> +----------+
>>>> |      207 |
>>>> +----------+
>>>> 1 row in set (0.00 sec)
>>>>
>>>>
>>>>
>>>> Adding Kishan to the thread as he has more expertise in Network usage
>>>> calculation and he might be able to help us.
>>>>
>>>>
>>>> I would divide problematic accounts (Networks) that are not getting
>>>> usage
>>>> stats update, into 2 categories:
>>>>
>>>> CATEGORY #1 Networks having non-0 current_bytes* and 0 net_bytes
>>>> received
>>>>
>>>> This situation is totally normal as net_bytes_received get updated only
>>>> when VR gets stopped. Yet on Diego's setup, those entries are not
>>>> counted
>>>> when do network usage calculation. Here is the account having problems:
>>>>
>>>> mysql> select * from user_statistics where account_id=176;
>>>>
>>>> +-----+----------------+------------+-------------------+-----------+------
>>>>
>>>> --------+------------+--------------------+----------------+---------------
>>>> ---------+--------------------+--------------------+----------------+
>>>> | id  | data_center_id | account_id | public_ip_address | device_id |
>>>> device_type  | network_id | net_bytes_received | net_bytes_sent |
>>>> current_bytes_received | current_bytes_sent | agg_bytes_received |
>>>> agg_bytes_sent |
>>>>
>>>> +-----+----------------+------------+-------------------+-----------+------
>>>>
>>>> --------+------------+--------------------+----------------+---------------
>>>> ---------+--------------------+--------------------+----------------+
>>>> | 228 |              1 |        176 | NULL              |      1193 |
>>>> DomainRouter |        366 |                  0 |              0 |
>>>>    4271398427 |         2014478889 |                  0 |
>>>>  0 |
>>>>
>>>> +-----+----------------+------------+-------------------+-----------+------
>>>>
>>>> --------+------------+--------------------+----------------+---------------
>>>> ---------+--------------------+--------------------+----------------+
>>>> 1 row in set (0.00 sec)
>>>>
>>>>
>>>>
>>>> And here is the Usage log:
>>>>
>>>>
>>>> 1. 2014-03-17 03:15:02,779 DEBUG [usage.parser.NetworkUsageParser]
>>>> (Usage-Job-1:null) Parsing all Network usage events for account: 176
>>>>
>>>> 2. 2014-03-17 03:15:02,779 DEBUG [usage.parser.NetworkUsageParser]
>>>> (Usage-Job-1:null) No usage record (0 bytes used) generated for account:
>>>> 176
>>>>
>>>>
>>>>
>>>>
>>>> Kishan, is it by design not to count current_bytes in network usage? Can
>>>> you please double check the code, because it doesn't look right to me as
>>>> the VR may not go into Stopped state for a while, yet we have to
>>>> calculate
>>>> the usage
>>>>
>>>>
>>>> CATEGORY #2
>>>>
>>>> Accounts (networks) having all 0s for
>>>>
>>>> net_bytes_received/net_bytes_sent/current_bytes_received/current_bytes_sent
>>>>
>>>> Diego, for those accounts we have to figure out whether the backend
>>>> reporting any usage for the Vrs. To debug that, you have to:
>>>>
>>>> A) get all the VR names
>>>>
>>>> mysql> select name from vm_instance where removed is null and
>>>> vm_type='DomainRouter' and account_id in (select account_id from
>>>> user_statistics where net_bytes_received = 0 and net_bytes_sent=0 and
>>>> current_bytes_received=0 and curre
>>>> +-----------+
>>>> | name      |
>>>> +-----------+
>>>> | r-621-VM  |
>>>> | r-651-VM  |
>>>> | r-658-VM  |
>>>> | r-671-VM  |
>>>> | r-682-VM  |
>>>> | r-886-VM  |
>>>> | r-892-VM  |
>>>> | r-915-VM  |
>>>> | r-934-VM  |
>>>> | r-935-VM  |
>>>> | r-986-VM  |
>>>> | r-1002-VM |
>>>> | r-1004-VM |
>>>> | r-1030-VM |
>>>> | r-1148-VM |
>>>> | r-1205-VM |
>>>> | r-1221-VM |
>>>> +-----------+
>>>>
>>>>
>>>> B) Search for NetworkUsageAnswer and routerName in the log to locate
>>>> statements like that. Do it on your today's log and the log(s) from
>>>> couple
>>>> of days before. You might also want to run some traffic
>>>>
>>>>
>>>> 2014-03-17 14:41:12,661 DEBUG [agent.transport.Request]
>>>> (DirectAgent-15:null) Seq 1-281411626: Processing:  { Ans: , MgmtId:
>>>> 4278190080, via: 1, Ver: v1, Flags: 10,
>>>>
>>>> [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-3-VM","bytesSen
>>>> t":0,"bytesReceived":0,"result":true,"details":"success","wait":0}}] }
>>>>
>>>>
>>>> If you see that the backend reports non-0 value, it means that the
>>>> Management server fails to save it. If the backend reports 0, and yet
>>>> you
>>>> are sure that there is a network traffic generated on the routers above,
>>>> there might be some error in a way we collect the traffic on the VR.
>>>>
>>>>
>>>> Also how many management servers do you have in your setup?
>>>>
>>>>
>>>> -Alena.
>>>>
>>>>
>>>>
>>>> On 3/17/14, 12:05 PM, "Alena Prokharchyk" <Alena.Prokharchyk@citrix.com
>>>> >
>>>> wrote:
>>>>
>>>> >Diego, please go ahead and file a bug with Priority Critical against
>>>> CS.
>>>> >Looks like there is some bug in a way to commit db statement in
>>>> >transaction resulting in the DB not being updated:
>>>> >
>>>> >2014-03-17 13:14:15,431 TRACE [db.Transaction.Transaction]
>>>> >(RouterMonitor-1:null) txn: Not committing because transaction started
>>>> >elsewhere:
>>>>
>>>> >-GenericDaoBase.update:814-ComponentInstantiationPostProcessor$Interceptor
>>>>
>>>> >Dispatcher.intercept:125-GenericDaoBase.update:774-ComponentInstantiationP
>>>>
>>>> >ostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.update:126
>>>>
>>>> >2-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125-
>>>>
>>>> >VirtualNetworkApplianceManagerImpl$NetworkUsageTask.run:948-Executors$Runn
>>>>
>>>> >ableAdapter.call:471-FutureTask$Sync.innerRunAndReset:351-FutureTask.runAn
>>>>
>>>> >dReset:178-ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201:165-
>>>>  >ScheduledThreadPoolExecutor$ScheduledFutureTask.run:267 /
>>>>
>>>> >-VirtualNetworkApplianceManagerImpl$NetworkUsageTask.run:900-Executors$Run
>>>>
>>>> >nableAdapter.call:471-FutureTask$Sync.innerRunAndReset:351-FutureTask.runA
>>>> >ndReset:178-ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201:165
>>>>
>>>> >-ScheduledThreadPoolExecutor$ScheduledFutureTask.run:267-ThreadPoolExecuto
>>>> >r.runWorker:1146-ThreadPoolExecutor$Worker.run:615-Thread.run:701 :
>>>> >update, update, update,
>>>>
>>>> >-VirtualNetworkApplianceManagerImpl$NetworkUsageTask.run:900-Executors$Run
>>>>
>>>> >nableAdapter.call:471-FutureTask$Sync.innerRunAndReset:351-FutureTask.runA
>>>> >ndReset:178-ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201:165
>>>>
>>>> >-ScheduledThreadPoolExecutor$ScheduledFutureTask.run:267-ThreadPoolExecuto
>>>> >r.runWorker:1146-ThreadPoolExecutor$Worker.run:615-Thread.run:701,
>>>> >
>>>> >
>>>> > Attach the latest log from this email to the bug.
>>>> >
>>>> >Unfortunately, there is no way to workaround it on your side as its a
>>>> >code issue.
>>>> >
>>>> >-Alena.
>>>> >
>>>> >From: Diego Spinola Castro
>>>>  ><spinolacastro@gmail.com<mailto:spinolacastro@gmail.com>>
>>>> >Date: Monday, March 17, 2014 at 11:25 AM
>>>> >To: Alena Prokharchyk
>>>>  ><alena.prokharchyk@citrix.com<mailto:alena.prokharchyk@citrix.com>>
>>>> >Cc: "users@cloudstack.apache.org<mailto:users@cloudstack.apache.org>"
>>>> ><users@cloudstack.apache.org<mailto:users@cloudstack.apache.org>>
>>>> >Subject: Re: NetworkUsage
>>>> >
>>>> >Hi, sorry.  i went lost with this huge log.
>>>> >
>>>> >
>>>> >
>>>> >2014-03-17 15:10 GMT-03:00 Alena Prokharchyk
>>>>  ><Alena.Prokharchyk@citrix.com<mailto:Alena.Prokharchyk@citrix.com>>:
>>>> >Diego, the log you've provided, doesn't contain "Rolling back the
>>>> >transaction" statement. You should wait for this particular error to
>>>> >happen (while logging on TRACE), and then take a log snippet.
>>>> >
>>>> >From: Diego Spinola Castro
>>>>  ><spinolacastro@gmail.com<mailto:spinolacastro@gmail.com>>
>>>> >Date: Monday, March 17, 2014 at 11:00 AM
>>>> >To: Alena Prokharchyk
>>>>  ><alena.prokharchyk@citrix.com<mailto:alena.prokharchyk@citrix.com>>
>>>> >Cc: "users@cloudstack.apache.org<mailto:users@cloudstack.apache.org>"
>>>> ><users@cloudstack.apache.org<mailto:users@cloudstack.apache.org>>
>>>> >Subject: Re: NetworkUsage
>>>> >
>>>> >Hi, Alena. thanks for the tips, i got some logs:
>>>> >http://www.fpaste.org/86141/13950791/
>>>> >
>>>> >
>>>> >2014-03-17 14:27 GMT-03:00 Alena Prokharchyk
>>>>  ><Alena.Prokharchyk@citrix.com<mailto:Alena.Prokharchyk@citrix.com>>:
>>>>  >Diego,
>>>> >
>>>> >Unfortunately CS lock lack the needed info for debugging the problem.
I
>>>> >see only transaction roll back statement:
>>>> >
>>>> >2014-03-17 06:01:22,363 DEBUG [db.Transaction.Transaction]
>>>> >(RouterMonitor-1:null) Rolling back the transaction: Time = 2 Name =
>>>>
>>>> >-VirtualNetworkApplianceManagerImpl$NetworkUsageTask.run:900-Executors$Run
>>>> >n
>>>>
>>>> >ableAdapter.call:471-FutureTask$Sync.innerRunAndReset:351-FutureTask.runAn
>>>> >d
>>>>
>>>> >Reset:178-ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201:165-
>>>> >
>>>> >
>>>> >To debug it further, you need to change the log level from DEBUG to
>>>> TRACE
>>>> >for a certain class by modifying log4 properties (no MS restart is
>>>> >required)
>>>> >
>>>> >
>>>> >Here is the link on how to do it:
>>>> >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Useful+Tips+and+Tri
>>>> >c
>>>> >ks, "How to enable trace logging for a particular class in CloudStack?²
>>>> >section. Replace category name with:
>>>> >
>>>> ><category name=³com.cloud.utils.db">
>>>> >     <priority value=³TRACE"/>
>>>> >   </category>
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >After the change is done, wait till you see TRACE statements in the
>>>> log,
>>>> >and till the error "Rolling back the transaction" occurs again. Take
>>>> the
>>>> >log snippet and send it out. After you are done with that, you will
>>>> need
>>>> >to remove TRACE config from log4 properties file, otherwise your logs
>>>> will
>>>> >grow huge.
>>>> >
>>>> >-Alena.
>>>> >
>>>> >
>>>> >On 3/17/14, 9:48 AM, "Diego Spinola Castro"
>>>>  ><spinolacastro@gmail.com<mailto:spinolacastro@gmail.com>>
>>>>  >wrote:
>>>> >
>>>> >>Hi guys, i've been running into a issue when getting network traffic
>>>> from
>>>> >>usage server. All records are Zero, i have a CS 4.2.0 install.
>>>> >>
>>>> >>Here the usage.logs http://www.fpaste.org/86116/13950748/
>>>> >>And management-server.log http://www.fpaste.org/86118/07486613/
>>>> >>
>>>> >>Does anyone know what is wrong ?
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message