ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ksenia Rybakova (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-5707) Client can't resume streaming even after topology got stable during load test
Date Thu, 06 Jul 2017 11:53:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ksenia Rybakova updated IGNITE-5707:
------------------------------------
    Description: 
Load test config:
- CacheRandomOperationBenchmark
- 8 clients, 48 servers at 8 hosts
- 26 physical caches of different types with different memory policies + 30 groups with 10
partitioned caches each + 20 groups with 10 replicated caches each. Total 526 caches.
- Preloading amount: 50K, key range: 60K
Complete configs are attached.

3 of 8 clients have following messages during preloading:
{noformat}
[12:17:56] (err) Failed to execute compound future reducer: GridCompoundFuture [rdc=null,
initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, false, false]][12:17:56]
(err) Failed to
execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false,
cancelled=false, err=null, futs=[true, true, false, false, false, false, false, false, false,
false, false,
 false, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false]][12:17:56]
(err) Failed
to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0,
done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false,
false, false, false, fal
se, false, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false]]class
org.apache.igni
te.IgniteCheckedException: DataStreamer request failed [node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will retry data transfer
at stable topology [reqTop=AffinityTopologyVersion [topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], node=remote]
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
        ... 7 more
{noformat}
2 drivers were able to resume streaming after some time, but 1 didn't (error messages continued
to be printed). This driver had high heap utilization, that resulted in long GC pause. Finally
it was considered failed by other nodes.

Also all clients have in log:
{noformat}
[2017-07-06 12:18:11,780][INFO ][exchange-worker-#41%null%][time] Started exchange init [topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], crd=false, evt=18, node=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da,
addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0],
discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620,
isClient=true], evtNode=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1,
172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0,
order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620,
isClient=true], customEvt=CacheAffinityChangeMessage [id=b5a8f271d51-f9bb8d96-c609-4de4-b32f-761c2a33ad10,
topVer=AffinityTopologyVersion [topVer=48, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]]
[2017-07-06 12:18:12,284][WARN ][pool-5-thread-2][GridDhtPartitionTopologyImpl] Requested
topology version does not match calculated diff, will require full iteration tocalculate mapping
[topVer=AffinityTopologyVersion [topVer=56, minorTopVer=0], diffVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1]]
[2017-07-06 12:18:12,642][WARN ][sys-#56%null%][GridDhtPartitionTopologyImpl] Requested topology
version does not match calculated diff, will require full iteration tocalculate mapping [topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=0]]
[2017-07-06 12:18:12,919][INFO ][exchange-worker-#41%null%][time] Finished exchange init [topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], crd=false]
{noformat}

  was:
Load test config:
- CacheRandomOperationBenchmark
- 8 clients, 48 servers at 8 hosts
- 26 physical caches of different types with different memory policies + 30 groups with 10
partitioned caches each + 20 groups with 10 replicated caches each. Total 526 caches.
- Preloading amount: 50K, key range: 60K
Complete configs are attached.

3 of 8 clients have following messages during preloading:
{noformat}
[12:17:56] (err) Failed to execute compound future reducer: GridCompoundFuture [rdc=null,
initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, false, false]][12:17:56]
(err) Failed to
execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false,
cancelled=false, err=null, futs=[true, true, false, false, false, false, false, false, false,
false, false,
 false, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false]][12:17:56]
(err) Failed
to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0,
done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false,
false, false, false, fal
se, false, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false]]class
org.apache.igni
te.IgniteCheckedException: DataStreamer request failed [node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will retry data transfer
at stable topology [reqTop=AffinityTopologyVersion [topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], node=remote]
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
        ... 7 more
{noformat}
2 drivers were able to resume streaming after some time, but 1 didn't (error messages continued
to be printed). This driver had high heap utilization, that resulted in long GC pause. Finally
it was considered failed by other nodes.

Also it was observed that all clients have in log:
{noformat}
[2017-07-06 12:18:11,780][INFO ][exchange-worker-#41%null%][time] Started exchange init [topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], crd=false, evt=18, node=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da,
addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0],
discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620,
isClient=true], evtNode=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1,
172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0,
order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620,
isClient=true], customEvt=CacheAffinityChangeMessage [id=b5a8f271d51-f9bb8d96-c609-4de4-b32f-761c2a33ad10,
topVer=AffinityTopologyVersion [topVer=48, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]]
[2017-07-06 12:18:12,284][WARN ][pool-5-thread-2][GridDhtPartitionTopologyImpl] Requested
topology version does not match calculated diff, will require full iteration tocalculate mapping
[topVer=AffinityTopologyVersion [topVer=56, minorTopVer=0], diffVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1]]
[2017-07-06 12:18:12,642][WARN ][sys-#56%null%][GridDhtPartitionTopologyImpl] Requested topology
version does not match calculated diff, will require full iteration tocalculate mapping [topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=0]]
[2017-07-06 12:18:12,919][INFO ][exchange-worker-#41%null%][time] Finished exchange init [topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], crd=false]
{noformat}


> Client can't resume streaming even after topology got stable during load test
> -----------------------------------------------------------------------------
>
>                 Key: IGNITE-5707
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5707
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Ksenia Rybakova
>
> Load test config:
> - CacheRandomOperationBenchmark
> - 8 clients, 48 servers at 8 hosts
> - 26 physical caches of different types with different memory policies + 30 groups with
10 partitioned caches each + 20 groups with 10 replicated caches each. Total 526 caches.
> - Preloading amount: 50K, key range: 60K
> Complete configs are attached.
> 3 of 8 clients have following messages during preloading:
> {noformat}
> [12:17:56] (err) Failed to execute compound future reducer: GridCompoundFuture [rdc=null,
initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, false, false]][12:17:56]
(err) Failed to
> execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0,
done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false,
false, false, false, false,
>  false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false,
false]][12:17:56] (err) Failed
> to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0,
done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false,
false, false, false, fal
> se, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false,
false]]class org.apache.igni
> te.IgniteCheckedException: DataStreamer request failed [node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will retry data
transfer at stable topology [reqTop=AffinityTopologyVersion [topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
> [topVer=56, minorTopVer=1], node=remote]
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
>         ... 7 more
> {noformat}
> 2 drivers were able to resume streaming after some time, but 1 didn't (error messages
continued to be printed). This driver had high heap utilization, that resulted in long GC
pause. Finally it was considered failed by other nodes.
> Also all clients have in log:
> {noformat}
> [2017-07-06 12:18:11,780][INFO ][exchange-worker-#41%null%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], crd=false, evt=18, node=TcpDiscoveryNode
[id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0,
/127.0.0.1:0], discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true,
ver=2.1.0#20170705-sha1:ad42f620, isClient=true], evtNode=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da,
addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0],
discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620,
isClient=true], customEvt=CacheAffinityChangeMessage [id=b5a8f271d51-f9bb8d96-c609-4de4-b32f-761c2a33ad10,
topVer=AffinityTopologyVersion [topVer=48, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]]
> [2017-07-06 12:18:12,284][WARN ][pool-5-thread-2][GridDhtPartitionTopologyImpl] Requested
topology version does not match calculated diff, will require full iteration tocalculate mapping
[topVer=AffinityTopologyVersion [topVer=56, minorTopVer=0], diffVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1]]
> [2017-07-06 12:18:12,642][WARN ][sys-#56%null%][GridDhtPartitionTopologyImpl] Requested
topology version does not match calculated diff, will require full iteration tocalculate mapping
[topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], diffVer=AffinityTopologyVersion
[topVer=56, minorTopVer=0]]
> [2017-07-06 12:18:12,919][INFO ][exchange-worker-#41%null%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], crd=false]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message