spark-issues mailing list archives

From "Jungtaek Lim (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-10816) EventTime based sessionization
Date Thu, 18 Oct 2018 12:43:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654627#comment-16654627
] 

Jungtaek Lim edited comment on SPARK-10816 at 10/18/18 12:42 PM:
-----------------------------------------------------------------

I just ran another performance test to evaluate my latest attempt at improving state handling.

Here I overwrite the values for a given key, instead of removing all existing values and appending new values for that key.
https://github.com/HeartSaVioR/spark/commit/6d466b9f424ae6a2b5a927e650f60ef35cfe30ca

The result was no improvement (a small performance hit compared to the current approach), so I won't include those numbers here. However, I ran this test on an AWS c5d.xlarge instance with the dedicated tenancy option, so the environment is more isolated and stable than before, which shows a higher input rate.
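To illustrate the two state-update strategies being compared, here is a minimal pure-Scala sketch against an in-memory model of a multi-valued keyed state store. The names and the store model are illustrative only, not Spark's actual state store internals:

```scala
// Hypothetical model of a keyed state store holding multiple session
// values per key. Illustrative only; not Spark's state store API.
object StateUpdateSketch {
  type Store = scala.collection.mutable.Map[String, Vector[Int]]

  // Baseline: remove every value for the key, then append the new ones.
  def removeAllAndAppend(store: Store, key: String, values: Vector[Int]): Unit = {
    store.remove(key)         // drop all existing values for the key
    store.update(key, values) // append the recomputed values
  }

  // Trial: overwrite the slot for the key in place, skipping the removal.
  def overwriteInPlace(store: Store, key: String, values: Vector[Int]): Unit =
    store(key) = values       // a single put replaces the old values
}
```

Both strategies leave the store in the same logical state; the trial only changes how many store operations are issued per key.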

Test Env.: c5d.xlarge, dedicated

A. plenty of sessions

1. HWX (Append Mode) 

1.a. input rate 20000

||batch id||input rows||input rows per second||processed rows per second||
| 21 | 113355 | 20234.7375937 | 19278.0612245 |
| 22 | 118905 | 20218.5002551 | 17958.7675578 |
| 23 | 120000 | 18121.4134703 | 15622.9657597 |
| 24 | 160000 | 20827.9093986 | 14406.6270484 |
| 25 | 220000 | 19807.3287116 | 12593.0165999 |

2. Baidu (Append Mode)

2.a. input rate 15000

||batch id||input rows||input rows per second||processed rows per second||
| 18 | 1005000 | 15068.3699172 | 5993.05878565 |
| 19 | 2505000 | 14937.8335669 | 4823.00254531 |

(cancelled since the following batch takes too long... it can't even reach an input rate of 10000)

3. HWX (Update Mode)

3.a. input rate 15000

||batch id||input rows||input rows per second||processed rows per second||
| 25 | 165000 | 15136.2260343 | 15351.6933383 |
| 26 | 165000 | 15350.2651409 | 28128.196386 |
| 27 | 90000 | 15342.6525742 | 16669.7536581 |
| 28 | 75000 | 13888.8888889 | 13557.483731 |
| 29 | 90000 | 16266.0401229 | 15131.1365165 |
| 30 | 90000 | 15128.5930408 | 13829.1333743 |

3.b. input rate 20000

||batch id||input rows||input rows per second||processed rows per second||
| 23 | 318210 | 19853.3815822 | 20039.6750425 |
| 24 | 320000 | 20151.1335013 | 23456.9711186 |
| 25 | 280000 | 20523.3453053 | 15197.5683891 |

(cancelled since the following batch takes too long...)

B. plenty of rows per session

1. HWX (Append Mode)

1.a. input rate 30000

||batch id||input rows||input rows per second||processed rows per second||
| 21 | 295730 | 30210.4402901 | 25682.1537125 |
| 22 | 360000 | 31260.8544634 | 25906.7357513 |
| 23 | 420000 | 30222.3501475 | 28753.337441 |
| 24 | 420000 | 28751.3691128 | 29702.970297 |
| 25 | 420000 | 29700.8698112 | 28561.7137028 |

1.b. input rate 35000

||batch id||input rows||input rows per second||processed rows per second||
| 19 | 441716 | 36073.1727236 | 29971.2308319 |
| 20 | 490000 | 33245.1319628 | 28194.9479257 |
| 21 | 630000 | 36250.647333 | 30189.7642323 |
| 22 | 735000 | 35219.703867 | 28420.0757869 |
| 23 | 910000 | 35185.3999923 | 30372.8179967 |

2. Baidu (Append Mode)

2.a. input rate 35000

||batch id||input rows||input rows per second||processed rows per second||
| 1 | 4335 | 752.081887578 | 111.233706251 |

(cancelled due to a long-running batch... it can't even keep up with an input rate of 1000, as we
already know)

3. HWX (Update Mode)

3.a. input rate 35000

||batch id||input rows||input rows per second||processed rows per second||
| 20 | 490000 | 37519.1424196 | 30161.2704666 |
| 21 | 560000 | 34470.0233904 | 31702.8985507 |
| 22 | 595000 | 33684.3297101 | 32063.3723123 |
| 23 | 665000 | 35833.6027589 | 31925.1080173 |
| 24 | 735000 | 35285.6457033 | 31268.6122692 |
| 25 | 805000 | 34245.1184753 | 30925.8547829 |
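For reference, the feature under test is gap-based event-time sessionization: consecutive events whose gaps stay under a timeout fall into one session. A minimal pure-Scala sketch of just that windowing rule (independent of the prototypes benchmarked above; names are illustrative):

```scala
// Gap-based sessionization: events within `gapMs` of the previous
// event extend the current session; a larger gap opens a new one.
// Models only the windowing rule, not any prototype's state handling.
object SessionizeSketch {
  case class Session(start: Long, end: Long, count: Int)

  def sessionize(eventTimes: Seq[Long], gapMs: Long): List[Session] =
    eventTimes.sorted.foldLeft(List.empty[Session]) {
      case (last :: rest, t) if t - last.end <= gapMs =>
        // Event falls within the gap: extend the current session.
        last.copy(end = t, count = last.count + 1) :: rest
      case (acc, t) =>
        // Gap exceeded (or first event): open a new session.
        Session(t, t, 1) :: acc
    }.reverse
}
```

The benchmarked implementations must additionally maintain this per-key session state across micro-batches and emit results according to the output mode (Append vs Update), which is where the performance differences above come from.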



> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf, Session Window
Support For Structure Streaming.pdf
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

