spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cyril de Vogelaere (JIRA)" <>
Subject [jira] [Commented] (SPARK-20180) Add a special value for unlimited max pattern length in Prefix span, and set it as default.
Date Mon, 03 Apr 2017 09:42:42 GMT


Cyril de Vogelaere commented on SPARK-20180:

=> Why not let the default be Int.MaxValue?
I'm also ok with a default Int.MaxValue, if the special value zero is really something you
are against.

if that's what this is about, update the title to reflect it.
=> I will gladly do that, if you think the current title is misleading.

This is a behavior change by default, so we should think carefully about it
=> Yes, I agree.

What are the downsides – why would someone have ever made it 10? presumably, performance.
=> The changed code consist simply in an additionnal condition in an if. If you want to
see a graph, I have one that test the differences in performances, but on my implementation
optimised for single-item pattern. So it wouldn't be relevant, if you are worried of performance
drop, I can do additional tests, on the two lines I changed. If you want me to use some particular
dataset, I will also gladly oblige. Just say the word, and you will have them by tommorow.

So it would be less about what impact it has on the performance, since it would be negligeable
(again, i'm ready to prove that if you want me to), but about whether that feature seems needed
or not. Which I agree, is debatable.

Also, whichever senior implemented it that way, left this comment : 
// TODO: support unbounded pattern length when maxPatternLength = 0
Which you can find in the original code, and is the reason I created this Jira's thread first.
Among the list of improvement I want to propose.
You can find that line in the PrefixSpan code if you don't believe me.If theses change are
rejected, then when I have the occasion, I will remove that line. So it would establish it
isn't needed.

You mention tests don't end and haven't established it's not due to your change. 
=> I'm establishing that right now ... as I said.

I don't think we can proceed with this in this state, right?
=> I will leave the decision to you

> Add a special value for unlimited max pattern length in Prefix span, and set it as default.
> -------------------------------------------------------------------------------------------
>                 Key: SPARK-20180
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.1.0
>            Reporter: Cyril de Vogelaere
>            Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
> Right now, we need to use .setMaxPatternLength() method to
> specify is the maximum pattern length of a sequence. Any pattern longer than that won't
be outputted.
> The current default maxPatternlength value being 10.
> This should be changed so that with input 0, all pattern of any length would be outputted.
Additionally, the default value should be changed to 0, so that a new user could find all
patterns in his dataset without looking at this parameter.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message