spark-user mailing list archives

From Feynman Liang <fli...@databricks.com>
Subject Re: MLlib Prefixspan implementation
Date Tue, 25 Aug 2015 04:15:02 GMT
CCing the mailing list again.

It's currently not on the radar. Do you have a use case for it? I can bring
it up during 1.6 roadmap planning tomorrow.
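
The request quoted below asks about sequences with timestamps and gap constraints, which the current PrefixSpan does not handle. To show what such a constraint means for support counting, here is a minimal Scala sketch. It is not part of MLlib and is not the MG-FSM algorithm. Event, occursWithGap, support and maxGap are hypothetical names. The sketch also assumes that the gap is the difference between the timestamps of consecutive matched events; MG-FSM defines its own gap notion, so check the paper before relying on this.

```scala
object GapConstrainedSupport {
  // Hypothetical event type: an item observed at a timestamp.
  final case class Event(time: Long, item: String)

  /** True if `pattern` occurs in `seq` (assumed sorted by time) as a subsequence
    * whose consecutive matched events are at most `maxGap` time units apart. */
  def occursWithGap(seq: IndexedSeq[Event], pattern: IndexedSeq[String], maxGap: Long): Boolean = {
    // Try to match pattern(k) at a position >= `from`, within maxGap of the previous match.
    def extend(k: Int, from: Int, prevTime: Long): Boolean =
      if (k == pattern.length) true
      else {
        var i = from
        var found = false
        // Events are time-sorted, so stop scanning once the gap is exceeded.
        while (!found && i < seq.length && seq(i).time - prevTime <= maxGap) {
          if (seq(i).item == pattern(k) && extend(k + 1, i + 1, seq(i).time)) found = true
          i += 1
        }
        found
      }

    // Backtrack over every possible position for the first pattern item.
    pattern.isEmpty || seq.indices.exists { i =>
      seq(i).item == pattern.head && extend(1, i + 1, seq(i).time)
    }
  }

  /** Number of sequences in `db` that contain `pattern` under the gap constraint. */
  def support(db: Seq[IndexedSeq[Event]], pattern: IndexedSeq[String], maxGap: Long): Int =
    db.count(occursWithGap(_, pattern, maxGap))

  def main(args: Array[String]): Unit = {
    val db = Seq(
      IndexedSeq(Event(1, "a"), Event(2, "b"), Event(10, "c")),
      IndexedSeq(Event(1, "a"), Event(8, "b"), Event(9, "c")),
      IndexedSeq(Event(1, "a"), Event(3, "c"))
    )
    println(support(db, IndexedSeq("a", "b"), maxGap = 3))             // gap-constrained
    println(support(db, IndexedSeq("a", "b"), maxGap = Long.MaxValue)) // no real constraint
  }
}
```

A greedy left-to-right match is not enough under a maximum-gap constraint: an early occurrence of the first item can rule out a later valid embedding (for example a@1, a@5, b@7 with a gap of 3), which is why the sketch backtracks.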

On Mon, Aug 24, 2015 at 8:28 PM, alexis GILLAIN <ilaxes@hotmail.com> wrote:

> Hi,
>
> I just realized the article I mentioned is cited in the JIRA ticket and not
> in the code, so I guess you didn't use this result.
>
> Do you plan to implement sequences with timestamps and gap constraints, as in:
>
> https://people.mpi-inf.mpg.de/~rgemulla/publications/miliaraki13mg-fsm.pdf
>
> 2015-08-25 7:06 GMT+08:00 Feynman Liang <fliang@databricks.com>:
>
>> Hi Alexis,
>>
>> Unfortunately, both of the papers you referenced appear to be
>> translations and are quite difficult to understand. We followed
>> http://doi.org/10.1109/ICDE.2001.914830 when implementing PrefixSpan.
>> Perhaps you can find the relevant lines in there so I can elaborate further?
>>
>> Feynman
>>
>> On Thu, Aug 20, 2015 at 9:07 AM, alexis GILLAIN <ilaxes@hotmail.com> wrote:
>>
>>> I want to use PrefixSpan, so I had a look at the code and the cited paper:
>>> "Distributed PrefixSpan Algorithm Based on MapReduce".
>>>
>>> There is a result in the paper that I didn't really understand, and I
>>> couldn't find where it is used in the code.
>>>
>>> Suppose a sequence database S = {s_1, s_2, ..., s_n}, a sequence
>>> <a_1 ... a_{L-1}> is a length-(L-1) (2 ≤ L ≤ n) sequential pattern, in
>>> projected databases which is a prefix of a length-(L-1) sequential pattern
>>> <a_1 ... a_{L-1}>, when the support count of <a_L> is not less than
>>> min_support, it is equal to obtaining a length-L sequential pattern
>>> <a_1 ... a_L> from projected databases that obtaining a length-L
>>> sequential pattern <a_1 ... a_L> from a sequence database S.
>>>
>>> According to the paper, it's supposed to add a pruning step in the reduce
>>> function, but I couldn't find where.
>>>
>>> This result seems to come from a previous paper: "Wang Linlin, Fan Jun.
>>> Improved Algorithm for Sequential Pattern Mining Based on PrefixSpan [J].
>>> Computer Engineering, 2009, 35(23): 56-61", but it didn't help me
>>> understand it or how it can improve the algorithm.
>>>
>>
>>
>
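
One plausible reading of the quoted result is that it restates PrefixSpan's projected-database property, as described in the PrefixSpan paper cited above (doi 10.1109/ICDE.2001.914830). Under that property, the support of a length-L pattern <a_1 ... a_L> in S equals the support of a_L in the database projected on <a_1 ... a_{L-1}>. This is an interpretation, not something the thread confirms. If it holds, the min_support check can be done on the projected database in the reduce step, without rescanning S. The minimal Scala sketch below checks that equality on a toy database. It assumes each sequence element is a single item, whereas real PrefixSpan and MLlib work with itemsets. It is not MLlib's implementation, and project, supportInS and supportInProjected are hypothetical names.

```scala
object ProjectionCheck {
  // Simplification: a sequence is a list of single items (no itemsets).
  type Sequence = Vector[String]

  /** Suffix of `s` after the leftmost occurrence of `prefix` as a subsequence, if any. */
  def project(s: Sequence, prefix: Sequence): Option[Sequence] = {
    var pos = 0
    val matched = prefix.forall { item =>
      val idx = s.indexOf(item, pos)
      if (idx < 0) false else { pos = idx + 1; true }
    }
    if (matched) Some(s.drop(pos)) else None
  }

  /** Support of `pattern` counted directly in the full database S. */
  def supportInS(db: Seq[Sequence], pattern: Sequence): Int =
    db.count(s => project(s, pattern).isDefined)

  /** Support of `prefix :+ item`, counted only in the `prefix`-projected database. */
  def supportInProjected(db: Seq[Sequence], prefix: Sequence, item: String): Int =
    db.flatMap(project(_, prefix)).count(_.contains(item))

  def main(args: Array[String]): Unit = {
    val db: Seq[Sequence] = Seq(
      Vector("a", "b", "c"), Vector("a", "c", "b"), Vector("b", "a", "c"), Vector("c", "a"))
    val prefix = Vector("a")
    val minSupport = 2
    for (item <- Seq("a", "b", "c")) {
      val viaProjection = supportInProjected(db, prefix, item)
      // Both ways of counting agree, so S need not be rescanned.
      assert(viaProjection == supportInS(db, prefix :+ item))
      if (viaProjection >= minSupport)
        println(s"frequent: ${(prefix :+ item).mkString("<", " ", ">")} support=$viaProjection")
    }
  }
}
```

For item b, the suffixes projected on <a> are bc, cb, c and the empty sequence. The support of b there (2) therefore matches the support of <a b> in the full database.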
