flink-issues mailing list archives

From "Alex DeCastro (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-5936) Can't pass keyed vectors to KNN join algorithm
Date Mon, 06 Mar 2017 09:24:32 GMT

     [ https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex DeCastro updated FLINK-5936:
---------------------------------

Hi Till, by keyed vector I mean a vector that carries some sort of unique identifier, so that
when an ML algorithm produces predictions I can refer back to the original (i.e., untransformed)
data row.

For example, let’s say I’m clustering JIRA tickets: I’d like to attach the identifier
FLINK-XXXX to my Breeze vectors so that they can be inspected after clustering. From a domain
expert’s perspective, I want to check that the clusters I get really do group tickets on
similar topics.

In my case, I removed the project tag (i.e., FLINK-) from the unique identifier and augmented
my numerical vectors with one extra slot for the key. I then modified my distance metric to
ignore that extra coordinate.
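The workaround described above can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than Flink or Breeze code; the helper names (augment_with_key, key_of, euclidean_ignoring_key) are hypothetical. It assumes the key is the numeric part of the ticket ID stored as the last coordinate, and that plain Euclidean distance is the metric being adapted.

```python
import math
from typing import List, Sequence


def augment_with_key(features: Sequence[float], key: int) -> List[float]:
    """Append the numeric part of an ID (e.g. 5936 for FLINK-5936) as a trailing coordinate."""
    return list(features) + [float(key)]


def key_of(augmented: Sequence[float]) -> int:
    """Read the key back out of the trailing slot."""
    return int(augmented[-1])


def euclidean_ignoring_key(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance over every coordinate except the trailing key slot."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:-1], b[:-1])))
```

For instance, euclidean_ignoring_key(augment_with_key([0.0, 3.0], 5936), augment_with_key([4.0, 0.0], 17)) evaluates to 5.0 even though the keys are far apart. Storing the key as a float is only exact for integers up to 2**53, and every distance function in the pipeline has to remember to skip the last slot, which is part of why a dedicated key field is attractive.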

But it would be useful to have a variable in the Vector class that can be initialized to a
unique identifier.
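A minimal sketch of that alternative, a vector type with a separate identifier field, using Python as a stand-in for the Scala Vector classes; KeyedVector is a hypothetical name, not part of Flink ML. Because the key lives outside the feature values, it can stay a string such as FLINK-5936 and no distance metric has to be modified.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class KeyedVector:
    """Hypothetical vector type carrying an identifier next to its values.

    The key is metadata only: it never enters the distance computation.
    """
    key: str                   # e.g. "FLINK-5936"; no need to strip the project tag
    values: Tuple[float, ...]  # the actual feature coordinates

    def distance(self, other: "KeyedVector") -> float:
        """Euclidean distance computed over the feature values only."""
        if len(self.values) != len(other.values):
            raise ValueError("dimension mismatch")
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(self.values, other.values)))
```

With illustrative keys, KeyedVector("FLINK-5936", (0.0, 3.0)).distance(KeyedVector("FLINK-0001", (4.0, 0.0))) evaluates to 5.0, while both keys remain available for inspection after clustering.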

Can you elaborate on the PredictDataSetOperation? I’m still new to Flink. Thanks, Alex

On 3/3/17, 2:58 PM, "Till Rohrmann (JIRA)" <jira@apache.org> wrote:


        [ https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894510#comment-15894510 ]

    Till Rohrmann commented on FLINK-5936:
    --------------------------------------

    Hi Alex,

    what do you mean by keyed vectors?

    Did you mean labeled vectors? Those are indeed not supported yet, but you could add a corresponding
    {{PredictDataSetOperation}} for {{KNN}}.

    > Can't pass keyed vectors to KNN join algorithm
    > ------------------------------------------------
    >
    >                 Key: FLINK-5936
    >                 URL: https://issues.apache.org/jira/browse/FLINK-5936
    >             Project: Flink
    >          Issue Type: Improvement
    >          Components: Machine Learning Library
    >    Affects Versions: 1.1.3
    >            Reporter: Alex DeCastro
    >            Priority: Minor
    >
    > Hi there,
    > I noticed that for Scala 2.10/Flink 1.1.3 there's no way to recover keys from the
    > predict method of the KNN join, even if the Vector (FlinkVector) class is extended to
    > allow for keys.
    > If I create a class, say SparseVectorsWithKeys, the predict method still returns only
    > SparseVectors. Are there any workarounds?
    > Would it be possible to extend either the Vector class or the ML models to consume
    > and output keyed vectors? This is very important for NLP and for much of ML pipeline
    > debugging, including logging.
    > Thanks a lot
    > Alex



    --
    This message was sent by Atlassian JIRA
    (v6.3.15#6346)
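Till's suggestion amounts to a prediction path whose inputs and outputs carry keys. The sketch below is not Flink's API and does not implement PredictDataSetOperation; it is a brute-force, single-machine stand-in in Python (the function knn_join_keyed and the (key, values) pair representation are hypothetical) that only shows the intended data flow: keys go in with the vectors, and the result maps each query key to the keys of its k nearest training vectors.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# A keyed vector here is simply a (key, values) pair.
Keyed = Tuple[str, Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_join_keyed(training: List[Keyed], queries: List[Keyed], k: int) -> Dict[str, List[str]]:
    """Map each query key to the keys of its k nearest training vectors, nearest first.

    Brute force: every query is compared against every training vector.
    """
    result: Dict[str, List[str]] = {}
    for q_key, q_vec in queries:
        # nsmallest ranks training items by distance to this query's values only;
        # the training keys ride along untouched.
        nearest = heapq.nsmallest(k, training, key=lambda item: euclidean(q_vec, item[1]))
        result[q_key] = [t_key for t_key, _ in nearest]
    return result
```

A real implementation would operate on Flink DataSets and would need the keys threaded through the KNN model's prediction operation; the nested loop here costs O(|queries| × |training|) distance evaluations and is only meant to show which keys travel where.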




