flink-dev mailing list archives

From Chiwan Park <chiwanp...@apache.org>
Subject Re: LabeledVector with label vector
Date Wed, 06 Jan 2016 01:12:09 GMT
Hi Theodore,

Thanks for explaining the reason. :)

So how about changing LabeledVector to contain two vectors? One vector is for the label and the
other is for the value. I think this approach would be okay because a double-valued label could
be represented as DenseVector(Array(LABEL_VALUE)).

The only problem with this approach is some overhead from processing a Vector type in the case
of a single double label. If the overhead is significant, we could create two types of
LabeledVector, such as DoubleLabeledVector and VectorLabeledVector.
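
To make the two options concrete, here is a rough sketch. It is only an illustration, not a proposed final API: the `Vector` and `DenseVector` definitions are minimal stand-ins for the FlinkML classes so that the snippet compiles on its own, and the `fromDouble` helper is a hypothetical name.

```scala
// Minimal stand-ins for FlinkML's Vector and DenseVector so the sketch
// compiles without Flink on the classpath (assumption: not the real classes).
sealed trait Vector extends Serializable {
  def size: Int
  def apply(index: Int): Double
}

final case class DenseVector(data: Array[Double]) extends Vector {
  def size: Int = data.length
  def apply(index: Int): Double = data(index)
}

// Option 1: a single type whose label is always a vector.
final case class LabeledVector(label: Vector, vector: Vector)

object LabeledVector {
  // Hypothetical convenience constructor: wraps a scalar label in a
  // one-element DenseVector, as in DenseVector(Array(LABEL_VALUE)).
  def fromDouble(label: Double, vector: Vector): LabeledVector =
    LabeledVector(DenseVector(Array(label)), vector)
}

// Option 2: two separate types, so scalar labels avoid the Vector wrapper.
final case class DoubleLabeledVector(label: Double, vector: Vector)
final case class VectorLabeledVector(label: Vector, vector: Vector)
```

With option 1, every scalar label pays for one extra array allocation and an indexed read; with option 2, the optimizers would need to handle both types.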

Which one is preferred? 

> On Jan 5, 2016, at 11:38 PM, Theodore Vasiloudis <theodoros.vasiloudis@gmail.com> wrote:
> Generalizing the type of the label for the label vector is an idea we
> played with when designing the current optimization framework.
> We ended up deciding against it as the double type allows us to do
> regressions and (multiclass) classification which should be the majority of
> the use cases out there, while keeping the code simple.
> Generalizing this to [T <: Serializable] is too broad, I think. [T <: Vector]
> is more reasonable, I think; I cannot think of many cases where the
> label in an optimization problem is something other than a vector/double.
> Any change would require a number of changes in the optimization, of course,
> as optimizing for vector and double labels requires different handling of
> error calculation etc., but it should be doable.
> Note however that since LabeledVector is such a core part of the library
> any changes would involve a number of adjustments downstream.
> Perhaps having different optimizers etc. for Vectors and double labels
> makes sense, but I haven't put much thought into this.
> On Tue, Jan 5, 2016 at 12:17 PM, Chiwan Park <chiwanpark@apache.org> wrote:
>> Hi Hilmi,
>> Thanks for the suggestion about the type of the labeled vector. Basically, I
>> agree that your suggestion is reasonable. But I would like to generalize
>> `LabeledVector` as in the following example:
>> ```
>> case class LabeledVector[T <: Serializable](label: T, vector: Vector)
>> extends Serializable {
>>  // some implementations for LabeledVector
>> }
>> ```
>> How about this implementation? If there are any other opinions, please
>> send an email to the mailing list.
>>> On Jan 5, 2016, at 7:36 PM, Hilmi Yildirim <Hilmi.Yildirim@dfki.de>
>> wrote:
>>> Hi,
>>> in the ML-Pipeline of Flink we have the "LabeledVector" class. It
>> consists of a vector and a label as a double value. Unfortunately, it is
>> not applicable for sequence learning where the label is also a vector. For
>> example, in NLP we have a vector of words and the label is a vector of the
>> corresponding labels.
>>> The optimize function of the "Solver" class has a DataSet[LabeledVector]
>> as input and, therefore, it is not applicable for sequence learning. I
>> think the LabeledVector should be adapted so that the label is a vector
>> instead of a single Double value. What do you think?
>>> Best Regards,
>>> --
>>> ==================================================================
>>> Hilmi Yildirim, M.Sc.
>>> Researcher
>>> DFKI GmbH
>>> Intelligente Analytik für Massendaten
>>> DFKI Projektbüro Berlin
>>> Alt-Moabit 91c
>>> D-10559 Berlin
>>> Phone: +49 30 23895 1814
>>> E-Mail: Hilmi.Yildirim@dfki.de
>>> -------------------------------------------------------------
>>> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>>> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
>>> Geschaeftsfuehrung:
>>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
>>> Dr. Walter Olthoff
>>> Vorsitzender des Aufsichtsrats:
>>> Prof. Dr. h.c. Hans A. Aukes
>>> Amtsgericht Kaiserslautern, HRB 2313
>>> -------------------------------------------------------------
>> Regards,
>> Chiwan Park

Chiwan Park
