flink-dev mailing list archives

From Theodore Vasiloudis <theodoros.vasilou...@gmail.com>
Subject Re: Flink ML Vector and DenseVector
Date Mon, 18 Jan 2016 13:56:09 GMT
I agree with Till: the data types are different here, so you need a custom
string vector.

The Vector abstraction in FlinkML is designed with numerical vectors in
mind.
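
To illustrate the distinction, here is a minimal sketch, not FlinkML code.
The shape of LabeledSequenceVector and the NumericVector helper are
assumptions made purely for illustration. The first type only stores
strings, while the second needs arithmetic that has no obvious meaning for
strings.

```scala
// Hypothetical sketch, not part of FlinkML: the field names and the
// NumericVector type are assumptions for illustration only.

// A labeled sequence for tasks such as POS tagging. Tokens and labels are
// plain strings, so no algebraic operations are needed.
final case class LabeledSequenceVector(labels: List[String], tokens: List[String]) {
  require(labels.length == tokens.length, "each token needs exactly one label")

  // Number of tokens in the sequence.
  def size: Int = tokens.length

  // Pair each token with its label, in order.
  def pairs: List[(String, String)] = tokens.zip(labels)
}

// A numerical vector, by contrast, must support operations such as the
// scalar (dot) product, which rely on multiplication and addition.
final case class NumericVector(data: Array[Double]) {
  def dot(other: NumericVector): Double = {
    require(data.length == other.data.length, "dimension mismatch")
    var sum = 0.0
    var i = 0
    while (i < data.length) {
      sum += data(i) * other.data(i)
      i += 1
    }
    sum
  }
}
```

A sequence would then be built as
LabeledSequenceVector(List("DT", "NN"), List("the", "dog")), independently
of the numerical Vector hierarchy.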

On Mon, Jan 18, 2016 at 2:33 PM, Till Rohrmann <trohrmann@apache.org> wrote:

> Hi Hilmi,
>
> I think in your case it makes sense to define a custom vector of strings.
> The easiest implementation could be an Array[String] or List[String].
>
> The reason it does not make much sense to make Vector and DenseVector
> generic is that these types are algebraic data types. How would you define
> algebraic operations such as scalar product, outer product, multiplication,
> etc. on a vector of strings? You would have to provide different
> implementations for the different type parameters.
>
> Cheers,
> Till
>
>
> On Mon, Jan 18, 2016 at 1:40 PM, Hilmi Yildirim <Hilmi.Yildirim@dfki.de>
> wrote:
>
> > Hi,
> > as I explained in a previous e-mail, I need a LabeledVector where the
> > label is also a vector. After we discussed this issue, I created a new
> > class named LabeledSequenceVector with the labels as a Vector. In my use
> > case, I want to train a POS tagger, so the "vector" is a vector of
> > strings and the "labels" are also a vector of strings. If I use the Flink
> > Vector/DenseVector implementation, the vector can only hold Double
> > values, but I need String values.
> >
> > Best Regards,
> > Hilmi
> >
> >
> > On 18.01.2016 at 13:33, Chiwan Park wrote:
> >
> >> Hi Hilmi,
> >>
> >> In NLP, which types are used for vector values? I think we can cover
> >> the typical case using double values.
> >>
> >> On Jan 18, 2016, at 9:19 PM, Hilmi Yildirim <Hilmi.Yildirim@dfki.de> wrote:
> >>>
> >>> Hi,
> >>> the Vector and DenseVector implementations of Flink ML only allow Double
> >>> values. But there are cases where the values are not Doubles, e.g. in NLP.
> >>> Does it make sense to make the implementations generic, i.e. Vector[T] and
> >>> DenseVector[T]?
> >>>
> >>> Best Regards,
> >>> Hilmi
> >>>
> >>> --
> >>> ==================================================================
> >>> Hilmi Yildirim, M.Sc.
> >>> Researcher
> >>>
> >>> DFKI GmbH
> >>> Intelligente Analytik für Massendaten
> >>> DFKI Projektbüro Berlin
> >>> Alt-Moabit 91c
> >>> D-10559 Berlin
> >>> Phone: +49 30 23895 1814
> >>>
> >>> E-Mail: Hilmi.Yildirim@dfki.de
> >>>
> >>> -------------------------------------------------------------
> >>> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
> >>> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
> >>>
> >>> Geschaeftsfuehrung:
> >>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
> >>> Dr. Walter Olthoff
> >>>
> >>> Vorsitzender des Aufsichtsrats:
> >>> Prof. Dr. h.c. Hans A. Aukes
> >>>
> >>> Amtsgericht Kaiserslautern, HRB 2313
> >>> -------------------------------------------------------------
> >>>
> >> Regards,
> >> Chiwan Park
> >>
> >>
>
