spark-dev mailing list archives

From "Evan R. Sparks" <evan.spa...@gmail.com>
Subject Re: Could the function MLUtils.loadLibSVMFile be modified to support zero-based-index data?
Date Tue, 08 Jul 2014 15:25:28 GMT
As Sean mentions, if you can change the data to the standard format, that's
probably a good idea. If you'd rather read the raw data, you could write
your own version of loadLibSVMFile: a loader function very similar to the
existing one, with a few characters removed:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala#L81
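For illustration, here is a minimal sketch of the per-line parsing such a
loader might do, in plain Scala with no Spark dependency. The names
`ZeroBasedLibSVM` and `parseLine` are hypothetical, and this is not the
actual MLUtils code. It assumes whitespace-separated `index:value` pairs and
keeps the indices as given instead of subtracting 1.

```scala
// Hypothetical helper, not part of Spark's MLUtils: parses one line of
// LIBSVM-style text whose feature indices are already zero-based.
object ZeroBasedLibSVM {
  /** Returns Some((label, indices, values)), or None for blank/comment lines. */
  def parseLine(raw: String): Option[(Double, Array[Int], Array[Double])] = {
    val line = raw.trim
    if (line.isEmpty || line.startsWith("#")) None
    else {
      val items = line.split("\\s+")
      val label = items.head.toDouble
      val (indices, values) = items.tail.map { item =>
        val parts = item.split(':')
        // A 1-based loader would subtract 1 here; zero-based input is kept as-is.
        (parts(0).toInt, parts(1).toDouble)
      }.unzip
      Some((label, indices, values))
    }
  }
}
```

For example, `ZeroBasedLibSVM.parseLine("1 0:0.5 3:2.0")` yields label `1.0`,
indices `Array(0, 3)` and values `Array(0.5, 2.0)`.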

You will also likely need to change the logic that determines the
number of features (currently line 95).
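A hedged sketch of that feature-count logic, assuming the indices are
zero-based: an index k implies at least k + 1 features, so the dimension is
the largest index seen plus one (with raw one-based indices it would be the
largest index itself). `ZeroBasedFeatureCount.numFeatures` is a hypothetical
name, and it works on a local Seq; in Spark the same maximum would be
computed over an RDD instead.

```scala
// Hypothetical helper: infers the feature dimension from zero-based index arrays.
object ZeroBasedFeatureCount {
  def numFeatures(rows: Seq[Array[Int]]): Int = {
    // Largest index seen across all rows; -1 if there are no features at all.
    val maxIndex = rows.iterator.flatMap(_.iterator).foldLeft(-1)((a, b) => math.max(a, b))
    // With zero-based indices, index k means at least k + 1 features.
    maxIndex + 1
  }
}
```

The maximum is taken over every index rather than only the last one per row,
so the sketch does not depend on indices being sorted.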


On Tue, Jul 8, 2014 at 12:22 AM, Sean Owen <sowen@cloudera.com> wrote:

> On Tue, Jul 8, 2014 at 7:29 AM, Lizhengbing (bing, BIPA) <
> zhengbing.li@huawei.com> wrote:
>
> >
> > 1)  I downloaded the imdb data from
> > http://komarix.org/ac/ds/Blanc__Mel.txt.bz2 and used this data to test
> > LBFGS
> > 2)  I found that the imdb data use zero-based indices
> >
>
> Since the method is for parsing the LIBSVM format, and its labels are
> always 1-indexed IIUC, I don't think it would make sense to read 0-indexed
> labels. It sounds like that input is not properly formatted, unless anyone
> knows to the contrary?
>
