Subject: Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result
From: David Hall <david.lw.hall@gmail.com>
Date: Wed, 23 Apr 2014 22:16:11 -0700
To: dev@spark.apache.org, dbtsai@dbtsai.com
Cc: shivaram@eecs.berkeley.edu, Xiangrui Meng

Was the weight vector sparse? The gradients? Or just the feature vectors?
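A quick sketch of why this matters (hand-rolled for illustration; this is not Breeze's or MLlib's actual kernel, and the helper name is made up): the sparse win comes from dot products like the one below touching only the non-zeros of the feature vector, which assumes the weight vector stays dense.

    // Dot product of a sparse feature vector x, stored as parallel
    // indices/values arrays (as in MLlib's SparseVector), against a
    // dense weight vector w. Only x's non-zeros are touched, which is
    // why sparse input should be cheaper when ~89% of entries are zero.
    def sparseDot(indices: Array[Int], values: Array[Double],
                  w: Array[Double]): Double = {
      var sum = 0.0
      var k = 0
      while (k < indices.length) {
        sum += values(k) * w(indices(k))
        k += 1
      }
      sum
    }

If the weights or the gradient are themselves sparse, updates like w += step * gradient can fall off this fast path, hence the question.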
On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai wrote:

> The figure showing the log-likelihood vs. time can be found here:
>
> https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf
>
> Let me know if you cannot open it.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Apr 23, 2014 at 9:34 PM, Shivaram Venkataraman
> <shivaram@eecs.berkeley.edu> wrote:
>
>> I don't think the attachment came through on the list. Could you upload
>> the results somewhere and link to them?
>>
>>
>> On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai wrote:
>>
>>> 123 features per row, and on average, 89% are zeros.
>>>
>>> On Apr 23, 2014 9:31 PM, "Evan Sparks" wrote:
>>>
>>>> What is the number of non-zeros per row (and the number of features)
>>>> in the sparse case? We've hit some issues with Breeze sparse support
>>>> in the past, but for sufficiently sparse data it's still pretty good.
>>>>
>>>>> On Apr 23, 2014, at 9:21 PM, DB Tsai wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm benchmarking logistic regression in MLlib using the newly added
>>>>> optimizers, LBFGS and GD. I'm using the same dataset and the same
>>>>> methodology as in this paper: http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf
>>>>>
>>>>> I want to know how Spark scales as workers are added, and how the
>>>>> optimizers and the input format (sparse or dense) impact performance.
>>>>>
>>>>> The benchmark code can be found here:
>>>>> https://github.com/dbtsai/spark-lbfgs-benchmark
>>>>>
>>>>> The first dataset I benchmarked is a9a, which is only 2.2MB. I
>>>>> duplicated the dataset to 762MB, giving 11M rows. This dataset has
>>>>> 123 features, and 11% of the entries are non-zero.
>>>>>
>>>>> In this benchmark, the entire dataset is cached in memory.
>>>>>
>>>>> As we expect, LBFGS converges faster than GD, and at some point, no
>>>>> matter how hard we push GD, it converges more and more slowly.
>>>>>
>>>>> However, it's surprising that the sparse format runs slower than the
>>>>> dense format. I did see that the sparse format takes a significantly
>>>>> smaller amount of memory when caching the RDD, but sparse is 40%
>>>>> slower than dense. I think sparse should be fast: when we compute
>>>>> x^T w, since x is sparse, we can skip the zero entries. I wonder if
>>>>> there is anything I'm doing wrong.
>>>>>
>>>>> The attachment is the benchmark result.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> DB Tsai
>>>>> -------------------------------------------------------
>>>>> My Blog: https://www.dbtsai.com
>>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
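For anyone wanting to reproduce this kind of comparison, here is a minimal driver sketch against the Spark 1.0-era MLlib optimization API. The dataset path, SparkContext settings, and parameter values are illustrative, not the benchmark's actual configuration; see https://github.com/dbtsai/spark-lbfgs-benchmark for the real code.

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.optimization.{GradientDescent, LBFGS, LogisticGradient, SquaredL2Updater}
    import org.apache.spark.mllib.util.MLUtils

    // Load a LibSVM-format dataset (e.g. a9a) and cache it, since both
    // optimizers make many passes over the data.
    val sc = new SparkContext("local[4]", "lbfgs-vs-gd")
    val data = MLUtils.loadLibSVMFile(sc, "data/a9a")
      .map(p => (p.label, p.features))
      .cache()

    val numFeatures = data.first()._2.size
    val initialWeights = Vectors.dense(new Array[Double](numFeatures))
    val gradient = new LogisticGradient()
    val updater = new SquaredL2Updater()

    // L-BFGS: quasi-Newton, typically converges in far fewer passes.
    val (wLbfgs, lossLbfgs) = LBFGS.runLBFGS(
      data, gradient, updater,
      numCorrections = 10, convergenceTol = 1e-9,
      maxNumIterations = 50, regParam = 0.0, initialWeights)

    // Batch gradient descent (miniBatchFraction = 1.0 uses the full data).
    val (wGd, lossGd) = GradientDescent.runMiniBatchSGD(
      data, gradient, updater,
      stepSize = 1.0, numIterations = 50, regParam = 0.0,
      miniBatchFraction = 1.0, initialWeights)

    // lossLbfgs and lossGd are the per-iteration loss histories, which is
    // the data behind a log-likelihood-vs-time plot.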