mahout-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Itemsimilarity
Date Thu, 29 May 2014 22:40:29 GMT
Can we separate out the file I/O JIRAs?  That will let the core library
components be unwedged separately from getting I/O standardized.




On Thu, May 29, 2014 at 3:35 PM, Pat Ferrel <pat.ferrel@gmail.com> wrote:

> Agreed and in process. Sebastian’s Cooccurrence code optionally takes two
> DRMs.
>
> The current CLI for itemsimilarity filters one input stream, optionally
> creating two DRMs, and so does support cross-similarity. The CLI will soon
> allow two input streams. The CLI for RSJ will (if I do it) take one or two
> DRMs.
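
For readers new to the one-versus-two-DRM distinction above, here is a minimal plain-Scala sketch. It assumes the usual formulation: cooccurrence of a single interaction matrix A is A'A, and cross-cooccurrence of A against a second matrix B over the same users is A'B. `CooccurrenceSketch` and its method names are hypothetical and are not the Mahout DRM API; any downsampling or significance scoring that a real implementation would apply is left out.

```scala
// Plain-Scala illustration only; hypothetical names, not the Mahout DRM API.
// Rows are users, columns are items, entries are interaction counts.
object CooccurrenceSketch {
  type Matrix = Vector[Vector[Int]]

  // Naive dense matrix product, fine for tiny examples.
  def multiply(a: Matrix, b: Matrix): Matrix = {
    val bCols = b.transpose
    a.map(row => bCols.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // One input matrix: item-item cooccurrence counts, A' * A.
  def cooccurrence(a: Matrix): Matrix = multiply(a.transpose, a)

  // Two input matrices over the same users: cross-cooccurrence, A' * B.
  def crossCooccurrence(a: Matrix, b: Matrix): Matrix = multiply(a.transpose, b)
}
```

With three users, a two-item matrix `a = Vector(Vector(1, 1), Vector(1, 0), Vector(0, 1))` gives `cooccurrence(a) == Vector(Vector(2, 1), Vector(1, 2))`; both inputs must have the same number of rows (users).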
>
> Please feel free to comment on the JIRAs MAHOUT-1464 (cooccurrence) and
> MAHOUT-1541 (itemsimilarity CLI).
>
> They are maybe 80% ready, which is why a dialog over file reader/writers,
> drivers, and CLI might be good. If we can move on those there are a bunch
> of other jobs that can be packaged up pretty quickly from Dmitriy’s SSVD
> PCA, Transpose, multiply, etc.
>
> On May 29, 2014, at 2:32 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> Pat
>
> I would like to see the co and cross occurrence code separated out a bit
> so that they take drm args.
>
> Sent from my iPhone
>
> > On May 29, 2014, at 17:58, Pat Ferrel <pat.ferrel@gmail.com> wrote:
> >
> > Regarding recommenders, drivers, and import/export:
> >
> > I’ve got Sebastian’s cooccurrence code wrapped with a driver that reads
> > text-delimited files into a DRM for use with cooccurrence. Then it writes
> > the indicator matrix(es) as text-delimited files with user-specified IDs.
> > It also has a proposed Driver base class, a Scala-based option parser, and
> > ReadStore/WriteStore traits. The CLI will be mostly a superset of the
> > itemsimilarity CLI in legacy MR. The read/write stuff is meant to be pretty
> > generic, so I was planning to do a DB and maybe a JSON example (some day).
> > There is still a bit of functional programming refactoring to do, and the
> > docs are not up to date.
> >
> > With cooccurrence working we could do something that replaces all the
> > cooccurrence recommenders (in-memory and MR) with one codebase. Add Solr
> > and you have a single-machine, server-based recommender that we can supply
> > with an API similar to the legacy in-memory recommender. The cool thing is
> > that it will scale out to a cluster with Solr and HDFS, requiring only
> > config changes. The downside is that it requires at least a standalone
> > local version of Spark to do the cooccurrence. BTW this would give us
> > something people have been asking for: a recommender service.
> >
> > Is anyone else interested in CLI, drivers, read/write in the
> > import/export sense? Or a new architecture for the recommenders? If so,
> > maybe a separate thread?
> >
> > On May 29, 2014, at 7:03 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> > Andrew,
> >
> > Sebastian and I were talking yesterday and guessing that you would be
> > interested in this soon.  Glad to know the world is as expected.
> >
> > Yes. This needs to happen at least at a very conceptual level.  For
> > instance, for classifiers, I think that we need to have something like:
> >
> >  - progressively train against a batch of data
> >       questions: should this do multiple epochs?  Throw an exception if
> >       on-line training is not supported?  Throw an exception if too little
> >       data is provided?
> >
> >  - classify a batch of data
> >
> >  - serialize a model
> >
> >  - de-serialize a model
> >
> > Note that a batch listed above should be either a bunch of observations
> > or just one.
> >
> > Question: does this handle the following cases:
> >
> > - naive bayes
> > - SGD trained on continuous data
> > - batch trained <mumble> classifiers
> > - downpour type classifier training
> >
> > ?
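
The four operations listed above could be sketched as a Scala trait roughly as follows. Everything here is an assumption made for illustration: the names `Observation`, `Classifier`, and `MajorityClassifier` are hypothetical, nothing is an existing Mahout API, and throwing on an empty batch is just one possible answer to the open questions, not a decision from the thread. The toy model simply predicts the most frequent label seen so far, which is enough to show progressive training and a serialize/de-serialize round trip.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical names throughout; nothing here is an existing Mahout API.
final case class Observation(features: Vector[Double], label: Int)

trait Classifier {
  // Progressively train on a batch; a batch may hold one observation or many.
  def train(batch: Seq[Observation]): Unit
  // Classify a batch, returning one predicted label per feature vector.
  def classify(batch: Seq[Vector[Double]]): Seq[Int]
  // Serialize the model state to bytes.
  def serialize(): Array[Byte]
}

// Toy model: predicts the label seen most often across all training batches.
final class MajorityClassifier(private var counts: Map[Int, Long] = Map.empty)
    extends Classifier {

  def train(batch: Seq[Observation]): Unit = {
    // One possible policy for "too little data": reject empty batches.
    if (batch.isEmpty) throw new IllegalArgumentException("empty training batch")
    batch.foreach { o =>
      counts = counts.updated(o.label, counts.getOrElse(o.label, 0L) + 1L)
    }
  }

  def classify(batch: Seq[Vector[Double]]): Seq[Int] = {
    if (counts.isEmpty) throw new IllegalStateException("model has not been trained")
    val best = counts.maxBy(_._2)._1
    batch.map(_ => best)
  }

  def serialize(): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(counts.size)
    counts.foreach { case (label, n) => out.writeInt(label); out.writeLong(n) }
    out.flush()
    bytes.toByteArray
  }
}

object MajorityClassifier {
  // De-serialize a model previously written by serialize().
  def deserialize(data: Array[Byte]): MajorityClassifier = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val size = in.readInt()
    val counts = (0 until size).map(_ => in.readInt() -> in.readLong()).toMap
    new MajorityClassifier(counts)
  }
}
```

Because `train` accumulates state across calls, calling it once per observation or once per large batch yields the same model, which matches the note that a batch may be one observation or many. Whether such a trait also fits naive Bayes, SGD, or downpour-style training is exactly the open question in the message and is not settled by this sketch.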
> >
> >
> >
> >> On Wed, May 28, 2014 at 6:25 PM, Andrew Palumbo <ap.dev@outlook.com> wrote:
> >>
> >> This may be somewhat tangential to this thread, but would now be a good
> >> time to start laying out some scala traits for
> >> Classifiers/Clusterers/Recommenders?  I am totally scala-naive, but have
> >> been trying to keep up with the discussions.
> >>
> >> I don't know if this is premature, but it seems that now that the DSL
> >> data structures have been at least sketched out, if not fully
> >> implemented, it would be useful to have these in place before people
> >> start porting too much over.  It might be helpful in bringing in new
> >> contributions as well.
> >>
> >> It could also help regarding people's questions of integrating a future
> >> wrapper layer.
> >>
> >>
> >>
> >>> From: ted.dunning@gmail.com
> >>> Date: Wed, 28 May 2014 17:10:43 -0700
> >>> Subject: Re: do we really need scala still
> >>> To: dev@mahout.apache.org
> >>>
> >>> +1
> >>>
> >>> Let's use a successful Scala model as a suggestion about where to go.
> >>> It seems plausible that Java could emulate the building of a lazy DSL
> >>> logical plan and then poke it in plausible ways with the addition of a
> >>> wrapper layer.  But that only helps if the Scala layer succeeds.
> >>>
> >>>
> >>>
> >>> On Tue, May 27, 2014 at 10:56 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >>>
> >>>> Also, I think that this is leaning towards a false dilemma fallacy.
> >>>> Scala and Java models could happily exist at the same time, hopefully
> >>>> with minimal fragmentation of the project, if done with precision and
> >>>> care.
> >>>>
> >>>>
> >>>> On Tue, May 27, 2014 at 10:46 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >>>>
> >>>>>
> >>>>> Not sure there's much sense in taking a user survey if we can't act on
> >>>>> this. In our situation, unfortunately, we don't have that many ideas to
> >>>>> choose from, so there's not much wiggle room imo. It is more like
> >>>>> reinforcement learning -- stuff that doesn't get used or supported just
> >>>>> dies. That's it. Scala bindings, though thumbs-up'd internally, are yet
> >>>>> to earn this status externally. In that sense we always have been
> >>>>> watching for use/support; that's why we culled out tons of stuff.
> >>>>> Nothing changes going forward (at least at this point). If we have tons
> >>>>> of new ideas/contributions, then it may be different. What is weak dies
> >>>>> on its own pretty evidently without much extra effort.
> >>>>>
> >>>>>
> >>>>>> On Tue, May 27, 2014 at 10:32 AM, Pat Ferrel <pat.ferrel@gmail.com> wrote:
> >>>>>
> >>>>>> We are asking that anyone using Mahout as a lib or in the DSL shell
> >>>>>> learn Scala. While I still think it’s the right idea, users may
> >>>>>> disagree. We should probably either solicit comments or at least keep
> >>>>>> an eye on reactions to this. Spark took this route when the question
> >>>>>> was even more in doubt and so is at least partially supporting
> >>>>>> multiple bindings.
> >>>>>>
> >>>>>> Not sure how far we want to carry this but we could supply Java
> >>>>>> bindings to the CLI-type things pretty easily.
> >>>>>>
> >>>>>>
> >>>>>> On May 26, 2014, at 2:43 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >>>>>>
> >>>>>> Well, first, functional programming in Java 8 is about 2-3 years late
> >>>>>> to the scene. So the reasoning along the lines of "hey, we are already
> >>>>>> using tool A, and now tool B is available which is almost as good as
> >>>>>> A, so let's migrate to B" is flawed. Tool B must demonstrate not just
> >>>>>> matching capabilities but far superior ones to justify the cost of
> >>>>>> such a migration.
> >>>>>>
> >>>>>> Second, as others pointed out, Java 8 doesn't really match Scala, not
> >>>>>> yet anyway. One important feature of the Scala bindings work is proper
> >>>>>> operator overloading (R-like DSL). That would not be possible to do in
> >>>>>> Java 8 as it stands. Yes, as others pointed out, it makes things
> >>>>>> concise, but most importantly, it also makes things operation-centric
> >>>>>> and eliminates the pile-up of nested calls.
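
To illustrate the operator-overloading point, here is a small sketch contrasting the two styles on a toy vector type. `Vec`, `VecOps`, and `OperatorDemo` are hypothetical names invented for this example; they are not Mahout's matrix or vector classes, and the real Scala bindings are considerably richer.

```scala
// Hypothetical toy type; this is not Mahout's Matrix or Vector class.
final case class Vec(values: Vector[Double]) {
  // In Scala, operators are ordinary methods with symbolic names.
  def +(that: Vec): Vec = Vec(values.zip(that.values).map { case (a, b) => a + b })
  def -(that: Vec): Vec = Vec(values.zip(that.values).map { case (a, b) => a - b })
  def *(k: Double): Vec = Vec(values.map(_ * k))
}

// The same operations as named helper functions, the style a language
// without operator overloading pushes you toward.
object VecOps {
  def plus(a: Vec, b: Vec): Vec = a + b
  def minus(a: Vec, b: Vec): Vec = a - b
  def times(a: Vec, k: Double): Vec = a * k
}

object OperatorDemo {
  val x = Vec(Vector(1.0, 2.0))
  val y = Vec(Vector(3.0, 5.0))
  val z = Vec(Vector(0.5, 0.5))

  // Operator form: reads like the formula (x + y) * 2 - z.
  val withOperators: Vec = (x + y) * 2.0 - z

  // Nested-call form: the same computation, read inside out.
  val withNestedCalls: Vec = VecOps.minus(VecOps.times(VecOps.plus(x, y), 2.0), z)
}
```

Both values are equal; the difference is only in how closely the code follows the algebra, and that gap widens as expressions grow.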
> >>>>>>
> >>>>>> Third, as it stands today, it would also present a problem from the
> >>>>>> Spark integration point of view. Spark does have Java bindings, but
> >>>>>> first, they are underdefined (you can check the Spark list for tons of
> >>>>>> postings about missing equivalent capability), and they are certainly
> >>>>>> not Java-8-vetted. So the Java API in Spark for Java 8 purposes, as it
> >>>>>> stands, is a moot point.
> >>>>>>
> >>>>>> There are also a number of other goodies and clashes that exist -- use
> >>>>>> of Scala collections vs. Java collections, clean functional type
> >>>>>> syntax, magic methods, partially defined functions, case class
> >>>>>> matchers, implicits, view and context bounds, etc., etc., all that
> >>>>>> sh$tload of acrobatics that actually comes in very handy in existing
> >>>>>> implementations and has no substitute in Java 8.
> >>>>>> On May 25, 2014 12:48 PM, "bandi shankar" <bandi.mahout@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I was just thinking: do we still need Scala? In Java 8 we have
> >>>>>>> (probably) all the kinds of features provided by Scala.
> >>>>>>> Since I am new to the group, I am just wondering why not move Mahout
> >>>>>>> away from Scala. Is there any specific reason to adopt Scala?
> >>>>>>>
> >>>>>>> Bandi
> >
>
>
