mahout-user mailing list archives

From hdev ml <hde...@gmail.com>
Subject Re: Question about data warehousing and mining through Mahout
Date Thu, 02 Sep 2010 17:45:35 GMT
Thanks Lance. Will take a look at KNime also.

On Wed, Sep 1, 2010 at 7:37 PM, Lance Norskog <goksron@gmail.com> wrote:

> The KNime program ("nime") from KNime.org is a great way to get your
> feet wet in data mining. It has some machine learning stuff as well.
> It lets you poke around your data and prototype ways to tease out
> facts. It has a bunch of machine learning tools and just plain
> data-shuffling tools. It's a visual graph programming language, so buy
> a very big monitor. And it wraps Weka and R.
>
> On Wed, Sep 1, 2010 at 10:48 AM, hdev ml <hdevml@gmail.com> wrote:
> > I agree with you that there is preparation needed for Mahout processing.
> >
> > I was just trying to save on that effort by re-using the data in hive
> > instead of double processing it.
> >
> > I may have some more questions when I actually dive into the mining part.
> > (possibly a couple of months down the line).
> >
> > Thanks for your inputs.
> >
> > On Wed, Sep 1, 2010 at 12:58 AM, Sean Owen <srowen@gmail.com> wrote:
> >
> >> Hive does something fairly unrelated to Mahout. It's an indexing and
> >> query system. Both might start from the same source data, but to do
> >> different things. There is no common format, no. Mahout generally
> >> operates on text files or "Vectors" in SequenceFiles. So there's some
> >> translation there at least.
> >>
> >> But I think a message here is that there's more preparation and
> >> thought necessary to start data mining. It's not like you point a data
> >> mining tool at some data and answers start flowing automatically.
> >> You'd have to be deliberately extracting and preparing data anyhow.
> >>
> >> On Tue, Aug 31, 2010 at 11:41 PM, hdev ml <hdevml@gmail.com> wrote:
> >> > Thanks Sean for the answers. Thanks to Ted for the validation.
> >> >
> >> > Now my question is, since I want to do both reporting of large data/
> >> > datawarehouse, let's assume I choose Hive for that.
> >> >
> >> > Now can Mahout integrate with Hive to make use of this data for learning,
> >> > mining etc.? Or do I have to export the Hive data into text files which can
> >> > be hosted by Hadoop/HDFS, which Mahout can later use for data mining?
> >> >
> >> > In short, can the data warehousing part be done by Hive and then can the data
> >> > mining part be done by Mahout on this Hive data?
> >> >
> >> > -H
> >> >
> >> > On Tue, Aug 31, 2010 at 3:03 PM, Sean Owen <srowen@gmail.com> wrote:
> >> >
> >> >> On Tue, Aug 31, 2010 at 10:55 PM, hdev ml <hdevml@gmail.com> wrote:
> >> >> > Per my understanding of hive, we can do some statistical reporting, like
> >> >> > frequency of user sessions, which geographical region, which device he is
> >> >> > using the most etc.
> >> >>
> >> >> Yes that's about what Hive is good for, if you're looking for some
> >> >> open-source libraries along those lines.
> >> >>
> >> >> >
> >> >> > But we also want to mine this data to get some predictive capabilities like
> >> >> > what is the likelihood that the user will use the same device again or if we
> >> >> > get sales/marketing data (on the roadmap for future), we want to possibly
> >> >> > predict which region to put more marketing/sales efforts. What is the
> >> >> > pattern for growth of user base, in which geographical regions etc. What is
> >> >> > the pattern of user requests failing and a number of requirements like these
> >> >> > from the business.
> >> >>
> >> >> This is pretty broad but I can try to give you the names of problems
> >> >> this sounds like, to guide your search.
> >> >>
> >> >> Predicting user usage of device sounds like a classification problem,
> >> >> like developing a probabilistic model of behavior.
> >> >>
> >> >> Deciding where to put marketing dollars sounds like a business
> >> >> problem, not machine learning. I don't think a computer can tell you
> >> >> that. Some techniques might help you identify trends in sales, but
> >> >> this is simple regression, not really machine learning.
> >> >>
> >> >> Looking for patterns in failure sounds a bit like frequent pattern
> >> >> mining -- trying to find events that go together unusually often.
> >> >>
> >> >
> >>
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
