hadoop-common-user mailing list archives

From Enis Soztutar <enis.soz.nu...@gmail.com>
Subject Re: Apriori and association rules with Hadoop
Date Tue, 19 Jan 2010 00:20:32 GMT
We have successfully implemented sequential Apriori (a variant of GSP) on
MapReduce. The candidate generation, counting, and candidate selection steps
are all parallel. Each iteration runs 2 chained jobs.

A rough outline of the algorithm:
  L1 = generateL1() // which is a word-count like program

  repeat {
     countCandidates()
     generateCandidates()
  }
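
The generateL1() step can be sketched in plain Python as a word-count style
pass. This is only an illustration, not our Hadoop code: the name generate_l1,
the in-memory list of sequences, and the choice to count an item once per
sequence are all assumptions made for the example.

```python
from collections import Counter

def generate_l1(sequences, threshold):
    """Word-count-like pass that returns the frequent patterns of length 1.

    Assumption: support is the number of input sequences that contain the
    item, so each item is counted at most once per sequence.
    """
    counts = Counter()
    for seq in sequences:
        # "map" side: emit <item, 1> once per distinct item in the sequence
        for item in set(seq):
            # "reduce" side: sum the ones per item
            counts[item] += 1
    # keep only items whose support is greater than the threshold
    return {(item,): n for item, n in counts.items() if n > threshold}

if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["d"]]
    print(generate_l1(data, threshold=1))
```

Patterns are returned as one-element tuples so that later steps can treat
patterns of any length uniformly.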

countCandidates() {
    // the candidate set is already in HDFS
    // every task JVM reads the candidate set into memory
    // (a hash-tree can be used here)
    input:   the data set, which is stored in sequence files
    mapper:  for each candidate C that is contained in the input record,
             emit <C, 1>
    reducer: sum up the values for candidate C
             emit C if its support is greater than the threshold
}
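
As a minimal sketch of the counting step, the map and reduce sides can be
simulated in memory with plain Python. The function names, the dictionary that
stands in for the shuffle, and the reading of "contained" as an
order-preserving (not necessarily contiguous) subsequence are assumptions for
illustration. A linear scan over the candidates replaces the hash-tree here.

```python
from collections import defaultdict

def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # "item in it" advances the iterator, so order is enforced
    return all(item in it for item in pattern)

def map_count(sequence, candidates):
    """Mapper: emit <C, 1> for every candidate contained in the record."""
    for cand in candidates:
        if is_subsequence(cand, sequence):
            yield cand, 1

def reduce_count(candidate, values, threshold):
    """Reducer: sum the ones and keep the candidate if it is frequent."""
    support = sum(values)
    if support > threshold:
        yield candidate, support

def count_candidates(sequences, candidates, threshold):
    grouped = defaultdict(list)  # stands in for the MapReduce shuffle
    for seq in sequences:
        for key, value in map_count(seq, candidates):
            grouped[key].append(value)
    frequent = {}
    for key, values in grouped.items():
        for cand, support in reduce_count(key, values, threshold):
            frequent[cand] = support
    return frequent

if __name__ == "__main__":
    data = [("a", "b", "c"), ("a", "c"), ("b", "a"), ("a", "b")]
    cands = [("a", "b"), ("a", "c"), ("b", "a")]
    print(count_candidates(data, cands, threshold=1))
```

In a real job the candidate list would be loaded once per task rather than
passed in per record, which is what reading it into memory in each task JVM
achieves.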

generateCandidates() {
    mapper:
       for pattern P = [p1, p2, ... pn]
          emit < [p2, ... pn], p1 >     // with a special flag for first
          emit < [p1, ... pn-1], pn >   // with a special flag for last

    reducer:  // keys are sub-patterns [p1, ... pn-1] of length n-1,
              // values are individual items
       for each first value v_f
          for each last value v_l
             emit [ v_f, p1, ... pn-1, v_l ]
}
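
The join in generateCandidates() can be sketched the same way. This assumes
that both mapper emissions are keyed on the shared sub-pattern of length n-1,
as the reducer comment describes, with the single item and its first/last flag
carried in the value. The names and the in-memory grouping are hypothetical,
and the sketch does not prune candidates that have an infrequent sub-pattern.

```python
from collections import defaultdict

FIRST, LAST = "first", "last"

def map_generate(pattern):
    """Mapper: key on the length n-1 sub-pattern, flag the dropped item."""
    pattern = tuple(pattern)
    # drop the first item: it can be prepended to a matching pattern
    yield pattern[1:], (FIRST, pattern[0])
    # drop the last item: it can be appended to a matching pattern
    yield pattern[:-1], (LAST, pattern[-1])

def reduce_generate(shared, flagged_values):
    """Reducer: combine every first item with every last item."""
    firsts = [v for flag, v in flagged_values if flag == FIRST]
    lasts = [v for flag, v in flagged_values if flag == LAST]
    for v_f in firsts:
        for v_l in lasts:
            yield (v_f,) + shared + (v_l,)

def generate_candidates(frequent_patterns):
    grouped = defaultdict(list)  # stands in for the MapReduce shuffle
    for pattern in frequent_patterns:
        for key, value in map_generate(pattern):
            grouped[key].append(value)
    candidates = set()
    for key, values in grouped.items():
        candidates.update(reduce_generate(key, values))
    return candidates

if __name__ == "__main__":
    frequent = [("a", "b"), ("b", "c"), ("a", "c")]
    print(generate_candidates(frequent))
```

Two patterns of length n join only when the tail of one equals the head of the
other, so each emitted candidate has length n+1.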

Hope this helps,

Enis

On Wed, Jan 13, 2010 at 11:30 PM, Raymond Jennings III <
raymondjiii@yahoo.com> wrote:

> Is it doable to create an Apriori app using map-reduce?  I am starting out
> but it's not clear how to create the next Candidate sets based on a previous
> run.  Does anyone have any experience with this?
