hadoop-mapreduce-user mailing list archives

From Pradeep Kanchgar <Pradeep.Kanch...@lntinfotech.com>
Subject Bloom filter in reduce side join
Date Fri, 07 Mar 2014 12:30:13 GMT
Hi,

I'm currently exploring Bloom filters. I've read most of the blog posts on them and
understand what a Bloom filter is, but I still can't work out how to apply one to a join.
I've only just started with MapReduce programming.

Can anyone help me implement a Bloom filter in the example below (a reduce-side join)?

I'm joining two datasets, "Employee (users)" and "Departments", with a reduce-side join.

Two mappers read the "user (Employee)" and "Department" records, and a reducer joins them.

"user (Employee)" records         "Department" records
id, name                          id, dept name

3738, Richie Gore                 3738,Sales
12946,Rony Sam                    12946,Marketing
17556,David Gart                  3738,Sales
3443,Rachel Smith                 3443,Sales
5799,Paul Rosta


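My code:

Mapper 1, which reads the user (employee) records:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UserMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    // Reused across calls to avoid allocating new Text objects per record.
    private Text outkey = new Text();
    private Text outval = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {

        // Each input line has the form "id,name".
        String[] arryUsers = value.toString().split(",");
        if (arryUsers.length < 2) {
            return; // skip blank or malformed lines
        }
        String id = arryUsers[0].trim();
        String name = arryUsers[1].trim();

        // Tag the value with "A" so the reducer knows it is a user record.
        outkey.set(id);
        outval.set("A" + name);
        output.collect(outkey, outval);
    }
}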

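Mapper 2, which reads the department records:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DepartMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    // Reused across calls to avoid allocating new Text objects per record.
    private Text outkey = new Text();
    private Text outval = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {

        // Each input line has the form "id,dept name".
        String[] arryDept = value.toString().split(",");
        if (arryDept.length < 2) {
            return; // skip blank or malformed lines
        }
        String depid = arryDept[0].trim();
        String dep = arryDept[1].trim();

        // Tag the value with "B" so the reducer knows it is a department record.
        outkey.set(depid);
        outval.set("B" + dep);
        output.collect(outkey, outval);
    }
}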

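Reducer, which performs the join:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class JoinReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

    private List<Text> listA = new ArrayList<Text>(); // user names for the current id
    private List<Text> listB = new ArrayList<Text>(); // department names for the current id

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {

        listA.clear();
        listB.clear();

        // Split the values for this id by their one-character tag.
        while (values.hasNext()) {
            String tagged = values.next().toString();
            if (tagged.isEmpty()) {
                continue;
            }
            if (tagged.charAt(0) == 'A') {
                listA.add(new Text(tagged.substring(1)));
            } else if (tagged.charAt(0) == 'B') {
                listB.add(new Text(tagged.substring(1)));
            }
        }
        executejoinlogic(output);
    }

    private void executejoinlogic(OutputCollector<Text, Text> output) throws IOException {

        // Inner join: emit every (user, department) pair, but only if both sides are present.
        if (!listA.isEmpty() && !listB.isEmpty()) {
            for (Text A : listA) {
                for (Text B : listB) {
                    output.collect(A, B);
                }
            }
        }
    }
}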
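
For reference, the idea behind combining a Bloom filter with the join above can be sketched in plain Java, without Hadoop: build a filter from the ids on the Department side, then drop user records whose id the filter definitely does not contain, so they never reach the shuffle and the reducer. This is only a rough sketch under stated assumptions. The names BloomJoinSketch, SimpleBloomFilter, buildFilter and filterUsers are hypothetical, and the 1024-bit / 3-hash sizing is an arbitrary choice for the sample data. In an actual Hadoop 1.x job the filter would more likely be an org.apache.hadoop.util.bloom.BloomFilter built beforehand, shipped to the map tasks (for example through DistributedCache), and tested inside UserMapper.map() before output.collect().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical plain-Java sketch; none of these names come from Hadoop.
class BloomJoinSketch {

    /** Minimal Bloom filter over strings: numHashes bit positions per key. */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Seeded FNV-1a style hash over the UTF-8 bytes of the key.
        private static int hash(String key, int seed) {
            int h = 0x811C9DC5 ^ seed;
            for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
                h ^= (b & 0xFF);
                h *= 0x01000193;
            }
            return h;
        }

        // Double hashing: position i is (h1 + i * h2) mod numBits.
        private int position(String key, int i) {
            int h1 = hash(key, 0);
            int h2 = hash(key, 0x9E3779B9);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(position(key, i));
            }
        }

        // false means "definitely absent"; true means "possibly present".
        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(position(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Build the filter from the join key (first field) of each "id,dept name" line.
    static SimpleBloomFilter buildFilter(List<String> departmentLines) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3); // assumed sizing
        for (String line : departmentLines) {
            filter.add(line.split(",")[0].trim());
        }
        return filter;
    }

    // Keep only the "id,name" lines whose id might have a matching department.
    static List<String> filterUsers(List<String> userLines, SimpleBloomFilter filter) {
        List<String> kept = new ArrayList<String>();
        for (String line : userLines) {
            if (filter.mightContain(line.split(",")[0].trim())) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> departments = Arrays.asList(
                "3738,Sales", "12946,Marketing", "3738,Sales", "3443,Sales");
        List<String> users = Arrays.asList(
                "3738, Richie Gore", "12946,Rony Sam", "17556,David Gart",
                "3443,Rachel Smith", "5799,Paul Rosta");
        // Users that survive the filter would go on to the reduce-side join.
        System.out.println(filterUsers(users, buildFilter(departments)));
    }
}
```

A Bloom filter never gives false negatives, so every user with a matching department is kept. It can give false positives, so an unmatched user may occasionally slip through; that is harmless here because the reducer still performs the exact join and emits nothing for ids that have no department.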



I'm using the Eclipse IDE for development and connecting to the Apache Hadoop 1.1.1 release.

Thanks & Regards,
Pradeep C Kanchgar
