hadoop-mapreduce-user mailing list archives

From 谭军 <tanjun_2...@163.com>
Subject How can I achieve secondary retrieval in mapreduce?
Date Mon, 08 Aug 2011 14:58:46 GMT
I want to write a program that performs secondary retrieval, but I don't know how to do it.
I'm not sure how to describe it well, so the source code below may help.
I don't know whether my first-retrieval algorithm is right, but it works.
The database file is the input file.
I think it is split across different mappers.
I thought that using a LinkedList to store the new keys generated by the first retrieval could
But I don't know how to retrieve the database file from the beginning again.
The database file for the first and second retrieval is the same (args[1]: database path)
The reducer is not used.
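A note on the approach above: a LinkedList created inside map() exists only for that single call, and it is never visible to other map tasks, so it cannot drive a second pass over the database. A common pattern (an assumption here, not something stated in the question) is to chain two jobs: the first writes the first-neighbor IDs to HDFS, and the second uses that output as its cache/key file while reading the same database (args[1]) again from the beginning. The sketch below shows only that two-pass logic in plain Java, without Hadoop; the class and method names (TwoPassRetrieval, retrievalPass, secondaryRetrieval) are hypothetical, and each database line is assumed to be "idA \t idB \t score".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical in-memory sketch of two-pass ("secondary") retrieval over one database. */
public class TwoPassRetrieval {

    /** One full scan of the database: collects matching lines and returns the keys' neighbor IDs. */
    static Set<String> retrievalPass(List<String> database, Set<String> keys, List<String> hits) {
        Set<String> neighbors = new LinkedHashSet<>();
        for (String line : database) {
            String[] f = line.split("\t"); // assumed layout: idA \t idB \t score
            String a = f[0].toLowerCase(Locale.ROOT);
            String b = f[1].toLowerCase(Locale.ROOT);
            if (keys.contains(a) || keys.contains(b)) {
                hits.add(line);
            }
            if (keys.contains(a)) {
                neighbors.add(b); // b is a first neighbor of key a
            }
            if (keys.contains(b)) {
                neighbors.add(a); // a is a first neighbor of key b
            }
        }
        return neighbors;
    }

    /** Pass 1 finds the queries' first neighbors; pass 2 rescans the same database keyed on them. */
    static List<String> secondaryRetrieval(List<String> database, Set<String> queries) {
        Set<String> keys = new LinkedHashSet<>();
        for (String q : queries) {
            keys.add(q.toLowerCase(Locale.ROOT)); // case-insensitive, like equalsIgnoreCase
        }
        List<String> firstHits = new ArrayList<>(); // result of the first retrieval
        Set<String> firstNeighbors = retrievalPass(database, keys, firstHits);

        List<String> secondHits = new ArrayList<>();
        retrievalPass(database, firstNeighbors, secondHits); // start again from the first line
        return secondHits;
    }

    public static void main(String[] args) {
        List<String> database = List.of(
                "P1\tP2\t0.9",
                "P2\tP3\t0.8",
                "P3\tP4\t0.7",
                "P5\tP6\t0.5");
        for (String hit : secondaryRetrieval(database, Set.of("P1"))) {
            System.out.println(hit);
        }
    }
}
```

With query P1, the first pass finds P2 as a neighbor, and the second pass returns the two records that contain P2 ("P1 P2 0.9" and "P2 P3 0.8"). In a MapReduce version, each retrievalPass would correspond to one map-only job.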
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class Retrieval {
 public static void main(String[] args) throws IOException, URISyntaxException {
  if (args.length != 3) {
   System.err.println("Usage: Retrieval <protein set path> <database path> <output path>");
   System.exit(2);
  }
  JobConf conf = new JobConf(new Configuration(), Retrieval.class);
  DistributedCache.addCacheFile(new URI(args[0]), conf); // key file (protein IDs), shipped to every mapper
  FileInputFormat.addInputPath(conf, new Path(args[1])); // database file, split across mappers
  FileOutputFormat.setOutputPath(conf, new Path(args[2]));
  conf.setMapperClass(RetrievalMapper.class);
  conf.setNumReduceTasks(0); // map-only job: the reducer is not used
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(Text.class);
  JobClient.runJob(conf);
 }

 public static class RetrievalMapper extends MapReduceBase implements
   Mapper<LongWritable, Text, Text, Text> {
  private Path[] localFiles;
  public void configure(JobConf conf) {
   try {
    this.localFiles = DistributedCache.getLocalCacheFiles(conf);
   } catch (IOException e) {
    throw new RuntimeException("Cannot read the distributed cache", e);
   }
  }
  public void map(LongWritable key, Text value,
    OutputCollector<Text, Text> output, Reporter reporter)
    throws IOException {
   String[] proteinIDs = value.toString().split("\t");
   String tmpString = proteinIDs[0] + "\t" + proteinIDs[1];
   LinkedList<String> list = new LinkedList<String>(); // first neighbors of this record only
   BufferedReader proReader = new BufferedReader(new FileReader(this.localFiles[0].toString()));
   try {
    String proID;
    while ((proID = proReader.readLine()) != null) { // for each line (protein ID) in key file
     if (proID.equalsIgnoreCase(proteinIDs[0])) { // hit: proteinIDs[1] is its first neighbor
      output.collect(new Text(tmpString), new Text(proteinIDs[2]));
      list.add(proteinIDs[1]); // add first neighbor to list
     }
     if (proID.equalsIgnoreCase(proteinIDs[1])) { // hit: proteinIDs[0] is its first neighbor
      output.collect(new Text(tmpString), new Text(proteinIDs[2]));
      list.add(proteinIDs[0]); // add first neighbor to list
     }
    }
   } finally {
    proReader.close(); // release the key file handle
   }
  }
 }
}



Jun Tan
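The mapper above reopens and scans the whole key file for every database record, so the work grows with (database records) × (key IDs). A common alternative, sketched here under stated assumptions, is to read the key file once (for example in configure(), via something like loadKeys(new FileReader(localFiles[0].toString()))) into a set, and then check each record with constant-time lookups. KeySetMatcher, loadKeys and firstNeighbors are hypothetical names; lower-casing with Locale.ROOT approximates equalsIgnoreCase for plain ASCII protein IDs, and trimming whitespace is an added assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: load the key file once, then match records with set lookups. */
public class KeySetMatcher {

    /** Reads one protein ID per line into a lower-cased set; blank lines are skipped. */
    static Set<String> loadKeys(Reader source) throws IOException {
        Set<String> keys = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String id = line.trim(); // assumption: surrounding whitespace is not significant
                if (!id.isEmpty()) {
                    keys.add(id.toLowerCase(Locale.ROOT));
                }
            }
        }
        return keys;
    }

    /** For a record "idA \t idB \t score", returns the first neighbor(s) of any matching key. */
    static List<String> firstNeighbors(String record, Set<String> keys) {
        String[] f = record.split("\t");
        List<String> neighbors = new ArrayList<>();
        if (keys.contains(f[0].toLowerCase(Locale.ROOT))) {
            neighbors.add(f[1]); // idA is a key, so idB is its first neighbor
        }
        if (keys.contains(f[1].toLowerCase(Locale.ROOT))) {
            neighbors.add(f[0]); // idB is a key, so idA is its first neighbor
        }
        return neighbors;
    }

    public static void main(String[] args) throws IOException {
        Set<String> keys = loadKeys(new StringReader("P1\nP3\n"));
        System.out.println(firstNeighbors("p1\tP2\t0.9", keys));
    }
}
```

The set is built once per map task instead of once per record, which also removes the need to open the cached file inside map().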