incubator-jena-dev mailing list archives

From Paolo Castagna <>
Subject Blank nodes and MapReduce
Date Mon, 27 Jun 2011 18:15:04 GMT
I have a MapReduce job with a map function which parses a line from an
N-Quads file:

  private static final Logger log = LoggerFactory.getLogger(FirstMapper.class);
  private String inputFileName;
  private MapReduceParserProfile profile;
  private LabelToNode labelMapping;

  public void setup(Context context) throws IOException, InterruptedException {
      inputFileName = context.getConfiguration().get("");
      Prologue prologue = new Prologue(null, IRIResolver.createNoResolve());
      // Blank node labels are allocated per input file (see below).
      labelMapping = new MapReduceLabelToNode(inputFileName);
      profile = new MapReduceParserProfile(prologue,
        ErrorHandlerFactory.errorHandlerStd, labelMapping);
  }

  public void map (LongWritable key, Text value, Context context)
  throws IOException, InterruptedException {
      if ( log.isDebugEnabled() ) log.debug("< ({}, {})", key, value);
      // Each input line is one N-Quads statement: parse it and emit the quads.
      SinkToContext sink = new SinkToContext(context);
      Tokenizer tokenizer = TokenizerFactory.makeTokenizerString(value.toString());
      LangNQuads parser = new LangNQuads(tokenizer, profile, sink) ;
      parser.parse();
  }

(A RecordReader<LongWritable, QuadWritable> would be better, but for now the
snippet above does its job. Almost.)

The problem I have is with blank node labels.

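With MapReduce, the same file is divided into multiple input splits, which may
be parsed on different machines. Blank node labels are scoped to the whole
file, so _:b0 in two different splits of one file must become the same blank
node, while _:b0 in two different files must not. Therefore, I would like to
have my own LabelToNode implementation with an Allocator<String, Node> which
takes into account the filename (or a hash of it) when it creates a new blank
node.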

Something along these lines:

  public Node create(String label) {
      return Node.createAnon(new AnonId(filename + "-" + label)) ;
  }
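To see why the file name matters, here is a minimal sketch in plain Java that uses no Hadoop or Jena classes; every class and method name in it is hypothetical. PerSplitAllocator mimics a parser that hands out a fresh random id for each new label, so two splits of the same file disagree about _:b0. FileScopedAllocator derives the id only from the file name and the label, using a truncated SHA-256 hash of the file name (the "hash of it" option) as the prefix, so every split of one file agrees.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class BlankNodeScoping {

    // Hypothetical per-split allocator: stable within one split (one parser
    // instance), but each split invents its own random ids.
    static final class PerSplitAllocator {
        private final Map<String, String> seen = new HashMap<>();

        String create(String label) {
            return seen.computeIfAbsent(label, l -> UUID.randomUUID().toString());
        }
    }

    // Hypothetical file-scoped allocator: the id depends only on the file
    // name and the label, so all splits of the same file agree.
    static final class FileScopedAllocator {
        private final String prefix;

        FileScopedAllocator(String filename) {
            // Truncated hash keeps ids short and free of path characters.
            this.prefix = sha256Hex(filename).substring(0, 16);
        }

        String create(String label) {
            return prefix + "-" + label;
        }
    }

    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Two splits of the same file, each parsed with its own allocator.
        PerSplitAllocator splitA = new PerSplitAllocator();
        PerSplitAllocator splitB = new PerSplitAllocator();
        System.out.println(splitA.create("b0").equals(splitB.create("b0"))); // false

        FileScopedAllocator fileA = new FileScopedAllocator("data/part1.nq");
        FileScopedAllocator fileB = new FileScopedAllocator("data/part1.nq");
        FileScopedAllocator other = new FileScopedAllocator("data/part2.nq");
        System.out.println(fileA.create("b0").equals(fileB.create("b0"))); // true
        System.out.println(fileA.create("b0").equals(other.create("b0"))); // false
    }
}
```

Truncating the hash trades a tiny risk of cross-file collisions for shorter ids; using the raw file name, as in the snippet above, avoids that risk at the cost of longer labels.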

So, I have my MapReduceLabelToNode:

public class MapReduceLabelToNode extends LabelToNode {

    public MapReduceLabelToNode(String filename) {
        super(new SingleScopePolicy(), new MapReduceAllocator(filename));
    }
}

But the LabelToNode constructor is private, so this subclass does not compile.

Could we make it protected?

Or, alternatively, how can I construct a LabelToNode object that uses my
MapReduceAllocator?
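One shape the alternative could take, sketched here with hypothetical stand-ins (LabelMapper and Allocator below are NOT Jena's classes or signatures), is a public static factory that accepts the allocator while the constructor stays private. The mapper keeps its own label scope, so the same label always maps to the same node, and the allocator only decides how a new node is made.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only: not the real Jena LabelToNode/Allocator API.
public class FactorySketch {

    /** Hypothetical allocator: turns a label into a node (a String here). */
    interface Allocator {
        String create(String label);
    }

    /** Mimics a label-to-node mapper whose constructor is private. */
    static final class LabelMapper {
        private final Map<String, String> scope = new HashMap<>();
        private final Allocator allocator;

        private LabelMapper(Allocator allocator) {
            this.allocator = allocator;
        }

        /** Hypothetical factory: callers pick the allocator without subclassing. */
        static LabelMapper withAllocator(Allocator allocator) {
            return new LabelMapper(allocator);
        }

        /** The allocator is called once per new label; repeats hit the scope map. */
        String get(String label) {
            return scope.computeIfAbsent(label, allocator::create);
        }
    }

    public static void main(String[] args) {
        String filename = "data/part1.nq";
        LabelMapper mapper = LabelMapper.withAllocator(label -> filename + "-" + label);
        System.out.println(mapper.get("b0")); // data/part1.nq-b0
        System.out.println(mapper.get("b0") == mapper.get("b0")); // true
    }
}
```

A factory like this keeps the class closed to subclassing, whereas a protected constructor would allow the MapReduceLabelToNode subclass above as written; either would be enough for this use case.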

