hadoop-common-user mailing list archives

From "Peter W." <pet...@marketingbrokers.com>
Subject methods and data into SequenceFile
Date Thu, 12 Apr 2007 18:50:09 GMT

This is my first post to the Hadoop list,
and I have not yet written a program using
the framework.

I'm querying several large Lucene indexes,
and generating about 30 text files 1-3MB
each. These files contain metadata about the
indexed documents and a corresponding MD5 key.

This unique key exists for each document within the
Lucene index and matches those in the metadata.

My current solution is to read about 50-80MB
of text into memory, run some routines, and generate
double ranking weights for each document, separate
from and complementary to Lucene scoring (ratings).

Then reassemble docs including the new fields by id.
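
Roughly, the current in-memory approach has the shape sketched below.
This is only an illustration: the tab-separated line format, the class
and method names, and the token-count weight are all placeholders I am
assuming here, not the real routines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RankJoinSketch {

    // Assumed (hypothetical) line format: "<md5 key>\t<metadata text>".
    // Returns one double weight per MD5 key.
    static Map<String, Double> computeWeights(List<String> metadataLines) {
        Map<String, Double> weights = new LinkedHashMap<String, Double>();
        for (String line : metadataLines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a key/metadata separator
            }
            String key = line.substring(0, tab);
            String metadata = line.substring(tab + 1).trim();
            // Placeholder weight: number of whitespace-separated tokens.
            double weight = metadata.isEmpty() ? 0.0 : metadata.split("\\s+").length;
            weights.put(key, weight);
        }
        return weights;
    }

    // Reassemble documents by id: attach each weight to the document
    // that carries the same MD5 key. Documents without a weight are dropped.
    static List<String> reassemble(Map<String, String> docsByKey,
                                   Map<String, Double> weights) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, String> doc : docsByKey.entrySet()) {
            Double w = weights.get(doc.getKey());
            if (w != null) {
                out.add(doc.getKey() + "\t" + doc.getValue() + "\tweight=" + w);
            }
        }
        return out;
    }
}
```

Both maps live entirely in memory here, which is where the memory
pressure comes from.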

It works now, but the JVM approaches 1GB of resident
private memory, so it isn't scalable. My goal is to move
this into a MapReduce job, but I don't yet know how. ;)

What steps are required to turn several Java methods
and data sets into a SequenceFile?

Kind Regards,

Peter W.
