hadoop-common-user mailing list archives

From Dennis Kubes <ku...@apache.org>
Subject Re: How to make a lucene Document hadoop Writable?
Date Wed, 28 May 2008 02:50:36 GMT
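You can use something like the code below to convert between Serializable
objects and byte arrays.  Inside a Writable, write the array's length and
then its bytes to the DataOutput, and read them back the same way from the
DataInput.  The catch with Lucene Documents is that fields which are not
stored will be lost during serialization/deserialization.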

Dennis

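// Required imports:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Deserializes the object stored in bytes, starting at offset start.
// Returns null if there is nothing to read.
public static Object toObject(byte[] bytes, int start)
    throws IOException, ClassNotFoundException {

  if (bytes == null || start < 0 || start >= bytes.length) {
    return null;
  }

  // Wrap only the bytes from start onward; try-with-resources closes
  // both streams even if readObject() throws.
  try (ObjectInputStream ois = new ObjectInputStream(
      new ByteArrayInputStream(bytes, start, bytes.length - start))) {
    return ois.readObject();
  }
}

// Serializes any Serializable object into a byte array.
public static byte[] fromObject(Serializable toBytes)
    throws IOException {

  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  // Closing the ObjectOutputStream flushes it before the bytes are read.
  try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
    oos.writeObject(toBytes);
  }
  return baos.toByteArray();
}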
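To answer the original question directly: a DataOutput does not have to be
turned into an ObjectOutputStream.  A minimal sketch, assuming the value only
needs plain Java serialization: serialize into a byte array as fromObject
does, write a length prefix plus the bytes, and on the way back read exactly
that many bytes before handing them to an ObjectInputStream.  The class name
JavaSerializedWritable is hypothetical, and it leaves off
"implements org.apache.hadoop.io.Writable" only so it compiles without Hadoop
on the classpath; write() and readFields() have the same signatures as the
Writable methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical wrapper. In a real job it would declare
// "implements org.apache.hadoop.io.Writable"; that is omitted here
// so the sketch compiles with the JDK alone.
public class JavaSerializedWritable {
  private Serializable value;

  // Hadoop instantiates Writables reflectively, so keep a no-arg constructor.
  public JavaSerializedWritable() {}

  public JavaSerializedWritable(Serializable value) { this.value = value; }

  public Serializable get() { return value; }

  // Same signature as Writable.write: length prefix, then serialized bytes.
  public void write(DataOutput out) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
      oos.writeObject(value);
    }
    byte[] bytes = baos.toByteArray();
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Same signature as Writable.readFields: read exactly the prefixed number
  // of bytes, so whatever follows in the stream is left untouched.
  public void readFields(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    try (ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      value = (Serializable) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException("Cannot deserialize value", e);
    }
  }

  public static void main(String[] args) throws IOException {
    // Write two records back to back, as a SequenceFile would.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    new JavaSerializedWritable("first").write(out);
    new JavaSerializedWritable(42).write(out);
    out.flush();

    // Read them back one at a time, reusing the same instance.
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
    JavaSerializedWritable w = new JavaSerializedWritable();
    w.readFields(in);
    System.out.println(w.get());  // prints "first"
    w.readFields(in);
    System.out.println(w.get());  // prints "42"
  }
}
```

For the Lucene case the wrapped value would be the Document itself, with the
stored-field caveat above.  Because every record carries its own
serialization stream header and class descriptors, this is bulkier than a
hand-written Writable that writes only the fields you need.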


Jim the Standing Bear wrote:
> Hello,
> 
> I am not sure if this is a genuine Hadoop question or more of a
> core Java question.  I am hoping to create a wrapper around a Lucene
> Document so that the wrapper can be used as the value type of a
> Hadoop SequenceFile; therefore, this wrapper must also implement
> the Writable interface.
> 
> Lucene's Document is already Serializable, which is quite nice.
> However, the Writable interface only gives me a DataInput and a
> DataOutput, and I am having a hard time figuring out how to
> serialize/deserialize a Lucene Document object using
> DataInput/DataOutput.  In other words, how do I go from DataInput to
> ObjectInputStream, or from DataOutput to ObjectOutputStream?  Thanks.
> 
> -- Jim
