From: "Jim the Standing Bear" <standingbear@gmail.com>
To: core-user@hadoop.apache.org
Date: Tue, 27 May 2008 23:02:56 -0400
Subject: Re: How to make a lucene Document hadoop Writable?

Thanks for the quick response, Dennis. However, your code snippet was about how to serialize/deserialize using ObjectInputStream/ObjectOutputStream. Maybe it was my fault for not making the question clear enough - I was wondering if and how I can serialize/deserialize using only DataInput and DataOutput. This is because the Writable interface defined by Hadoop has the following two methods:

  void readFields(DataInput in)    Deserialize the fields of this object from in.
  void write(DataOutput out)       Serialize the fields of this object to out.

so I must start with DataInput and DataOutput, and work my way to ObjectInputStream and ObjectOutputStream.
Yet I have not found a way to go from DataInput to ObjectInputStream. Any ideas?

-- Jim

On Tue, May 27, 2008 at 10:50 PM, Dennis Kubes wrote:

> You can use something like the code below to go back and forth from
> serializables. The problem with lucene documents is that fields which
> are not stored will be lost during the serialization / deserialization
> process.
>
> Dennis
>
> public static Object toObject(byte[] bytes, int start)
>     throws IOException, ClassNotFoundException {
>
>   if (bytes == null || bytes.length == 0 || start >= bytes.length) {
>     return null;
>   }
>
>   ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
>   bais.skip(start);
>   ObjectInputStream ois = new ObjectInputStream(bais);
>
>   Object bObject = ois.readObject();
>
>   bais.close();
>   ois.close();
>
>   return bObject;
> }
>
> public static byte[] fromObject(Serializable toBytes)
>     throws IOException {
>
>   ByteArrayOutputStream baos = new ByteArrayOutputStream();
>   ObjectOutputStream oos = new ObjectOutputStream(baos);
>
>   oos.writeObject(toBytes);
>   oos.flush();
>
>   byte[] objBytes = baos.toByteArray();
>
>   baos.close();
>   oos.close();
>
>   return objBytes;
> }
>
> Jim the Standing Bear wrote:
>>
>> Hello,
>>
>> I am not sure if this is a genuine hadoop question or more towards a
>> core-java question. I am hoping to create a wrapper over Lucene
>> Document, so that this wrapper can be used for the value field of a
>> Hadoop SequenceFile, and therefore, this wrapper must also implement
>> the Writable interface.
>>
>> Lucene's Document is already made serializable, which is quite nice.
>> However, the Writable interface definition gives only DataInput and
>> DataOutput, and I am having a hard time trying to figure out how to
>> serialize/deserialize a lucene Document object using
>> DataInput/DataOutput. In other words, how do I go from DataInput to
>> ObjectInputStream, or from DataOutput to ObjectOutputStream? Thanks.
>>
>> -- Jim
>

-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
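
[Archive note] One common way to bridge the gap Jim asks about is to avoid
wrapping DataInput in an ObjectInputStream at all: serialize the object into
an in-memory byte array (as in Dennis's fromObject above), then move that
array across DataOutput/DataInput with a length prefix so readFields knows
exactly how many bytes to read back. Below is a minimal sketch of such a
wrapper, not code from this thread: the class name DocumentWritable is
hypothetical, and it assumes the wrapped Lucene Document is Serializable, as
Jim notes it was in the Lucene of that era.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import org.apache.hadoop.io.Writable;
import org.apache.lucene.document.Document;

// Hypothetical wrapper (not from the thread): bridges DataInput/DataOutput
// to object streams via a length-prefixed in-memory byte array.
public class DocumentWritable implements Writable {

  private Document doc;

  public DocumentWritable() {}  // Writables need a no-arg constructor

  public DocumentWritable(Document doc) {
    this.doc = doc;
  }

  public Document get() {
    return doc;
  }

  public void write(DataOutput out) throws IOException {
    // Serialize the Document into an in-memory buffer first...
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(doc);
    oos.flush();
    byte[] bytes = baos.toByteArray();
    oos.close();
    // ...then length-prefix the bytes so readFields knows where to stop.
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  public void readFields(DataInput in) throws IOException {
    // Read the length prefix, then exactly that many bytes...
    int length = in.readInt();
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    // ...and deserialize from a byte-array-backed ObjectInputStream.
    ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
    try {
      doc = (Document) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException("Cannot deserialize Document: " + e);
    } finally {
      ois.close();
    }
  }
}

The length prefix matters because a SequenceFile hands the Writable a stream
that contains many records back to back, so readFields must consume exactly
the bytes its own write produced. Dennis's caveat still applies: any Document
field that is not stored will be lost on the round trip, and Java
serialization adds noticeable space overhead compared with writing the stored
field names and values out manually.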