MIME-Version: 1.0
Date: Sun, 12 May 2013 13:24:48 -0500
Subject: Wrapping around BitSet with the Writable interface
From: Jim Twensky
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=089e0112d0daeb12bc04dc8983a4

--089e0112d0daeb12bc04dc8983a4
Content-Type: text/plain; charset=ISO-8859-1

I have large java.util.BitSet objects that I want to bitwise-OR using a
MapReduce job. I decided to wrap each object in a class that implements the
Writable interface. Right now I convert each BitSet to a byte array and
serialize the byte array to disk.

Converting them to byte arrays is somewhat inefficient, but I could not find
a workaround for writing them directly to the DataOutput. Is there a way to
skip this step and serialize the object directly? Here is what my current
implementation looks like:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

import org.apache.hadoop.io.Writable;

public class BitSetWritable implements Writable {

  private BitSet bs;

  public BitSetWritable() {
    this.bs = new BitSet();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Serialize the BitSet into an in-memory buffer using Java serialization.
    ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size() / 8);
    ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(bs);
    // Close (and thereby flush) the object stream before reading the buffer,
    // so no serialized bytes are left in the stream's internal buffer.
    oos.close();
    byte[] bytes = bos.toByteArray();
    // Length-prefix the payload so readFields knows how much to read.
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Read the length-prefixed payload written by write().
    int len = in.readInt();
    byte[] bytes = new byte[len];
    in.readFully(bytes);

    ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
    ObjectInputStream ois = new ObjectInputStream(bis);
    try {
      bs = (BitSet) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }

    ois.close();
  }

}

--089e0112d0daeb12bc04dc8983a4
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--089e0112d0daeb12bc04dc8983a4--
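
As one possible answer to the question above, here is a minimal sketch of writing a BitSet to a DataOutput without going through Java object serialization. It assumes Java 7 or later, where `BitSet.toLongArray()` and `BitSet.valueOf(long[])` are available. The class name `BitSetIo` and its methods are hypothetical helpers made up for illustration; they are not part of Hadoop or of the original post. The sketch uses only `java.io` and `java.util`, so it can be tried without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper class (an assumption for illustration, not a Hadoop API).
final class BitSetIo {

    private BitSetIo() {}

    // Writes the BitSet as a word count followed by its 64-bit words.
    static void writeBitSet(BitSet bs, DataOutput out) throws IOException {
        long[] words = bs.toLongArray(); // trailing all-zero words are omitted
        out.writeInt(words.length);
        for (long word : words) {
            out.writeLong(word);
        }
    }

    // Reads back a BitSet written by writeBitSet.
    static BitSet readBitSet(DataInput in) throws IOException {
        int count = in.readInt();
        if (count < 0) {
            throw new IOException("Negative word count: " + count);
        }
        long[] words = new long[count];
        for (int i = 0; i < count; i++) {
            words[i] = in.readLong();
        }
        return BitSet.valueOf(words);
    }

    // Convenience round trip through an in-memory buffer, for experimenting.
    static BitSet roundTrip(BitSet bs) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeBitSet(bs, new DataOutputStream(buffer));
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        return readBitSet(in);
    }
}
```

In a class like BitSetWritable, `write(DataOutput)` could delegate to `writeBitSet` and `readFields(DataInput)` could assign the result of `readBitSet` to the `bs` field. Note that `toLongArray()` still copies the bits into a temporary `long[]`, so this mainly avoids the extra stream layers and the Java serialization header rather than every copy.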