From: Leon Mergen
Date: Mon, 19 Jul 2010 14:56:51 +0200
Subject: libhdfs / gzip support
To: common-user@hadoop.apache.org

Hello,

We're using Hadoop in a C-oriented architecture: libhdfs for storing files and Hadoop Pipes for map/reduce jobs. Since the data we're storing benefits a lot from compression, we're currently investigating ways to enable it. Ideally we would perform block-level compression, so that each 64 MB HDFS block of data is compressed separately.

Hadoop Pipes seems to provide a way to change the RecordReader and RecordWriter to enable the GzipCodec; however, I did not find a good way to tell libhdfs to store files compressed.

Does anyone have experience with this, and/or ideas on how best to approach the problem? We're using Hadoop 0.20.2.

Regards,

Leon Mergen
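P.S. For reference, here is a minimal sketch of the client-side workaround we are considering: compressing with zlib's gzip wrapper before handing the bytes to hdfsWrite, so the resulting files are readable by GzipCodec on the Java side. The path, the "default" connect string, and the buffer sizes below are placeholders, and error handling is minimal.

/* Sketch: gzip-compress a buffer with zlib, then write the compressed
 * bytes to HDFS through libhdfs. */
#include <hdfs.h>
#include <zlib.h>
#include <fcntl.h>
#include <string.h>

static int write_gzipped(hdfsFS fs, const char *path,
                         const char *data, size_t len)
{
    z_stream strm;
    memset(&strm, 0, sizeof(strm));
    /* windowBits = 15 + 16 asks zlib for a gzip header/trailer, which
     * is what GzipCodec expects. */
    if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return -1;

    hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    if (!out) {
        deflateEnd(&strm);
        return -1;
    }

    unsigned char buf[65536];
    strm.next_in  = (Bytef *)data;
    strm.avail_in = (uInt)len;
    int ret;
    do {
        /* Drain the compressor into buf and push each chunk to HDFS. */
        strm.next_out  = buf;
        strm.avail_out = sizeof(buf);
        ret = deflate(&strm, Z_FINISH);
        tSize n = (tSize)(sizeof(buf) - strm.avail_out);
        if (n > 0 && hdfsWrite(fs, out, buf, n) != n) {
            deflateEnd(&strm);
            hdfsCloseFile(fs, out);
            return -1;
        }
    } while (ret != Z_STREAM_END);

    deflateEnd(&strm);
    return hdfsCloseFile(fs, out);
}

int main(void)
{
    /* "default" picks up fs.default.name from the Hadoop configuration
     * on the classpath; an explicit host/port would work as well. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs)
        return 1;
    const char msg[] = "hello, compressed world\n";
    int rc = write_gzipped(fs, "/tmp/example.gz", msg, sizeof(msg) - 1);
    hdfsDisconnect(fs);
    return rc == 0 ? 0 : 1;
}

One caveat with this approach: gzip streams are not splittable, so each file written this way would be processed by a single mapper rather than one mapper per 64 MB block. On the Pipes side, our assumption is that setting mapred.output.compress=true and mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec in the job configuration is what enables GzipCodec for job output; corrections welcome.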