Message-ID: <314ee0450806030739h403e198fo1813cf6ff1dffbe1@mail.gmail.com>
Date: Tue, 3 Jun 2008 10:39:02 -0400
From: "Stuart Sierra"
Sender: the.stuart.sierra@gmail.com
To: core-user@hadoop.apache.org
Subject: Re: how to deserialize the contents of hadoop output (sequencefileoutputformat)
In-Reply-To: <603280.19676.qm@web56501.mail.re3.yahoo.com>
References: <603280.19676.qm@web56501.mail.re3.yahoo.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_1693_28601951.1212503942777"

------=_Part_1693_28601951.1212503942777
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Tue, Jun 3, 2008 at 6:17 AM, Lin Guo wrote:
> I am wondering whether it is possible to deserialize the keys and values
> in a hadoop output file where the output format is SequenceFileOutputFormat.

I wrote some code to do this, samples attached.
-Stuart ------=_Part_1693_28601951.1212503942777 Content-Type: text/x-java; name=SeqFilePrinter.java Content-Transfer-Encoding: base64 X-Attachment-Id: f_fh0li8pi0 Content-Disposition: attachment; filename=SeqFilePrinter.java LyogU2VxS2V5TGlzdC5qYXZhIC0gcHJpbnQgbGlzdCBvZiBrZXlzIGluIGEgU2VxdWVuY2VGaWxl CiAqCiAqIENvcHlyaWdodCAoQykgMjAwOCBTdHVhcnQgU2llcnJhCiAqCiAqIExpY2Vuc2VkIHVu ZGVyIHRoZSBBcGFjaGUgTGljZW5zZSwgVmVyc2lvbiAyLjAgKHRoZSAiTGljZW5zZSIpOyB5b3UK ICogbWF5IG5vdCB1c2UgdGhpcyBmaWxlIGV4Y2VwdCBpbiBjb21wbGlhbmNlIHdpdGggdGhlIExp Y2Vuc2UuIFlvdQogKiBtYXkgb2J0YWluIGEgY29weSBvZiB0aGUgTGljZW5zZSBhdAogKiBodHRw Ond3dy5hcGFjaGUub3JnL2xpY2Vuc2VzL0xJQ0VOU0UtMi4wCiAqIFVubGVzcyByZXF1aXJlZCBi eSBhcHBsaWNhYmxlIGxhdyBvciBhZ3JlZWQgdG8gaW4gd3JpdGluZywgc29mdHdhcmUKICogZGlz dHJpYnV0ZWQgdW5kZXIgdGhlIExpY2Vuc2UgaXMgZGlzdHJpYnV0ZWQgb24gYW4gIkFTIElTIiBC QVNJUywKICogV0lUSE9VVCBXQVJSQU5USUVTIE9SIENPTkRJVElPTlMgT0YgQU5ZIEtJTkQsIGVp dGhlciBleHByZXNzIG9yCiAqIGltcGxpZWQuIFNlZSB0aGUgTGljZW5zZSBmb3IgdGhlIHNwZWNp ZmljIGxhbmd1YWdlIGdvdmVybmluZwogKiBwZXJtaXNzaW9ucyBhbmQgbGltaXRhdGlvbnMgdW5k ZXIgdGhlIExpY2Vuc2UuCiAqLwoKaW1wb3J0IGphdmEubmlvLmNoYXJzZXQuQ2hhcnNldDsKCi8q IEZyb20gaGFkb29wLSotY29yZS5qYXIsIGh0dHA6Ly9oYWRvb3AuYXBhY2hlLm9yZy8KICogRGV2 ZWxvcGVkIHdpdGggSGFkb29wIDAuMTYuMy4gKi8KaW1wb3J0IG9yZy5hcGFjaGUuaGFkb29wLmNv bmYuQ29uZmlndXJhdGlvbjsKaW1wb3J0IG9yZy5hcGFjaGUuaGFkb29wLmZzLkZpbGVTeXN0ZW07 CmltcG9ydCBvcmcuYXBhY2hlLmhhZG9vcC5mcy5Mb2NhbEZpbGVTeXN0ZW07CmltcG9ydCBvcmcu YXBhY2hlLmhhZG9vcC5mcy5QYXRoOwppbXBvcnQgb3JnLmFwYWNoZS5oYWRvb3AuaW8uU2VxdWVu Y2VGaWxlOwppbXBvcnQgb3JnLmFwYWNoZS5oYWRvb3AuaW8uV3JpdGFibGU7CmltcG9ydCBvcmcu YXBhY2hlLmhhZG9vcC5pby5CeXRlc1dyaXRhYmxlOwoKCi8qKiBQcmludHMgdGhlIGNvbnRlbnRz IG9mIGEgU2VxdWVuY2VGaWxlLgogKgogKiBAYXV0aG9yIFN0dWFydCBTaWVycmEsIGh0dHA6Ly9z dHVhcnRzaWVycmEuY29tLwogKi8KcHVibGljIGNsYXNzIFNlcUZpbGVQcmludGVyIHsKCiAgICBw cml2YXRlIFN0cmluZyBpbnB1dEZpbGU7CiAgICBwcml2YXRlIExvY2FsU2V0dXAgc2V0dXA7Cgog ICAgcHVibGljIFNlcUZpbGVQcmludGVyKCkgdGhyb3dzIEV4Y2VwdGlvbiB7CiAgICAgICAgc2V0 
dXAgPSBuZXcgTG9jYWxTZXR1cCgpOwogICAgfQoKICAgIC8qKiBTZXQgdGhlIG5hbWUgb2YgdGhl IGlucHV0IHNlcXVlbmNlIGZpbGUuCiAgICAgKgogICAgICogQHBhcmFtIGZpbGVuYW1lICAgYSBs b2NhbCBwYXRoIHN0cmluZwogICAgICovCiAgICBwdWJsaWMgdm9pZCBzZXRJbnB1dChTdHJpbmcg ZmlsZW5hbWUpIHsKICAgICAgICBpbnB1dEZpbGUgPSBmaWxlbmFtZTsKICAgIH0KCiAgICAvKiog UnVucyB0aGUgcHJvY2Vzcy4gS2V5cyBhcmUgcHJpbnRlZCB0byBzdGFuZGFyZCBvdXRwdXQ7CiAg ICAgKiBpbmZvcm1hdGlvbiBhYm91dCB0aGUgc2VxdWVuY2UgZmlsZSBpcyBwcmludGVkIHRvIHN0 YW5kYXJkCiAgICAgKiBlcnJvci4gKi8KICAgIHB1YmxpYyB2b2lkIGV4ZWN1dGUoKSB0aHJvd3Mg RXhjZXB0aW9uIHsKICAgICAgICBQYXRoIHBhdGggPSBuZXcgUGF0aChpbnB1dEZpbGUpOwogICAg ICAgIFNlcXVlbmNlRmlsZS5SZWFkZXIgcmVhZGVyID0gCiAgICAgICAgICAgIG5ldyBTZXF1ZW5j ZUZpbGUuUmVhZGVyKHNldHVwLmdldExvY2FsRmlsZVN5c3RlbSgpLCBwYXRoLCBzZXR1cC5nZXRD b25mKCkpOwoKICAgICAgICB0cnkgewogICAgICAgICAgICBTeXN0ZW0ub3V0LnByaW50bG4oIktl eSB0eXBlIGlzICIgKyByZWFkZXIuZ2V0S2V5Q2xhc3NOYW1lKCkpOwogICAgICAgICAgICBTeXN0 ZW0ub3V0LnByaW50bG4oIlZhbHVlIHR5cGUgaXMgIiArIHJlYWRlci5nZXRWYWx1ZUNsYXNzTmFt ZSgpKTsKICAgICAgICAgICAgaWYgKHJlYWRlci5pc0NvbXByZXNzZWQoKSkgewogICAgICAgICAg ICAgICAgU3lzdGVtLmVyci5wcmludGxuKCJWYWx1ZXMgYXJlIGNvbXByZXNzZWQuIik7CiAgICAg ICAgICAgICAgICBpZiAocmVhZGVyLmlzQmxvY2tDb21wcmVzc2VkKCkpIHsKICAgICAgICAgICAg ICAgICAgICBTeXN0ZW0uZXJyLnByaW50bG4oIlJlY29yZHMgYXJlIGJsb2NrLWNvbXByZXNzZWQu Iik7CiAgICAgICAgICAgICAgICB9CiAgICAgICAgICAgICAgICBTeXN0ZW0uZXJyLnByaW50bG4o IkNvbXByZXNzaW9uIHR5cGUgaXMgIiArIHJlYWRlci5nZXRDb21wcmVzc2lvbkNvZGVjKCkuZ2V0 Q2xhc3MoKS5nZXROYW1lKCkpOwogICAgICAgICAgICB9CiAgICAgICAgICAgIFN5c3RlbS5vdXQu cHJpbnRsbigiIik7CgogICAgICAgICAgICBXcml0YWJsZSBrZXkgPSAoV3JpdGFibGUpKHJlYWRl ci5nZXRLZXlDbGFzcygpLm5ld0luc3RhbmNlKCkpOwogICAgICAgICAgICBXcml0YWJsZSB2YWwg PSAoV3JpdGFibGUpKHJlYWRlci5nZXRWYWx1ZUNsYXNzKCkubmV3SW5zdGFuY2UoKSk7CiAgICAg ICAgICAgIFN0cmluZyB2YWx1ZTsKICAgICAgICAgICAgd2hpbGUgKHJlYWRlci5uZXh0KGtleSwg dmFsKSkgewogICAgICAgICAgICAgICAgU3lzdGVtLm91dC5wcmludGxuKCI9PT09PT09PT09PT09 
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0iKTsKICAgICAg ICAgICAgICAgIFN5c3RlbS5vdXQucHJpbnRsbigiS0VZOlx0IiArIGtleS50b1N0cmluZygpKTsK CiAgICAgICAgICAgICAgICBpZiAodmFsIGluc3RhbmNlb2YgQnl0ZXNXcml0YWJsZSkgewogICAg ICAgICAgICAgICAgICAgIEJ5dGVzV3JpdGFibGUgdiA9IChCeXRlc1dyaXRhYmxlKXZhbDsKICAg ICAgICAgICAgICAgICAgICB2YWx1ZSA9IG5ldyBTdHJpbmcodi5nZXQoKSwgMCwgdi5nZXRTaXpl KCkpOwogICAgICAgICAgICAgICAgfSBlbHNlIHsKICAgICAgICAgICAgICAgICAgICB2YWx1ZSA9 IHZhbC50b1N0cmluZygpOwogICAgICAgICAgICAgICAgfQoKICAgICAgICAgICAgICAgIFN5c3Rl bS5vdXQucHJpbnRsbigiVkFMVUU6XG4iICsgdmFsdWUpOwogICAgICAgICAgICB9CiAgICAgICAg fSBmaW5hbGx5IHsKICAgICAgICAgICAgcmVhZGVyLmNsb3NlKCk7CiAgICAgICAgfQogICAgfQoK ICAgIHB1YmxpYyBzdGF0aWMgdm9pZCBtYWluKFN0cmluZ1tdIGFyZ3MpIHsKICAgICAgICBpZiAo YXJncy5sZW5ndGggIT0gMSkgewogICAgICAgICAgICBleGl0V2l0aEhlbHAoKTsKICAgICAgICB9 CgogICAgICAgIHRyeSB7CiAgICAgICAgICAgIFNlcUZpbGVQcmludGVyIG1lID0gbmV3IFNlcUZp bGVQcmludGVyKCk7CiAgICAgICAgICAgIG1lLnNldElucHV0KGFyZ3NbMF0pOwogICAgICAgICAg ICBtZS5leGVjdXRlKCk7CiAgICAgICAgfSBjYXRjaCAoRXhjZXB0aW9uIGUpIHsKICAgICAgICAg ICAgZS5wcmludFN0YWNrVHJhY2UoKTsKICAgICAgICAgICAgZXhpdFdpdGhIZWxwKCk7CiAgICAg ICAgfQogICAgfQoKICAgIC8qKiBQcmludHMgdXNhZ2UgaW5zdHJ1Y3Rpb25zIHRvIHN0YW5kYXJk IGVycm9yIGFuZCBleGl0cy4gKi8KICAgIHB1YmxpYyBzdGF0aWMgdm9pZCBleGl0V2l0aEhlbHAo KSB7CiAgICAgICAgU3lzdGVtLmVyci5wcmludGxuKCJVc2FnZTogamF2YSBTZXFGaWxlUHJpbnRl ciA8c2VxdWVuY2UtZmlsZT5cbiIgKwogICAgICAgICAgICAgICAgICAgICAgICAgICAiUHJpbnRz IHRoZSBjb250ZW50cyBvZiB0aGUgc2VxdWVuY2UgZmlsZS4iKTsKICAgICAgICBTeXN0ZW0uZXhp dCgxKTsKICAgIH0KfQo= ------=_Part_1693_28601951.1212503942777 Content-Type: text/x-java; name=LocalSetup.java Content-Transfer-Encoding: base64 X-Attachment-Id: f_fh0lig5o1 Content-Disposition: attachment; filename=LocalSetup.java LyogTG9jYWxTZXR1cC5qYXZhIC0tIHN1cHBvcnQgZm9yIHRoZSBIYWRvb3AgQVBJIG91dHNpZGUg b2YgSGFkb29wCiAqCiAqIENvcHlyaWdodCAoQykgMjAwOCBTdHVhcnQgU2llcnJhCiAqCiAqIExp 
Y2Vuc2VkIHVuZGVyIHRoZSBBcGFjaGUgTGljZW5zZSwgVmVyc2lvbiAyLjAgKHRoZSAiTGljZW5z ZSIpOyB5b3UKICogbWF5IG5vdCB1c2UgdGhpcyBmaWxlIGV4Y2VwdCBpbiBjb21wbGlhbmNlIHdp dGggdGhlIExpY2Vuc2UuIFlvdQogKiBtYXkgb2J0YWluIGEgY29weSBvZiB0aGUgTGljZW5zZSBh dAogKiBodHRwOnd3dy5hcGFjaGUub3JnL2xpY2Vuc2VzL0xJQ0VOU0UtMi4wCiAqIFVubGVzcyBy ZXF1aXJlZCBieSBhcHBsaWNhYmxlIGxhdyBvciBhZ3JlZWQgdG8gaW4gd3JpdGluZywgc29mdHdh cmUKICogZGlzdHJpYnV0ZWQgdW5kZXIgdGhlIExpY2Vuc2UgaXMgZGlzdHJpYnV0ZWQgb24gYW4g IkFTIElTIiBCQVNJUywKICogV0lUSE9VVCBXQVJSQU5USUVTIE9SIENPTkRJVElPTlMgT0YgQU5Z IEtJTkQsIGVpdGhlciBleHByZXNzIG9yCiAqIGltcGxpZWQuIFNlZSB0aGUgTGljZW5zZSBmb3Ig dGhlIHNwZWNpZmljIGxhbmd1YWdlIGdvdmVybmluZwogKiBwZXJtaXNzaW9ucyBhbmQgbGltaXRh dGlvbnMgdW5kZXIgdGhlIExpY2Vuc2UuCiAqLwoKLyogRnJvbSBoYWRvb3AtKi1jb3JlLmphciwg aHR0cDovL2hhZG9vcC5hcGFjaGUub3JnLwogKiBEZXZlbG9wZWQgd2l0aCBIYWRvb3AgMC4xNi4z LiAqLwppbXBvcnQgb3JnLmFwYWNoZS5oYWRvb3AuY29uZi5Db25maWd1cmF0aW9uOwppbXBvcnQg b3JnLmFwYWNoZS5oYWRvb3AuZnMuRmlsZVN5c3RlbTsKCi8qKiBQcm92aWRlcyBIYWRvb3AgY29u ZmlndXJhdGlvbiBhbmQgbG9jYWwgZmlsZSBzeXN0ZW0gb2JqZWN0cyBmb3IKICogb3RoZXIgY2xh c3Nlcy4gIFRoaXMgaXMgZm9yIHNpdHVhdGlvbnMgd2hlcmUgeW91IHdhbnQgdG8gdXNlIHNvbWUK ICogcGFydCBvZiB0aGUgSGFkb29wIGNvZGUgb3V0c2lkZSBvZiB0aGUgSGFkb29wIE1hcC9SZWR1 Y2UKICogZnJhbWV3b3JrLgogKgogKiBAYXV0aG9yIFN0dWFydCBTaWVycmEsIGh0dHA6Ly9zdHVh cnRzaWVycmEuY29tLwogKi8KcHVibGljIGNsYXNzIExvY2FsU2V0dXAgewoKICAgIHByaXZhdGUg RmlsZVN5c3RlbSBmaWxlU3lzdGVtOwogICAgcHJpdmF0ZSBDb25maWd1cmF0aW9uIGNvbmZpZzsK CiAgICAvKiogU2V0cyB1cCBDb25maWd1cmF0aW9uIGFuZCBMb2NhbEZpbGVTeXN0ZW0gaW5zdGFu Y2VzIGZvcgogICAgICogSGFkb29wLiAgVGhyb3dzIEV4Y2VwdGlvbiBpZiB0aGV5IGZhaWwuICBE b2VzIG5vdCBsb2FkIGFueQogICAgICogSGFkb29wIFhNTCBjb25maWd1cmF0aW9uIGZpbGVzLCBq dXN0IHNldHMgdGhlIG1pbmltdW0KICAgICAqIGNvbmZpZ3VyYXRpb24gbmVjZXNzYXJ5IHRvIHVz ZSB0aGUgbG9jYWwgZmlsZSBzeXN0ZW0uCiAgICAgKi8KICAgIHB1YmxpYyBMb2NhbFNldHVwKCkg dGhyb3dzIEV4Y2VwdGlvbiB7CiAgICAgICAgY29uZmlnID0gbmV3IENvbmZpZ3VyYXRpb24oKTsK 
CiAgICAgICAgLyogTm9ybWFsbHkgc2V0IGluIGhhZG9vcC1kZWZhdWx0LnhtbCwgd2l0aG91dCBp dCB5b3UgZ2V0CiAgICAgICAgICogImphdmEuaW8uSU9FeGNlcHRpb246IE5vIEZpbGVTeXN0ZW0g Zm9yIHNjaGVtZTogZmlsZSIgKi8KICAgICAgICBjb25maWcuc2V0KCJmcy5maWxlLmltcGwiLCAi b3JnLmFwYWNoZS5oYWRvb3AuZnMuTG9jYWxGaWxlU3lzdGVtIik7CgogICAgICAgIGZpbGVTeXN0 ZW0gPSBGaWxlU3lzdGVtLmdldChjb25maWcpOwogICAgICAgIGlmIChmaWxlU3lzdGVtLmdldENv bmYoKSA9PSBudWxsKSB7CiAgICAgICAgICAgIC8qIFRoaXMgaGFwcGVucyBpZiB0aGUgRmlsZVN5 c3RlbSBpcyBub3QgcHJvcGVybHkKICAgICAgICAgICAgICogaW5pdGlhbGl6ZWQsIGNhdXNlcyBO dWxsUG9pbnRlckV4Y2VwdGlvbiBsYXRlci4gKi8KICAgICAgICAgICAgdGhyb3cgbmV3IEV4Y2Vw dGlvbigiTG9jYWxGaWxlU3lzdGVtIGNvbmZpZ3VyYXRpb24gaXMgbnVsbCIpOwogICAgICAgIH0K ICAgIH0KCiAgICAvKiogUmV0dXJucyBhIEhhZG9vcCBDb25maWd1cmF0aW9uIGluc3RhbmNlIGZv ciB1c2UgaW4gSGFkb29wIEFQSQogICAgICogY2FsbHMuICovCiAgICBwdWJsaWMgQ29uZmlndXJh dGlvbiBnZXRDb25mKCkgewogICAgICAgIHJldHVybiBjb25maWc7CiAgICB9CgogICAgLyoqIFJl dHVybnMgYSBIYWRvb3AgRmlsZVN5c3RlbSBpbnN0YW5jZSB0aGF0IHByb3ZpZGVzIGFjY2VzcyB0 bwogICAgICogdGhlIGxvY2FsIGZpbGVzeXN0ZW0uICovCiAgICBwdWJsaWMgRmlsZVN5c3RlbSBn ZXRMb2NhbEZpbGVTeXN0ZW0oKSB7CiAgICAgICAgcmV0dXJuIGZpbGVTeXN0ZW07CiAgICB9Cn0= ------=_Part_1693_28601951.1212503942777--
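
The two attachments above (SeqFilePrinter.java and LocalSetup.java) are carried with Content-Transfer-Encoding: base64, so the Java source is not readable in the archive as-is. Below is a minimal sketch of decoding such an attachment body, assuming Java 8 or later for java.util.Base64. The class name AttachmentDecoder and its decode method are hypothetical names chosen for illustration, and UTF-8 is an assumed charset since the attachment headers do not declare one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper (not part of the original message): decodes a
 *  base64 MIME attachment body back into text. */
class AttachmentDecoder {

    /** Decodes base64 text that may be wrapped across lines.
     *  The MIME decoder skips line breaks and spaces between chunks. */
    static String decode(String base64Body) {
        byte[] raw = Base64.getMimeDecoder().decode(base64Body);
        // Assumption: the attachment is plain ASCII/UTF-8 source text.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First wrapped line of the SeqFilePrinter.java attachment, copied verbatim.
        String firstLine =
            "LyogU2VxS2V5TGlzdC5qYXZhIC0gcHJpbnQgbGlzdCBvZiBrZXlzIGluIGEgU2VxdWVuY2VGaWxl";
        System.out.println(decode(firstLine));
        // prints: /* SeqKeyList.java - print list of keys in a SequenceFile
    }
}
```

Passing the full attachment body (everything between the part headers and the next boundary line) to decode should yield the complete source file. The MIME decoder is used rather than the basic one because the body is split into chunks separated by whitespace.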