From: "David Parks"
Subject: RE: Hadoop 101
Date: Wed, 12 Dec 2012 13:04:09 +0700

If you use TextInputFormat, you'll get the following key, value pairs in your mapper:

file_position (the byte offset of the line), your_input (the text of the line)

Example:

0, "0\t[356:0.3481597,359:0.3481597,358:0.3481597,361:0.3481597,360:0.3481597]"
100, "8\t[356:0.34786037,359:0.34786037,358:0.34786037,361:0.34786037,360:0.34786037]"
200, "25\t[284:0.34821576,286:0.34821576,287:0.34821576,288:0.34821576,289:0.34821576]"

Then just parse it out in your mapper.

-----Original Message-----
From: Pat Ferrel [mailto:pat.ferrel@gmail.com]
Sent: Wednesday, December 12, 2012 7:50 AM
To: user@hadoop.apache.org
Subject: Hadoop 101

Stupid question for the day.

I have a file created by a Mahout job of the form:

0 [356:0.3481597,359:0.3481597,358:0.3481597,361:0.3481597,360:0.3481597]
8 [356:0.34786037,359:0.34786037,358:0.34786037,361:0.34786037,360:0.34786037]
25 [284:0.34821576,286:0.34821576,287:0.34821576,288:0.34821576,289:0.34821576]
28 [452:0.34802154,454:0.34802154,453:0.34802154,456:0.34802154,455:0.34802154]
...

If this were a SequenceFile I could read it and be merrily on my way, but it's a text file. The classes written are key, value pairs, but the file is tab-delimited text. I was hoping to do something like:

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, inputFile, conf);
    Writable userId = new LongWritable();
    VectorWritable recommendations = new VectorWritable();
    while (reader.next(userId, recommendations)) {
        // do something with each pair
    }

But alas, Google fails me. How do you read key, value pairs from text files outside of a map or reduce?
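
For what it's worth, the "parse it out" step suggested in the reply could look roughly like the sketch below. It uses only the JDK, so the same method should work inside a mapper (called on value.toString()) or outside MapReduce on lines read through a BufferedReader, for instance one wrapped around the stream returned by FileSystem.open(path). The class and method names (TabLineParser, parseLine) are made up for illustration, and the line layout "<key>\t[<index>:<value>,...]" is assumed from the sample rows in this thread rather than from any Mahout specification.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or Mahout.
final class TabLineParser {

    // Parses one line of the assumed form "<key>\t[<index>:<value>,...]"
    // into the key and an insertion-ordered map of index -> value.
    static Map.Entry<Long, Map<Integer, Double>> parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("No tab separator in: " + line);
        }
        long key = Long.parseLong(line.substring(0, tab).trim());

        String body = line.substring(tab + 1).trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("Expected [..] after the tab in: " + line);
        }
        // Strip the surrounding brackets.
        body = body.substring(1, body.length() - 1).trim();

        Map<Integer, Double> elements = new LinkedHashMap<>();
        if (!body.isEmpty()) {
            for (String element : body.split(",")) {
                String[] parts = element.split(":");
                if (parts.length != 2) {
                    throw new IllegalArgumentException("Bad element '" + element + "' in: " + line);
                }
                elements.put(Integer.parseInt(parts[0].trim()),
                             Double.parseDouble(parts[1].trim()));
            }
        }
        return new AbstractMap.SimpleImmutableEntry<>(key, elements);
    }

    public static void main(String[] args) {
        // Shortened version of one sample row from the thread.
        String sample = "0\t[356:0.3481597,359:0.3481597]";
        Map.Entry<Long, Map<Integer, Double>> pair = parseLine(sample);
        System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
}
```

If the parsed pairs need to go back into Hadoop or Mahout types, the key and the map could be copied into a LongWritable and a vector at that point; that conversion is left out here because it depends on the Mahout version in use.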