From: "Divya"
To: user@mahout.apache.org
Subject: Vector in Mahout
Date: Mon, 25 Oct 2010 13:47:38 +0800
Message-ID: <008301cb7408$22c8daa0$685a8fe0$@com.sg>

Hi,

Can anyone please help me understand what vectors look like in Mahout?

I tried converting a directory structure into a sequence file and then into vectors, but the files I got are either .crc files or files whose contents show up as junk characters.

Is this how they are supposed to look? Can anyone tell me what they should look like, and how I can use them for document similarity?

Regards,
Divya
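
As a rough illustration of the idea behind the question: a document vector can be thought of as a sparse mapping from term to weight, and document similarity is commonly measured as the cosine of the angle between two such vectors. The sketch below is plain Python, not Mahout's API. It assumes raw term counts as weights and whitespace tokenization, and the names `term_vector` and `cosine_similarity` are hypothetical helpers chosen for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector {term: count} (hypothetical helper)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


doc1 = term_vector("mahout builds vectors from text")
doc2 = term_vector("mahout builds clusters from vectors")
doc3 = term_vector("completely unrelated words here")

print(cosine_similarity(doc1, doc2))
print(cosine_similarity(doc1, doc3))
```

Documents that share many terms score close to 1, and documents with no terms in common score 0. A real pipeline would typically replace raw counts with TF-IDF weights before comparing documents.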