From: Sebastian Schelter
Date: Fri, 26 Nov 2010 08:54:20 +0100
To: user@mahout.apache.org
Subject: Re: error in itemsimilarity

ItemSimilarityJob cannot be used to compute the similarity between text documents. It is intended for collaborative filtering, as described here:

https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering

The job expects numeric user and item IDs as input, which is why the mappers in your log fail with a NumberFormatException when they try to parse your line of text.

On 26.11.2010 08:50, Divya wrote:
> Hi,
>
> I am getting the following exception when I try to run itemsimilarity from the command line.
>
> My input data is a text file that has just one line of text.
>
> Can anyone please help me resolve the error?
>
> $ bin/mahout itemsimilarity -i D:/MahoutResult/ItemSimilarity/Input_Data -o D:/MahoutResult/ItemSimilarity/Output -s DistributedUncenteredCosineVectorSimilarity.class
> Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> 10/11/26 15:43:50 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=D:/MahoutResult/ItemSimilarity/Input_Data, --maxCooccurrencesPerItem=100, --maxSimilaritiesPerItem=100, --output=D:/MahoutResult/ItemSimilarity/Output, --similarityClassname=DistributedUncenteredCosineVectorSimilarity.class, --startPhase=0, --tempDir=temp}
> 10/11/26 15:43:51 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 10/11/26 15:43:52 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:53 INFO mapred.JobClient: Running job: job_local_0001
> 10/11/26 15:43:53 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:53 INFO mapred.MapTask: io.sort.mb = 100
> 10/11/26 15:43:53 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/11/26 15:43:53 INFO mapred.MapTask: record buffer = 262144/327680
> 10/11/26 15:43:53 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
>         at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 10/11/26 15:43:54 INFO mapred.JobClient: map 0% reduce 0%
> 10/11/26 15:43:54 INFO mapred.JobClient: Job complete: job_local_0001
> 10/11/26 15:43:54 INFO mapred.JobClient: Counters: 0
> 10/11/26 15:43:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 10/11/26 15:43:55 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:55 INFO mapred.JobClient: Running job: job_local_0002
> 10/11/26 15:43:55 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:56 INFO mapred.MapTask: io.sort.mb = 100
> 10/11/26 15:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/11/26 15:43:56 INFO mapred.MapTask: record buffer = 262144/327680
> 10/11/26 15:43:56 WARN mapred.LocalJobRunner: job_local_0002
> java.lang.NumberFormatException: For input string: "For a young person who is years and above and below years he may be employed in an industrial undertaking His employer however is required to notify "
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Long.parseLong(Long.java:410)
>         at java.lang.Long.parseLong(Long.java:468)
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.CountUsersMapper.map(CountUsersMapper.java:40)
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.CountUsersMapper.map(CountUsersMapper.java:31)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 10/11/26 15:43:56 INFO mapred.JobClient: map 0% reduce 0%
> 10/11/26 15:43:56 INFO mapred.JobClient: Job complete: job_local_0002
> 10/11/26 15:43:56 INFO mapred.JobClient: Counters: 0
> 10/11/26 15:43:56 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 10/11/26 15:43:57 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:57 INFO mapred.JobClient: Running job: job_local_0003
> 10/11/26 15:43:57 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:57 INFO mapred.MapTask: io.sort.mb = 100
> 10/11/26 15:43:57 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/11/26 15:43:57 INFO mapred.MapTask: record buffer = 262144/327680
> 10/11/26 15:43:58 WARN mapred.LocalJobRunner: job_local_0003
> java.lang.NumberFormatException: For input string: "For a young person who is years and above and below years he may be employed in an industrial undertaking His employer however is required to notify "
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Long.parseLong(Long.java:410)
>         at java.lang.Long.parseLong(Long.java:468)
>         at org.apache.mahout.cf.taste.hadoop.ToEntityPrefsMapper.map(ToEntityPrefsMapper.java:57)
>         at org.apache.mahout.cf.taste.hadoop.ToEntityPrefsMapper.map(ToEntityPrefsMapper.java:30)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 10/11/26 15:43:58 INFO mapred.JobClient: map 0% reduce 0%
> 10/11/26 15:43:58 INFO mapred.JobClient: Job complete: job_local_0003
> 10/11/26 15:43:58 INFO mapred.JobClient: Counters: 0
> 10/11/26 15:43:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 10/11/26 15:43:59 INFO input.FileInputFormat: Total input paths to process : 0
> 10/11/26 15:43:59 INFO mapred.LocalJobRunner:
> 10/11/26 15:43:59 INFO mapred.JobClient: Running job: job_local_0004
> 10/11/26 15:43:59 INFO input.FileInputFormat: Total input paths to process : 0
> 10/11/26 15:43:59 WARN mapred.LocalJobRunner: job_local_0004
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:124)
> 10/11/26 15:44:00 INFO mapred.JobClient: map 0% reduce 0%
> 10/11/26 15:44:00 INFO mapred.JobClient: Job complete: job_local_0004
> 10/11/26 15:44:00 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
>         at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readIntFromFile(TasteHadoopUtils.java:103)
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:187)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:92)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Thanks
> Regards,
> Divya
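
For reference, the stack traces in the quoted log fail in Long.parseLong and in an array access at index 1, which is consistent with the job expecting one preference per line in the form userID,itemID[,preference] with numeric IDs. A minimal sketch of a pre-flight check for such input might look like the following. The class name PrefLineCheck and the method isValid are hypothetical and not part of Mahout, and the comma-or-tab delimiter is an assumption rather than something the log confirms.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: checks whether a line of input
// looks like "userID,itemID[,preference]" before it is handed to the job.
public class PrefLineCheck {

    // Assumption: fields are separated by a comma or a tab.
    private static final Pattern DELIMITER = Pattern.compile("[,\t]");

    /** Returns true if the line has two numeric IDs and an optional numeric preference. */
    public static boolean isValid(String line) {
        String[] tokens = DELIMITER.split(line.trim());
        if (tokens.length < 2 || tokens.length > 3) {
            // A line of plain text has no delimiter, so it yields a single token.
            return false;
        }
        try {
            Long.parseLong(tokens[0]);       // user ID must parse as a long
            Long.parseLong(tokens[1]);       // item ID must parse as a long
            if (tokens.length == 3) {
                Float.parseFloat(tokens[2]); // optional preference value
            }
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("1,101,5.0"));
        System.out.println(isValid("For a young person who is years and above"));
    }
}
```

Running a check like this over the input file before starting the job would flag a free-text line up front instead of surfacing it as a mapper exception.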