Date: Thu, 22 Oct 2009 16:54:54 +0800
From: 周峰
To: mahout-user@lucene.apache.org
Subject: Re: i have met a problem when i do "Creating Vectors from Text"

Thank you. You reminded me to store term vectors. The class "org.apache.lucene.demo.IndexFiles" (in http://lucene.apache.org/java/2_9_0/demo.html), which I used to create the index files, does not store term vectors. Now I can run the k-means example successfully.

I have another question. When I use the class "org.apache.mahout.utils.vectors.lucene.Driver" to create vectors from index files, it converts one field of the index into an output file, and KMeansDriver can then run on that output file. But in my application I use multiple fields to describe one object, so I want k-means to compute over several fields. How can I achieve this?

Thanks

2009/10/22 Grant Ingersoll

> Do you have Term Vectors stored?
>
> On Oct 21, 2009, at 9:52 AM, 周峰 wrote:
>
>> Yes, I have solved the problem. These jar files must be added to the classpath:
>>
>> root@master:/home/zhoufeng/mahout/trunk/utils/target/dependency# java -cp /home/zhoufeng/mahout/trunk/utils/target/mahout-utils-0.2-SNAPSHOT.jar:lucene-core-2.9.0.jar:slf4j-api-1.5.8.jar:slf4j-jcl-1.5.8.jar:commons-logging-1.1.1.jar:commons-cli-2.0-mahout.jar:/home/zhoufeng/mahout/trunk/core/target/mahout-core-0.2-SNAPSHOT.jar:hadoop-core-0.20.1.jar org.apache.mahout.utils.vectors.lucene.Driver --dir /home/zhoufeng/newdisk/newindex/ --field string -t /home/zhoufeng/newdisk/dict.txt --output /home/zhoufeng/newdisk/out.txt --max 50
>>
>> 2009-10-21 21:44:39 org.slf4j.impl.JCLLoggerAdapter info
>> Output File: /home/zhoufeng/newdisk/out.txt
>> 2009-10-21 21:44:40 org.apache.hadoop.util.NativeCodeLoader
>> Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 2009-10-21 21:44:40 org.apache.hadoop.io.compress.CodecPool getCompressor
>> Got brand-new compressor
>> Exception in thread "main" java.lang.NullPointerException
>>     at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:110)
>>     at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:81)
>>     at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:40)
>>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:200)
>>
>> Does this exception mean there is a problem with my index files?
>>
>> I used the class "org.apache.lucene.demo.IndexFiles" (in http://lucene.apache.org/java/2_9_0/demo.html) to create my index files, and I have used the class "org.apache.lucene.demo.SearchFiles" (also in http://lucene.apache.org/java/2_9_0/demo.html) to search the index successfully.
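Grant's question above ("Do you have Term Vectors stored?") and the fix reported at the top of this message point to the cause of this NullPointerException: the demo IndexFiles class does not store term vectors, and, as far as I know, Lucene 2.9's `IndexReader.getTermFreqVector(int, String)` returns null for a field indexed without them. Below is a minimal, stdlib-only sketch of a pre-flight check that lists the documents with no term vector for a given field. `TermVectorLookup` is a hypothetical interface standing in for that Lucene call (it is not a Lucene or Mahout type), and the toy index in `main` is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TermVectorCheck {

    /**
     * Hypothetical stand-in for a per-document term-vector lookup such as
     * Lucene 2.9's IndexReader.getTermFreqVector(docId, field). Like that call,
     * it is assumed to return null when no term vector was stored for the field.
     */
    public interface TermVectorLookup {
        Map<String, Integer> termFrequencies(int docId, String field);
    }

    /** Returns the ids of documents in [0, numDocs) that have no term vector for the field. */
    public static List<Integer> docsMissingTermVectors(TermVectorLookup lookup, int numDocs, String field) {
        List<Integer> missing = new ArrayList<>();
        for (int docId = 0; docId < numDocs; docId++) {
            // A null here is what a vector-building loop would dereference and fail on.
            if (lookup.termFrequencies(docId, field) == null) {
                missing.add(docId);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Toy index: doc 0 has a stored vector for field "string", doc 1 does not.
        TermVectorLookup toy = (docId, field) ->
            docId == 0 && field.equals("string") ? Map.of("mahout", 2, "lucene", 1) : null;
        System.out.println(docsMissingTermVectors(toy, 2, "string")); // prints [1]
    }
}
```

With a real index, the lookup would wrap the reader call; the actual fix is the one described at the top of this message: re-index with term vectors stored for the field passed to `--field`.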
>>
>> 2009/10/21 Grant Ingersoll
>>
>>> Here's what I use to run it, as generated by IntelliJ ( with your appropriate value):
>>>
>>> java -Xmx1024M -Dfile.encoding=UTF-8 -classpath /projects/lucene/mahout/mahout-clean/utils/target/classes:/projects/lucene/mahout/mahout-clean/core/target/classes:/.m2/repository/org/apache/mahout/hadoop/hadoop-core/0.20.1/hadoop-core-0.20.1.jar:/.m2/repository/org/apache/mahout/hbase/hbase/0.20.0/hbase-0.20.0.jar:/.m2/repository/org/apache/mahout/kosmofs/kfs/0.3/kfs-0.3.jar:/.m2/repository/org/apache/mahout/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/.m2/repository/commons-codec/commons-codec/1.2/commons-codec-1.2.jar:/.m2/repository/commons-dbcp/commons-dbcp/1.2.2/commons-dbcp-1.2.2.jar:/.m2/repository/commons-pool/commons-pool/1.4/commons-pool-1.4.jar:/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/.m2/repository/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.jar:/.m2/repository/org/slf4j/slf4j-jcl/1.5.8/slf4j-jcl-1.5.8.jar:/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/.m2/repository/org/apache/mahout/watchmaker/watchmaker-framework/0.6.2/watchmaker-framework-0.6.2.jar:/.m2/repository/org/apache/mahout/watchmaker/watchmaker-swing/0.6.2/watchmaker-swing-0.6.2.jar:/.m2/repository/org/apache/mahout/uncommons/math/uncommons-math/1.2/uncommons-math-1.2.jar:/.m2/repository/com/thoughtworks/xstream/xstream/1.2.1/xstream-1.2.1.jar:/.m2/repository/xpp3/xpp3_min/1.1.3.4.O/xpp3_min-1.1.3.4.O.jar:/.m2/repository/org/apache/lucene/lucene-analyzers/2.9.0/lucene-analyzers-2.9.0.jar:/.m2/repository/org/apache/lucene/lucene-core/2.9.0/lucene-core-2.9.0.jar:/.m2/repository/org/apache/mahout/commons/commons-cli/2.0-mahout/commons-cli-2.0-mahout.jar:/.m2/repository/commons-math/commons-math/1.2/commons-math-1.2.jar:/.m2/repository/junit/junit/3.8.2/junit-3.8.2.jar:/.m2/repository/org/easymock/easymockclassextension/2.2/easymockclassextension-2.2.jar:/.m2/repository/org/easymock/easymock/2.2/easymock-2.2.jar:/.m2/repository/cglib/cglib-nodep/2.1_3/cglib-nodep-2.1_3.jar:/.m2/repository/com/google/code/gson/gson/1.3/gson-1.3.jar:/.m2/repository/org/easymock/easymock/2.4/easymock-2.4.jar:/.m2/repository/org/easymock/easymockclassextension/2.4/easymockclassextension-2.4.jar:/.m2/repository/cglib/cglib/2.1_3/cglib-2.1_3.jar:/.m2/repository/asm/asm/1.5.3/asm-1.5.3.jar org.apache.mahout.utils.vectors.lucene.Driver --dir /projects/lucene/solr/wikipedia/solr/data/index --field body -t /projects/lucene/solr/wikipedia/dict.txt --output /projects/lucene/solr/wikipedia/part-50.txt --max 50
>>>
>>> One way to quickly get all of the dependencies into a single directory for inclusion on the command line is Maven's copy-dependencies goal:
>>>
>>> mvn dependency:copy-dependencies
>>>
>>> This will download all the dependencies into a subdirectory of the target directory (target/dependency).
>>>
>>> On Oct 21, 2009, at 4:04 AM, 周峰 wrote:
>>>
>>>> First, I built a Lucene index in the directory "/home/zhoufeng/newdisk/newindex"; then I wanted to create vectors from the index files.
>>>> Then I ran into a problem:
>>>>
>>>> root@master:/home/zhoufeng/mahout/trunk/utils/target# java -cp mahout-utils-0.2-SNAPSHOT.jar:/home/zhoufeng/mahout/trunk/core/target/mahout-core-0.2-SNAPSHOT.jar org.apache.mahout.utils.vectors.lucene.Driver --dir /home/zhoufeng/newdisk/newindex string --dictOut /home/zhoufeng/newdisk/newindex/dict.txt --output /home/zhoufeng/newdisk/newindex/out.txt -max 50
>>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli2/OptionException
>>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.cli2.OptionException
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>>> Could not find the main class: org.apache.mahout.utils.vectors.lucene.Driver. Program will exit.
>>>>
>>>> I do not know where the class "org.apache.commons.cli2.OptionException" is. Is it because some jar file is missing?
>>>>
>>>> Can anyone help me? Thanks
>>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
>>> http://www.lucidimagination.com/search
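The NoClassDefFoundError in the first message means the JVM could not load org.apache.commons.cli2.OptionException from the classpath given to `java -cp`; the later command in this thread works because it adds commons-cli-2.0-mahout.jar and the other dependency jars. As a quick diagnostic, here is a small stdlib-only sketch that reports whether given classes can be loaded from the current classpath. `ClasspathCheck` is a hypothetical helper, not part of Mahout; the default class names come from this thread, and the jar named next to each is where I would expect it to live, not something verified here. Run it with the same `-cp` value as the Driver command.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClasspathCheck {

    /** Maps each class name to true if the current classpath can load it, false otherwise. */
    public static Map<String, Boolean> checkClasses(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate and load the class, do not run static initializers.
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
                result.put(name, true);
            } catch (ClassNotFoundException | LinkageError e) {
                result.put(name, false);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults are class names from this thread; pass other names as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.commons.cli2.OptionException",       // expected in commons-cli-2.0-mahout.jar
            "org.apache.lucene.index.IndexReader",           // expected in lucene-core-2.9.0.jar
            "org.apache.mahout.utils.vectors.lucene.Driver"  // expected in mahout-utils-0.2-SNAPSHOT.jar
        };
        checkClasses(names).forEach((name, found) ->
            System.out.println((found ? "found   " : "MISSING ") + name));
    }
}
```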
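On the multi-field question at the top of this message: as described there, the Driver converts one index field per run. One possible workaround, sketched here under my own assumptions and not a Mahout feature, is to merge several fields into a single feature space by prefixing each term with its field name, so that `title:kmeans` and `body:kmeans` stay separate dimensions, with an optional per-field weight. The `MultiFieldFeatures` class, the field names, and the weights are hypothetical; turning the resulting map into the vector format KMeansDriver reads is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiFieldFeatures {

    /**
     * Combines per-field term frequencies into one sparse feature map.
     * Keys are "field:term" so the same word in different fields stays a separate
     * dimension; each frequency is multiplied by that field's weight (default 1.0).
     */
    public static Map<String, Double> combine(Map<String, Map<String, Integer>> termFreqsByField,
                                              Map<String, Double> fieldWeights) {
        Map<String, Double> features = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> fieldEntry : termFreqsByField.entrySet()) {
            String field = fieldEntry.getKey();
            double weight = fieldWeights.getOrDefault(field, 1.0);
            for (Map.Entry<String, Integer> term : fieldEntry.getValue().entrySet()) {
                features.put(field + ":" + term.getKey(), weight * term.getValue());
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Hypothetical object described by two fields.
        Map<String, Map<String, Integer>> doc = new LinkedHashMap<>();
        doc.put("title", Map.of("kmeans", 1));
        doc.put("body", Map.of("kmeans", 3));
        Map<String, Double> weights = Map.of("title", 2.0); // title counts double; body defaults to 1.0
        System.out.println(combine(doc, weights)); // prints {title:kmeans=2.0, body:kmeans=3.0}
    }
}
```

A simpler alternative, if re-indexing is acceptable, should be to add one extra field at indexing time that concatenates the relevant fields, store term vectors for it, and pass that field to `--field`. That loses the per-field distinction but needs no new code on the Mahout side.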