mahout-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Building TFIDF Vectors from Solr Index
Date Fri, 09 Aug 2013 21:17:51 GMT
Hi All,
I've changed the text field in schema.xml to

<field name="text" type="text_general" stored="true" indexed="true"
termVectors="true" />
(thank you Roland)

then reindexed some documents.

I then ran

./bin/mahout lucene.vector \
  --dir "../solr-4.3.1/example/multicore/e001/data/index/" \
  --idField id \
  --output "../solr-4.3.1/example/multicore/e001/vector/" \
  --field text \
  --dictOut "../solr-4.3.1/example/multicore/e001/dictionary/dic.txt" \
  --weight TFIDF

And it all works sweet as a nut.
Thanks
Lewis
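
For anyone hitting the same NullPointerException: the change that appears to have
made the difference here was adding termVectors="true" to the field passed to
--field. A rough way to check a schema before running lucene.vector might look
like the sketch below. The helper name has_term_vectors and the inline SCHEMA
sample are made up for illustration, and the check only looks at the attribute on
the <field> element itself (it does not follow defaults set on the field type).

```python
import xml.etree.ElementTree as ET


def has_term_vectors(schema_xml: str, field_name: str) -> bool:
    """Return True if the named <field> declares termVectors="true".

    Hypothetical helper: it inspects only the <field> element, not any
    default inherited from the field type.
    """
    root = ET.fromstring(schema_xml)
    for field in root.iter("field"):
        if field.get("name") == field_name:
            return field.get("termVectors", "false").lower() == "true"
    raise KeyError(f"no <field> named {field_name!r} in schema")


# Illustrative schema fragment modelled on the two field definitions in this thread.
SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="content" type="text_general" stored="true" indexed="true"/>
    <field name="text" type="text_general" stored="true" indexed="true"
           termVectors="true"/>
  </fields>
</schema>"""

for name in ("content", "text"):
    print(name, has_term_vectors(SCHEMA, name))
```

With a real install you would read the core's schema.xml from disk instead of the
inline string, and documents need reindexing after the schema change before the
term vectors exist in the index.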


On Fri, Aug 9, 2013 at 12:51 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi,
> I've been working with Mahout trunk and attempting to build the above from
> a Solr 4.3.1 index as follows. I am using Hadoop 1.1.1 to do the processing.
>
> h@CEE279Law3-Linux:~/Downloads/asf/mahout$ ./bin/mahout lucene.vector
> --dir "../solr-4.3.1/example/multicore/e001/data/index/" --idField id
> --output "../solr-4.3.1/example/multicore/e001/vector" --field content
> --dictOut
> "file:/home/law/Downloads/asf/solr-4.3.1/example/multicore/e001/dictionary"
> --weight TFIDF
> Warning: $HADOOP_HOME is deprecated.
>
> Running on hadoop, using /home/law/Downloads/asf/hadoop-1.1.1/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /home/law/Downloads/asf/mahout/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar
> Warning: $HADOOP_HOME is deprecated.
>
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.mahout.utils.vectors.lucene.CachedTermInfo.<init>(CachedTermInfo.java:45)
>     at org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:102)
>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:290)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> The content field in the schema.xml looks like this
>
> <field name="content" type="text_general" stored="true" indexed="true"/>
>
> which I think/hope must be the root of the problem. Can someone advise if
> I need to add more configuration to this field for vectors to be built?
>
> Thank you v much in advance.
> Best
> Lewis
>
>
> --
> *Lewis*
>



-- 
*Lewis*
