mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Creating vectors from lucene index on EMR via the CLI
Date Wed, 12 Dec 2012 13:00:22 GMT
I just mean that I'm not familiar with the particular code you are running,
but I think the problem is with how elastic-mapreduce is being called in
general, which has nothing to do with the JAR itself. Indeed, nothing
indicates a problem with the JAR file. As I said on your
other message, I think you need "--arg --foo=bar", not "--arg --foo
bar".
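
For the step in your earlier message, that would look roughly like the
sketch below. It is untested: it assumes the Driver's option parser accepts
the --name=value form, and it keeps doc1 as the --field value only because
that is what your command used (the right field name is still an open
question). print_step is a made-up helper name that just prints the command
for review.

```shell
# Sketch only: print_step is a hypothetical helper, not part of elastic-mapreduce.
# Each Driver option is joined to its value with "=" so that a single --arg
# carries the whole "--name=value" token through to the Java program.
print_step() {
  echo ./elastic-mapreduce -j j-2NSJRI6N9EQJ4 \
    --jar s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar \
    --main-class org.apache.mahout.utils.vectors.lucene.Driver \
    --arg --dir=s3n://mahout-input/input1/index/ \
    --arg --field=doc1 \
    --arg --dictOut=s3n://mahout-output/solr-dict-out/dict.txt \
    --arg --output=s3n://mahout-output/solr-vect-out/vectors
}

# Prints the command for review; remove the "echo" inside print_step to submit the step.
print_step
```

With "--arg --dir s3n://...", only "--dir" is handed to the Driver and the
path is left behind as a separate token, which would match the "Missing
value(s) --dir" error in your log.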

The utils module has been renamed to integration, which is why you can't find a mahout-utils-0.7 jar, but this is unrelated to the error.

On Wed, Dec 12, 2012 at 12:37 PM, hellen maziku <nahelna@yahoo.com> wrote:
> Also, what do you mean by "don't know much about this particular job"? Does the type
> of the job jar file matter? I thought that as long as I could locate the org.apache.mahout.utils.vectors.lucene.Driver
> class, I was good to use that job jar file.
>
> Btw, whenever I installed and compiled Mahout 0.7 (both from source and trunk), I could not
> locate the mahout-utils-0.7 jar. Why is this so?
>
> Thank you again.
>
>
>
> ________________________________
>  From: Sean Owen <srowen@gmail.com>
> To: Mahout User List <user@mahout.apache.org>; hellen maziku <nahelna@yahoo.com>
> Sent: Wednesday, December 12, 2012 6:05 AM
> Subject: Re: Creating vectors from lucene index on EMR via the CLI
>
> I don't know much about this particular job, but the general problem
> here is that you are passing arguments to a binary called
> elastic-mapreduce, and not to the Java program. There is likely some
> mechanism to package up arguments that need to be sent to the program,
> as an argument to the elastic-mapreduce binary.
>
> On Wed, Dec 12, 2012 at 11:55 AM, hellen maziku <nahelna@yahoo.com> wrote:
>> Hi,
>> I installed mahout and solr.
>>
>> I created an index from the dictionary.txt using the command below
>>
>> curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@dictionary.txt"
>>
>> To create the vectors from my index
>>
>> I needed the org.apache.mahout.utils.vectors.lucene.Driver class. I
>> could not locate this class in mahout-core-0.7-job.jar. I could only
>> locate it in mahout-examples-0.7-job.jar, so I uploaded
>> mahout-examples-0.7-job.jar to an S3 bucket.
>>
>> I also uploaded the dictionary index to a separate S3 bucket. I created
>> another bucket with two folders to store my dictOut and vectors.
>>
>> I created a job flow on the CLI
>>
>> ./elastic-mapreduce --create --alive --log-uri s3n://mahout-output/logs/ --name dict_vectorize
>>
>> I added the step to vectorize my index using the following command
>> ./elastic-mapreduce -j j-2NSJRI6N9EQJ4  --jar
>> s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar  --main-class
>> org.apache.mahout.utils.vectors.lucene.Driver --arg --dir
>> s3n://mahout-input/input1/index/ --arg --field doc1 --arg --dictOut
>> s3n://mahout-output/solr-dict-out/dict.txt --arg --output
>> s3n://mahout-output/solr-vect-out/vectors
>>
>>
>> But in the logs I get the following error
>>
>> 2012-12-12 09:37:17,883 ERROR org.apache.mahout.utils.vectors.lucene.Driver (main): Exception
>> org.apache.commons.cli2.OptionException: Missing value(s) --dir
>>     at org.apache.commons.cli2.option.ArgumentImpl.validate(ArgumentImpl.java:241)
>>     at org.apache.commons.cli2.option.ParentImpl.validate(ParentImpl.java:124)
>>     at org.apache.commons.cli2.option.DefaultOption.validate(DefaultOption.java:176)
>>     at org.apache.commons.cli2.option.GroupImpl.validate(GroupImpl.java:265)
>>     at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:104)
>>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
>>
>>
>> What am I doing wrong?
>> Another question: what is the correct value of the --field argument? Is it doc1 (the
>> id) or dictionary (from the filename dictionary.txt)? I am asking
>> this because when I issue the query with q=doc1 on Solr I get no
>> results. But when I issue the query with q=dictionary, I see my content.
>>
>> Thank you so much for your help. I am a newbie, so please excuse my being too verbose.
