mahout-user mailing list archives

From Chris McConnell <c.t.mcconnell...@gmail.com>
Subject Re: LDA from Lucene Indexes
Date Tue, 03 May 2011 00:50:04 GMT
Great question, let me check on that. Sadly, I don't have direct control
over the indexing process, so I can't check quickly, but I'll post an
update in the morning.

Thanks for the tip.

Chris

On Mon, May 2, 2011 at 6:36 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> Were your Lucene indexes created with term vectors enabled?
>
> On May 2, 2011 3:05 PM, "Chris McConnell" <c.t.mcconnell.ge@gmail.com>
> wrote:
>
> Hello all,
>
> We are looking at using LDA for topic trending on some pre-built
> Lucene indexes. I've put the command and its output below. From
> searching, it seems a lot of people are unable to get this to work
> properly. Most answers tell the user to review the example
> "build-reuters.sh", but that example doesn't use a Lucene index as
> its input.
>
> The dictionary is created (on local disk) and vector creation is
> attempted on HDFS; however, no vectors are written out. I'm
> interested to know whether anyone has actually gotten this to work on
> Mahout 0.4. Just for testing purposes, I then tried to run the
> actual LDA on the created directories, but I wouldn't expect it to
> work since no vectors were created.
>
> Thanks,
> Chris
>
> bin/mahout lucene.vector --dir /home/index_for_mahout/ \
>   --output /user/vectored_lucene_index \
>   --dictOut /home/vectored_lucene_index/dict.out \
>   --weight TF --field content
> 11/05/02 17:23:57 INFO lucene.Driver: Output File: /user/vectored_lucene_index
> 11/05/02 17:23:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 11/05/02 17:23:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 11/05/02 17:23:57 INFO compress.CodecPool: Got brand-new compressor
> 11/05/02 17:23:58 INFO lucene.Driver: Wrote: 0 vectors
> 11/05/02 17:23:58 INFO lucene.Driver: Dictionary Output file: /home/vectored_lucene_index/dict.out
> 11/05/02 17:23:58 INFO driver.MahoutDriver: Program took 578 ms
>
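
Since the quoted run exits normally even though it reports "Wrote: 0 vectors",
it may help to check that count before starting LDA. The following is only a
minimal sketch, assuming the driver log format shown in the quoted output
above. The names vectors_written and SAMPLE_LOG are hypothetical and not part
of Mahout; the sample text is copied from the quoted log.

```python
import re
from typing import Optional

# Matches the driver's summary line, e.g.
# "11/05/02 17:23:58 INFO lucene.Driver: Wrote: 0 vectors"
WROTE_RE = re.compile(r"lucene\.Driver: Wrote: (\d+) vectors")


def vectors_written(log_text: str) -> Optional[int]:
    """Return the vector count reported in the log, or None if no summary line is found."""
    match = WROTE_RE.search(log_text)
    return int(match.group(1)) if match else None


# Sample lines taken from the quoted output (hypothetical variable name).
SAMPLE_LOG = """\
11/05/02 17:23:57 INFO lucene.Driver: Output File: /user/vectored_lucene_index
11/05/02 17:23:58 INFO lucene.Driver: Wrote: 0 vectors
11/05/02 17:23:58 INFO driver.MahoutDriver: Program took 578 ms
"""

if __name__ == "__main__":
    count = vectors_written(SAMPLE_LOG)
    if not count:
        # Covers both "0 vectors" and a missing summary line.
        print("No vectors written; skip LDA and check the index first.")
    else:
        print(f"{count} vectors written")
```

If the count is zero, one thing to check is the question raised earlier in the
thread: whether the index was created with term vectors enabled for the field
passed via --field.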
