mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Latent Semantic Analysis
Date Fri, 06 Apr 2012 20:05:03 GMT
Ok, cool.

I think writing MR output into your input folder is bad practice in
general in the Hadoop world, regardless of the job. Glad you got it
resolved.
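To illustrate the non-nesting rule, here is a small sanity check a job driver could run before submission. This is a hypothetical helper, not part of Mahout or Hadoop (neither enforces this check for you); it just compares normalized path components, ignoring the scheme and authority:

```java
import java.net.URI;

public class PathCheck {
    // Returns true if `child` is the same as, or nested under, `parent`.
    // Hypothetical helper: compares normalized URI paths only.
    static boolean isNested(String parent, String child) {
        String p = URI.create(parent).normalize().getPath();
        String c = URI.create(child).normalize().getPath();
        if (!p.endsWith("/")) p = p + "/";
        return (c + "/").startsWith(p);
    }

    public static void main(String[] args) {
        // Nested: output under the input directory -- the situation to avoid.
        System.out.println(isNested(
            "hdfs://localhost:9000/lsa4solr/matrix/transpose-120",
            "hdfs://localhost:9000/lsa4solr/matrix/transpose-120/SSVD-out"));
        // Disjoint sibling directories are fine.
        System.out.println(isNested(
            "hdfs://localhost:9000/lsa4solr/matrix/transpose-120",
            "hdfs://localhost:9000/lsa4solr/ssvd-out"));
    }
}
```

Failing fast on a nested output (or simply always writing the SSVD output to a sibling directory, as was done here) avoids the FileInputFormat confusion below.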

On Fri, Apr 6, 2012 at 9:55 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> Dmitriy,
>
> I did downgrade my hadoop and got the same error; however your last
> suggestion worked, I moved the output path to a whole different directory
> and this particular problem went away.
>
> Thanks Much,
> Peyman
>
> On Thu, Apr 5, 2012 at 12:38 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
>> also i notice that you are using output as a subfolder of your input?
>> If so, it is probably going to create some mess. Please don't specify
>> input and output folders that are nested w.r.t. each other. This is
>> not expected.
>>
>> -d
>>
>> On Thu, Apr 5, 2012 at 12:00 PM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> > Ok, great, I'll give these ideas a try later today. The input is the
>> > following line(s) that in my code sample was commented out using ';'
>> > in Clojure.
>> > The first stage, the Q-job, is done fine; it is the second job that
>> > gets messed up. The output of the Q-job is at
>> > /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job, but
>> > BtJob is looking for the input in the wrong place. It must be the
>> > hadoop version as you said.
>> >
>> > input path  #<Path
>> > hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120>
>> > dd  #<Path[] [Lorg.apache.hadoop.fs.Path;@5563d208>
>> > numCol  1000
>> > numrow  15982
>> >
>> >
>> > On Thu, Apr 5, 2012 at 11:54 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >
>> >> Another idea i have is to try to run it from just the Mahout command
>> >> line and see if it works with .205. If it does, it is definitely
>> >> something about parameter passing / the client hadoop classpath / etc.
>> >>
>> >> On Thu, Apr 5, 2012 at 11:51 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> > also you are printing your input path -- how does it look in
>> >> > reality? Because the path that it complains about, SSVDOutput/data,
>> >> > in fact should be the input path. That's what's perplexing.
>> >> >
>> >> > We are talking hadoop job setup process here, nothing specific to the
>> >> > solution itself. And job setup/directory management fails for some
>> >> > reason.
>> >> >
>> >> > On Thu, Apr 5, 2012 at 11:45 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >> Any chance you could test it with its current dependency,
>> >> >> 0.20.204? Or would that be hard to stage?
>> >> >>
>> >> >> A newer hadoop version is frankly all i can think of here as the
>> >> >> reason for this.
>> >> >>
>> >> >> On Thu, Apr 5, 2012 at 11:35 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> >>> Hi Dmitriy,
>> >> >>>
>> >> >>> It is Clojure code from: https://github.com/algoriffic/lsa4solr
>> >> >>> Of course I modified it to use the Mahout .6 distribution, also
>> >> >>> running on hadoop-0.20.205.0. Here is the Clojure code that I
>> >> >>> changed; the lines after 'decomposer (doto (.run ssvdSolver))'
>> >> >>> still need modification b/c I'm not reading the eigenValue/Vector
>> >> >>> from the solver correctly. Originally this code was based on
>> >> >>> Mahout .4. I'm creating the Matrix from Solr 3.1.0, very similar
>> >> >>> to what was done on: https://github.com/algoriffic/lsa4solr
>> >> >>>
>> >> >>> Thanks,
>> >> >>>
>> >> >>> (defn decompose-svd
>> >> >>>   [mat k]
>> >> >>>   ;(println "input path " (.getRowPath mat))
>> >> >>>   ;(println "dd " (into-array [(.getRowPath mat)]))
>> >> >>>   ;(println "numCol " (.numCols mat))
>> >> >>>   ;(println "numrow " (.numRows mat))
>> >> >>>   (let [eigenvalues (new java.util.ArrayList)
>> >> >>>         eigenvectors (DenseMatrix. (+ k 2) (.numCols mat))
>> >> >>>         numCol (.numCols mat)
>> >> >>>         config (.getConf mat)
>> >> >>>         rawPath (.getRowPath mat)
>> >> >>>         outputPath (Path. (str (.toString rawPath) "/SSVD-out"))
>> >> >>>         inputPath (into-array [rawPath])
>> >> >>>         ssvdSolver (SSVDSolver. config inputPath outputPath 1000 k 60 3)
>> >> >>>         decomposer (doto (.run ssvdSolver))
>> >> >>>         V (normalize-matrix-columns
>> >> >>>            (.viewPart (.transpose eigenvectors)
>> >> >>>                       (int-array [0 0])
>> >> >>>                       (int-array [(.numCols mat) k])))
>> >> >>>         U (mmult mat V)
>> >> >>>         S (diag (take k (reverse eigenvalues)))]
>> >> >>>     {:U U
>> >> >>>      :S S
>> >> >>>      :V V}))
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Apr 5, 2012 at 11:10 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >>>
>> >> >>>> Yeah. i don't see how it may have arrived at that error.
>> >> >>>>
>> >> >>>>
>> >> >>>> Peyman,
>> >> >>>>
>> >> >>>> I need to know more -- it looks like you are using the embedded
>> >> >>>> api, not the command line, so i need to see how you initialize
>> >> >>>> the solver and also which version of the Mahout libraries you are
>> >> >>>> using (your stack trace numbers do not correspond to anything
>> >> >>>> reasonable on current trunk).
>> >> >>>>
>> >> >>>> thanks.
>> >> >>>>
>> >> >>>> -d
>> >> >>>>
>> >> >>>> On Thu, Apr 5, 2012 at 10:55 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >>>> > Hm. i never saw that and am not sure where this folder comes
>> >> >>>> > from. Which hadoop version are you using? This may be a result
>> >> >>>> > of incompatible support for multiple outputs in the newer
>> >> >>>> > hadoop versions. I tested it with CDH3u0/u3 and it was fine.
>> >> >>>> > This folder should normally appear in the conversation; i
>> >> >>>> > suspect it is an internal hadoop thing.
>> >> >>>> >
>> >> >>>> > This is without me actually looking at the code, per the stack
>> >> >>>> > trace.
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > On Thu, Apr 5, 2012 at 5:22 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> >>>> >> Hi Guys,
>> >> >>>> >> I'm now using ssvd for my LSA code and get the following
>> >> >>>> >> error. At the time of the error, all I have under the
>> >> >>>> >> 'SSVD-out' folder is:
>> >> >>>> >> Q-job/QHat-m-00000
>> >> >>>> >> Q-job/R-m-00000
>> >> >>>> >> Q-job/_SUCCESS
>> >> >>>> >> Q-job/part-m-00000.deflate
>> >> >>>> >>
>> >> >>>> >> I'm not clear where the '/data' folder is supposed to be set.
>> >> >>>> >> Is it part of the output of the QJob? I don't see any error in
>> >> >>>> >> the QJob.
>> >> >>>> >>
>> >> >>>> >> Thanks,
>> >> >>>> >> SEVERE: java.io.FileNotFoundException: File does not exist:
>> >> >>>> >> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120/SSVD-out/data
>> >> >>>> >>    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:534)
>> >> >>>> >>    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
>> >> >>>> >>    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>> >> >>>> >>    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:954)
>> >> >>>> >>    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:971)
>> >> >>>> >>    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
>> >> >>>> >>    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
>> >> >>>> >>    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)
>> >> >>>> >>    at java.security.AccessController.doPrivileged(Native Method)
>> >> >>>> >>    at javax.security.auth.Subject.doAs(Subject.java:396)
>> >> >>>> >>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>> >> >>>> >>    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)
>> >> >>>> >>    at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
>> >> >>>> >>    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:505)
>> >> >>>> >>    at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:347)
>> >> >>>> >>    at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:188)
>> >> >>>> >>    at lsa4solr.clustering_protocol$decompose_term_doc_matrix.invoke(clustering_protocol.clj:125)
>> >> >>>> >>    at lsa4solr.clustering_protocol$cluster_kmeans_docs.invoke(clustering_protocol.clj:142)
>> >> >>>> >>    at lsa4solr.cluster$cluster_dispatch.invoke(cluster.clj:72)
>> >> >>>> >>    at lsa4solr.cluster$_cluster.invoke(cluster.clj:103)
>> >> >>>> >>    at lsa4solr.cluster.LSAClusteringEngine.cluster(Unknown Source)
>> >> >>>> >>    at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
>> >> >>>> >>    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>> >> >>>> >>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>> >> >>>> >>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>> >> >>>> >>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>> >> >>>> >>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>> >> >>>> >>    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>> >> >>>> >>    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>> >> >>>> >>
>> >> >>>> >> On Sun, Feb 26, 2012 at 4:56 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >>>> >>
>> >> >>>> >>> For the third time: in the context of LSA, a faster and
>> >> >>>> >>> hence perhaps better alternative to Lanczos is ssvd. Is there
>> >> >>>> >>> any specific reason you want to use the Lanczos solver in the
>> >> >>>> >>> context of LSA?
>> >> >>>> >>>
>> >> >>>> >>> -d
>> >> >>>> >>>
>> >> >>>> >>> On Sun, Feb 26, 2012 at 6:40 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> >>>> >>> > Hi Guys,
>> >> >>>> >>> >
>> >> >>>> >>> > Per your advice I did upgrade to Mahout .6 and made a bunch
>> >> >>>> >>> > of API changes, and in the meantime realized I had a bug
>> >> >>>> >>> > with my input matrix: zero rows read from Solr b/c multiple
>> >> >>>> >>> > fields in Solr were indexed and not just the one I was
>> >> >>>> >>> > interested in. That issue is fixed and I have a matrix with
>> >> >>>> >>> > these dimensions: (.numCols mat) 1000, (.numRows mat) 15932
>> >> >>>> >>> > (or the transpose).
>> >> >>>> >>> > Unfortunately I'm getting the below error now. In the
>> >> >>>> >>> > context of some other Mahout algorithm there was a mention
>> >> >>>> >>> > of '/tmp' vs '/_tmp' causing this issue, but in this
>> >> >>>> >>> > particular case the matrix is in memory!! I'm using this
>> >> >>>> >>> > google package: guava-r09.jar
>> >> >>>> >>> >
>> >> >>>> >>> > SEVERE: java.util.NoSuchElementException
>> >> >>>> >>> >        at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
>> >> >>>> >>> >        at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
>> >> >>>> >>> >        at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
>> >> >>>> >>> >        at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
>> >> >>>> >>> >        at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:165)
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> > Any suggestion?
>> >> >>>> >>> > Thanks,
>> >> >>>> >>> > Peyman
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> > On Mon, Feb 20, 2012 at 10:38 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >>>> >>> >> Peyman,
>> >> >>>> >>> >>
>> >> >>>> >>> >>
>> >> >>>> >>> >> Yes, what Ted said. Please take the 0.6 release. Also try
>> >> >>>> >>> >> ssvd; it may benefit you in some regards compared to
>> >> >>>> >>> >> Lanczos.
>> >> >>>> >>> >>
>> >> >>>> >>> >> -d
>> >> >>>> >>> >>
>> >> >>>> >>> >> On Sun, Feb 19, 2012 at 10:34 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> >>>> >>> >>> Hi Dmitriy & Others,
>> >> >>>> >>> >>>
>> >> >>>> >>> >>> Dmitriy, thanks for your previous response.
>> >> >>>> >>> >>> I have a follow-up question to my LSA project. I have
>> >> >>>> >>> >>> managed to upload 1,500 documents from two different news
>> >> >>>> >>> >>> groups (one about graphics and one about Atheism,
>> >> >>>> >>> >>> http://people.csail.mit.edu/jrennie/20Newsgroups/) to
>> >> >>>> >>> >>> Solr. However my LanczosSolver in Mahout .4 does not find
>> >> >>>> >>> >>> any eigenvalues (there are eigenvectors, as you see in
>> >> >>>> >>> >>> the follow-up logs).
>> >> >>>> >>> >>> The only thing I'm doing differently from
>> >> >>>> >>> >>> https://github.com/algoriffic/lsa4solr is that I'm not
>> >> >>>> >>> >>> using the 'Summary' field but rather the actual 'text'
>> >> >>>> >>> >>> field in Solr. I'm assuming the issue is that the Summary
>> >> >>>> >>> >>> field already removes the noise and makes the clustering
>> >> >>>> >>> >>> work, and the raw index data does not do that. Am I
>> >> >>>> >>> >>> correct, or are there other potential explanations? For
>> >> >>>> >>> >>> the desired rank I'm using values between 10-100 and
>> >> >>>> >>> >>> looking for #clusters between 2-10 (different values for
>> >> >>>> >>> >>> different trials), but always the same result comes out:
>> >> >>>> >>> >>> no clusters found.
>> >> >>>> >>> >>> If my issue is related to not having summarization done,
>> >> >>>> >>> >>> how can that be done in Solr? I wasn't able to find a
>> >> >>>> >>> >>> Summary field in Solr.
>> >> >>>> >>> >>>
>> >> >>>> >>> >>> Thanks
>> >> >>>> >>> >>> Peyman
>> >> >>>> >>> >>>
>> >> >>>> >>> >>>
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 0 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 1 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 2 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 3 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 4 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 5 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 6 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 7 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 8 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 9 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: Eigenvector 10 found with eigenvalue 0.0
>> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >>>> >>> >>> INFO: LanczosSolver finished.
>> >> >>>> >>> >>>
>> >> >>>> >>> >>>
>> >> >>>> >>> >>> On Sun, Jan 1, 2012 at 10:06 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >>>> >>> >>>> In Mahout the lsa pipeline is possible with the
>> >> >>>> >>> >>>> seqdirectory, seq2sparse and ssvd commands. Nuances are
>> >> >>>> >>> >>>> understanding the dictionary format and llr analysis of
>> >> >>>> >>> >>>> n-grams, and perhaps using a slightly better lemmatizer
>> >> >>>> >>> >>>> than the default one.
>> >> >>>> >>> >>>>
>> >> >>>> >>> >>>> With the indexing part you are on your own at this point.
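[Editorial note: the pipeline mentioned above might be strung together roughly like this from the Mahout command line. This is a sketch only -- exact option names vary by Mahout version, and the paths here are made up:]

```shell
# Plain text files -> SequenceFiles (one entry per document)
bin/mahout seqdirectory -i /path/to/docs -o /tmp/seqfiles

# SequenceFiles -> sparse TF-IDF vectors plus a dictionary;
# n-gram/llr analysis is configured at this step
bin/mahout seq2sparse -i /tmp/seqfiles -o /tmp/vectors

# Stochastic SVD of the resulting term-document matrix
bin/mahout ssvd -i /tmp/vectors/tfidf-vectors -o /tmp/ssvd-out -k 100
```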
>> >> >>>> >>> >>>> On Jan 1, 2012 2:28 PM, "Peyman Mohajerian" <mohajeri@gmail.com> wrote:
>> >> >>>> >>> >>>>
>> >> >>>> >>> >>>>> Hi Guys,
>> >> >>>> >>> >>>>>
>> >> >>>> >>> >>>>> I'm interested in this work:
>> >> >>>> >>> >>>>>
>> >> >>>> >>> >>>>> http://www.ccri.com/blog/2010/4/2/latent-semantic-analysis-in-solr-using-clojure.html
>> >> >>>> >>> >>>>>
>> >> >>>> >>> >>>>> I looked at some of the comments and noticed that there
>> >> >>>> >>> >>>>> was interest in incorporating it into Mahout, back in
>> >> >>>> >>> >>>>> 2010. I'm also having issues running this code due to
>> >> >>>> >>> >>>>> dependencies on an older version of Mahout.
>> >> >>>> >>> >>>>>
>> >> >>>> >>> >>>>> I was wondering if LSA is now directly available in
>> >> >>>> >>> >>>>> Mahout? Also, if I upgrade to the latest Mahout, would
>> >> >>>> >>> >>>>> this Clojure code work?
>> >> >>>> >>> >>>>>
>> >> >>>> >>> >>>>> Thanks
>> >> >>>> >>> >>>>> Peyman
