From: Shashikant Kore <shashikant@gmail.com>
Date: Wed, 29 Apr 2009 22:57:46 +0530
Subject: Re: Failure to run Clustering example
To: mahout-user@lucene.apache.org

Hi Jeff,

The JDK problem occurs while running the Synthetic Control Data example
from http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html

The other query was about how to convert text files to Mahout Vectors.
Let's say I have text files of Wikipedia pages and now want to create
clusters out of them. How do I get the Mahout vectors from the Lucene
index? Can you point me to some theory behind it, from which I can
write the conversion code?

Thanks,

--shashi

On Wed, Apr 29, 2009 at 10:50 PM, Jeff Eastman wrote:
> Hi Shashi,
>
> That does sound like a JDK version problem. Most jobs require an initial
> step to get the input into the correct vector format to use the clustering
> code. The
> /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java
> calls an InputDriver that does that for the syntheticcontrol examples. You
> would need to do something similar to massage your data into Mahout Vector
> format before you can run the clustering job of your choosing.
>
> Jeff
>
> Shashikant Kore wrote:
>>
>> Thanks for the response, Grant.
>>
>> Upgrading Hadoop didn't really help. Now, I am not able to launch even
>> the Namenode, JobTracker, ... as I am getting the same error. I suspect
>> a version conflict somewhere, as there are two JDK versions on the box. I
>> will try it out on another box which has only JDK 6.
>>
>> From the documentation of clustering, it is not clear how to get the
>> vectors from text (or HTML) files. I suppose you can get TF-IDF
>> values by indexing this content with Lucene. How does one proceed from
>> there? Any pointers on that are appreciated.
>>
>> --shashi
>>
>> On Tue, Apr 28, 2009 at 8:40 PM, Grant Ingersoll wrote:
>>
>>> On Apr 28, 2009, at 6:01 AM, Shashikant Kore wrote:
>>>
>>>> Hi,
>>>>
>>>> Initially, I got the version number error at the beginning. I found
>>>> that the JDK version was 1.5. It has been upgraded to 1.6. Now
>>>> JAVA_HOME points to /usr/java/jdk1.6.0_13/ and I am using Hadoop
>>>> 0.18.3.
>>>>
>>>> 1. What could possibly be wrong? I checked the Hadoop script. The
>>>> value of JAVA_HOME is correct (i.e. 1.6). Is it possible that somehow
>>>> it is still using 1.5?
>>>
>>> I'm going to guess the issue is that you need Hadoop 0.19.
>>>
>>>> 2. The last step of the clustering tutorial says "Get the data out of
>>>> HDFS and have a look." Can you please point me to the documentation of
>>>> Hadoop about how to read this data?
>>>
>>> http://hadoop.apache.org/core/docs/current/quickstart.html towards the
>>> bottom. It shows some of the commands you can use with HDFS: -get,
>>> -cat, etc.
>>>
>>> -Grant

--
Co-founder, Discrete Log Technologies
http://www.bandhan.com/
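The conversion discussed above — turning per-document term counts (which Lucene can supply if term vectors were stored at indexing time) into TF-IDF weights that you would then copy into a Mahout Vector — can be sketched roughly as follows. This is a simplified illustration only: the class `TfIdfSketch` and its method are hypothetical, not part of Mahout's or Lucene's API, and it uses the classic tf * log(N/df) weighting rather than whatever scheme the InputDriver Jeff mentions actually applies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compute TF-IDF weights for one document.
// In a real pipeline the counts would come from the Lucene index and the
// resulting weights would be set into a Mahout vector, one dictionary
// index per distinct term across the whole corpus.
public class TfIdfSketch {

    /**
     * @param tf      term -> count of that term in this document
     * @param df      term -> number of documents containing the term
     * @param numDocs total number of documents in the corpus
     * @return term -> tf * log(numDocs / df) weight
     */
    public static Map<String, Double> tfidf(Map<String, Integer> tf,
                                            Map<String, Integer> df,
                                            int numDocs) {
        Map<String, Double> weights = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer docFreq = df.get(e.getKey());
            // Guard against a missing document-frequency entry.
            int n = (docFreq == null || docFreq == 0) ? 1 : docFreq;
            double w = e.getValue() * Math.log((double) numDocs / n);
            weights.put(e.getKey(), w);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Toy corpus of 2 documents; this document mentions "wiki" twice.
        Map<String, Integer> tf = new HashMap<String, Integer>();
        tf.put("wiki", 2);
        tf.put("page", 1);
        Map<String, Integer> df = new HashMap<String, Integer>();
        df.put("wiki", 1);  // appears in 1 of 2 docs
        df.put("page", 2);  // appears in both docs, so its weight is 0
        System.out.println(tfidf(tf, df, 2));
    }
}
```

A term appearing in every document (like "page" above) gets weight 0, which is the point of the IDF factor: corpus-wide terms carry no clustering signal, so the resulting Mahout vector stays sparse.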