Date: Sat, 2 Jan 2010 16:03:57 +0200
Subject: Re: Stopwords work for Solr but not for Mahout
From: Bogdan Vatkov
To: mahout-user@lucene.apache.org
In-Reply-To: <56747AB3-8E9C-4B77-A610-100CBC8F0737@apache.org>

This is my Solr config:

and the type text is as configured by default:

I have entered quite a few stopwords in the stopwords.txt file.

My SolrToMahout.sh file:

#!/bin/bash
set -x
cd /store/dev/inst/mahout-0.2
# Classpath: the mahout-utils jar plus every jar under utils/target/dependency and ".",
# joined with ":" by the echo | sed pipeline.
java -classpath /store/dev/inst/mahout-0.2/utils/target/mahout-utils-0.2.jar:$(echo /store/dev/inst/mahout-0.2/utils/target/dependency/*.jar . | sed 's/ /:/g') \
  org.apache.mahout.utils.vectors.lucene.Driver \
  --dir /store/dev/inst/apache-solr-1.4.0/example/solr/data/index \
  --output /store/dev/inst/mahout-0.2/clustering-example/solr/output \
  --field msg_body \
  --dictOut /store/dev/inst/mahout-0.2/clustering-example/solr_dict/dict

Best regards,
Bogdan

On Sat, Jan 2, 2010 at 3:49 PM, Grant Ingersoll wrote:

> What do the relevant pieces of your Solr setup look like and how are you
> invoking the Lucene driver?
>
> -Grant