Message-ID: <510A79F2.8070605@andrewgilmartin.com>
Date: Thu, 31 Jan 2013 09:04:34 -0500
From: Andrew Gilmartin
To: java-user@lucene.apache.org
Subject: Re: How to find related words？
In-Reply-To: <1359594974931-4037583.post@n3.nabble.com>

wgggfiy wrote:
> en, it seems nice, but I'm puzzled by you and Andrew Gilmartina above,
> what's the difference between you guys ?

The difference is that similar documents do not give you similar terms.
Similar documents can show a correlation of terms (for example, wherever
Lucene is mentioned, so are Solr and Hadoop), but in no way does this mean
that the terms themselves are similar.

Accumulating similar and/or synonymous terms is a manual process. I am sure
there are text mining tools/algorithms that make such discoveries, but I do
not know about them. (I am a journeyman programmer, not a researcher.) If
anyone does know about them, please share with this list.

-- 
Andrew
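
To make the distinction between correlated terms and similar terms concrete, here is a minimal sketch in plain Python (not the Lucene API). It counts how many documents contain each pair of terms. The function name `cooccurrence_counts` and the toy corpus are assumptions invented for illustration; they do not come from the thread or from any library.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered pair of terms, how many documents contain both."""
    counts = Counter()
    for text in docs:
        # Use each term once per document, in a stable (sorted) order,
        # so every pair is always stored as the same tuple.
        terms = sorted(set(text.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


# Hypothetical toy corpus, made up for this example.
docs = [
    "lucene solr hadoop search",
    "lucene solr hadoop cluster",
    "lucene solr index",
]

counts = cooccurrence_counts(docs)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

In this toy corpus "lucene" and "solr" appear together in every document, so they rank as the most strongly correlated pair, yet they name different things. A high co-occurrence count is evidence that terms are related by topic, not that one can stand in for the other as a synonym.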