From: Jack Krupansky
Subject: Re: How to find related words ？
Date: Thu, 31 Jan 2013 09:24:07 -0500

Oh, so you wanted "similar" words! You should have said so... your inquiry said you were looking for "related" words. So, which is it? More specifically, what exactly are you looking for, in terms of the semantics?

In any case, "find similar" (MoreLikeThis) is about the best you can do out of the box.
-- Jack Krupansky

-----Original Message-----
From: Andrew Gilmartin
Sent: Thursday, January 31, 2013 9:04 AM
To: java-user@lucene.apache.org
Subject: Re: How to find related words ?

wgggfiy wrote:
> en, it seems nice, but I'm puzzled by you and Andrew Gilmartina above,
> what's the difference between you guys ?

The difference is that similar documents do not give you similar terms. Similar documents can show a correlation of terms -- i.e., wherever Lucene is mentioned so are Solr and Hadoop -- but in no way does this mean that the terms are similar. Accumulating similar and/or synonymous terms is a manual process. I am sure there are text mining tools/algorithms that make discoveries, but I do not know about these. (I am a journeyman programmer, not a researcher.) If anyone does know about them, please share with this list.

-- Andrew

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
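
To make the distinction drawn in the quoted reply concrete, here is a minimal sketch of counting term co-occurrence across documents. It is plain Java and does not use any Lucene API; the class name CooccurrenceSketch, the method cooccurring, and the whitespace tokenization are all assumptions made for illustration. It shows only which terms appear alongside a given term, which is a correlation and not evidence that the terms are similar or synonymous.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Lucene.
class CooccurrenceSketch {

    // For every other term, count the documents that contain both it and `term`.
    // `term` is expected in lowercase, because documents are lowercased below.
    static Map<String, Integer> cooccurring(List<String> docs, String term) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            // Naive tokenization (an assumption): lowercase, split on whitespace.
            Set<String> terms = new HashSet<>(
                    Arrays.asList(doc.toLowerCase().trim().split("\\s+")));
            if (!terms.contains(term)) {
                continue; // this document never mentions the term
            }
            for (String other : terms) {
                if (!other.equals(term)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Lucene Solr Hadoop",
                "Lucene Solr",
                "Hadoop Spark");
        // Prints the terms that share a document with "lucene", with counts.
        System.out.println(cooccurring(docs, "lucene"));
    }
}
```

A high count here says that two terms tend to be mentioned together, as with Lucene and Solr in the example from the thread; deciding whether they are actually similar in meaning still needs a separate, likely manual, step.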