From: Sven
To: java-user@lucene.apache.org
Date: Wed, 12 Nov 2008 14:26:10 -0500
Subject: How to get the terms within 5 words of another term?

Hi everyone,

I have a term "foo", and I want to count all occurrences of all terms that are within 5 words of "foo" in all the documents that contain "foo". For simplicity's sake, this is only for a single field.

So if I have 3 documents (each with a single field) that look like this:

  Once upon a time, foo lived far, far away in a magical kingdom.

  "The Life and Time of the Hero Called Foo" is, by far, the best novel about spam I have ever read.

  I theorize that over time, foo will gradually move far away from bar.

I would like to generate a list of terms and their hit counts based on their proximity to "foo" across all the documents. So I'll end up with something like:

  far : 4
  time : 3
  away : 2

Any help would be greatly appreciated.

Thanks much!
-Sven
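For reference, here is a minimal plain-Java sketch of the counting being asked for. It does not use Lucene at all: the tokenizer (lowercase, split on non-alphanumerics) is a hypothetical stand-in for an analyzer, and the class and method names (ProximityCounter, tokenize, countNearby) are made up for illustration. In a real index the token positions would have to come from Lucene itself (for example, term vectors stored with positions), which this sketch does not show. It assumes "within 5 words" means at most 5 positions before or after an occurrence of the target, that each nearby occurrence is counted once even if it is near several "foo"s, and that "foo" itself is not counted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ProximityCounter {

    // Hypothetical tokenizer: lowercase and split on anything that is not a
    // letter or digit. A real Lucene analyzer may behave differently
    // (stop words, stemming, etc.).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // split can yield a leading empty string
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts every token occurrence within `window` positions of some occurrence
    // of `target`. Each occurrence is counted at most once per document, and the
    // target term itself is skipped. Documents without the target add nothing.
    static Map<String, Integer> countNearby(List<String> docs, String target, int window) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            Set<Integer> nearby = new HashSet<>(); // positions already in some window
            for (int i = 0; i < tokens.size(); i++) {
                if (!tokens.get(i).equals(target)) {
                    continue;
                }
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.size() - 1, i + window);
                for (int j = from; j <= to; j++) {
                    if (!tokens.get(j).equals(target)) {
                        nearby.add(j);
                    }
                }
            }
            for (int j : nearby) {
                counts.merge(tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Once upon a time, foo lived far, far away in a magical kingdom.",
            "\"The Life and Time of the Hero Called Foo\" is, by far, the best novel about spam I have ever read.",
            "I theorize that over time, foo will gradually move far away from bar.");
        Map<String, Integer> counts = countNearby(docs, "foo", 5);
        System.out.println("far : " + counts.get("far"));   // far : 4
        System.out.println("time : " + counts.get("time")); // time : 3
        System.out.println("away : " + counts.get("away")); // away : 2
    }
}
```

On the three example documents this reproduces the counts above (far : 4, time : 3, away : 2). Collecting window positions into a set before counting is what keeps a word that sits between two nearby "foo"s from being counted twice; if double counting is actually wanted, the set can be replaced by counting directly inside the inner loop.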