From: Bianca Pereira
Date: Tue, 19 Aug 2014 15:04:11 +0100
Subject: Calculate Term Frequency
To: java-user

Hi everybody,

I would like your suggestions on how to calculate term frequency in a Lucene document. Currently I am using MultiFields.getTermDocsEnum, iterating through the returned DocsEnum 'de', and getting the frequency with de.freq() for the desired document.

My solution gives me the result I want, but it is too slow. For instance, I want to calculate the term frequency of a given term for N documents in sequence. Every time I move to a new document, I have to retrieve exactly the same DocsEnum again and iterate until I find the document I want. Of course I cannot cache the DocsEnum (yes, I made this huge mistake) because it is an iterator.

Do you have any suggestions on how I can get the term frequency quickly? The only suggestion I have had so far was "Do it programmatically, don't use Lucene". Should this be the solution?

Thank you.

Regards,
Bianca Pereira
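
For readers following the thread, the lookup pattern described in the message can be sketched roughly as below. This is a toy model in plain Java, not Lucene code: PostingsCursor, freqByRescan, and the sample arrays are hypothetical stand-ins for a forward-only postings iterator such as DocsEnum, and the early exit on a larger doc ID assumes postings are sorted by doc ID. It only illustrates why each lookup pays for a fresh scan from the start of the postings.

```java
/** Toy stand-in for the per-document lookup described in the message; NOT Lucene's API. */
final class TermFreqScanSketch {

    /** Forward-only cursor over (docId, freq) pairs sorted by docId (hypothetical, loosely modeled on an iterator like DocsEnum). */
    static final class PostingsCursor {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docIds;
        private final int[] freqs;
        private int pos = -1;

        PostingsCursor(int[] docIds, int[] freqs) {
            this.docIds = docIds;
            this.freqs = freqs;
        }

        /** Moves to the next posting and returns its doc ID, or NO_MORE_DOCS when exhausted. */
        int nextDoc() {
            pos++;
            return pos < docIds.length ? docIds[pos] : NO_MORE_DOCS;
        }

        /** Frequency of the term in the current document; valid only after nextDoc() returned a real doc ID. */
        int freq() {
            return freqs[pos];
        }
    }

    /**
     * Opens a fresh cursor for every lookup and scans from the beginning
     * until the target document is reached. Returns 0 if the term does not
     * occur in the target document.
     */
    static int freqByRescan(int[] docIds, int[] freqs, int targetDoc) {
        PostingsCursor cursor = new PostingsCursor(docIds, freqs);
        int doc;
        while ((doc = cursor.nextDoc()) != PostingsCursor.NO_MORE_DOCS) {
            if (doc == targetDoc) {
                return cursor.freq();
            }
            if (doc > targetDoc) {
                break; // assumption: postings are sorted, so the target cannot appear later
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] docIds = {2, 5, 9, 14}; // documents containing the term (made-up sample data)
        int[] freqs = {1, 4, 2, 7};   // term frequency in each of those documents
        for (int target : new int[] {5, 9, 10, 14}) {
            System.out.println("doc " + target + " -> " + freqByRescan(docIds, freqs, target));
        }
    }
}
```

With N target documents and P postings for the term, this pattern costs on the order of N × P cursor steps in the worst case, which is consistent with the slowdown reported above.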