Date: Wed, 21 Oct 2015 10:52:40 +0200
Subject: LIX readability index calculation by solr
From: Roland Szűcs
To: solr-user@lucene.apache.org

Hi all,

My use case is that I have to calculate the LIX readability index for my documents.

LIX = A/B + (C x 100)/A, where

A = number of words
B = number of periods (defined by period, colon or capital first letter)
C = number of long words (more than 6 letters)

A is easy to get if the index size does not matter: I define a field in the schema without stemming or stop-word elimination and use the term vector component. That lets me count all the words, and I can just as easily count the long words. The only missing component is B.

Does anybody have an idea how to get the number of "periods"?

Cheers,

--
Roland Szűcs
CEO, Bookandwalk.hu
Phone: +36 1 210 81 13
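
For reference, the LIX formula from the message can be sketched outside Solr in a few lines of Python. This is only an illustration under stated assumptions, not a Solr feature: the function name `lix` and both regular expressions are hypothetical, a "period" is counted only as a run of `.` or `:` characters (the capital-first-letter rule is left out because it needs a decision on how to treat proper nouns), and a text with no terminator is treated as one period.

```python
import re

# Assumption: a "word" is any run of Unicode letters (digits and '_' excluded).
WORD_RE = re.compile(r"[^\W\d_]+")
# Assumption: a "period" ends at a run of '.' or ':'; "..." counts once.
PERIOD_RE = re.compile(r"[.:]+")


def lix(text, long_word_min=7):
    """Return LIX = A/B + (C * 100)/A for the given text.

    A = number of words
    B = number of periods (simplified: '.' and ':' only, at least 1)
    C = number of long words (more than 6 letters, i.e. >= long_word_min)
    """
    words = WORD_RE.findall(text)
    a = len(words)
    if a == 0:
        raise ValueError("text contains no words")
    b = max(1, len(PERIOD_RE.findall(text)))
    c = sum(1 for w in words if len(w) >= long_word_min)
    return a / b + (c * 100) / a


if __name__ == "__main__":
    sample = "Solr is fast. Readability matters: measure it."
    # A = 7 words, B = 3 periods, C = 3 long words
    print(round(lix(sample), 2))
```

Because words are matched as letter runs, hyphenated or apostrophised forms such as "stop-word" count as two words; a different tokenizer would change A and C accordingly.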