lucene-solr-user mailing list archives

From Roland Szűcs <roland.sz...@booknwalk.com>
Subject LIX readability index calculation by solr
Date Wed, 21 Oct 2015 08:52:40 GMT
Hi all,

My use case is that I have to calculate the LIX readability index for my
documents.

*LIX = A/B + (C x 100)/A*, where

*A* = Number of words
*B* = Number of periods (defined by period, colon or capital first letter)
*C* = Number of long words (More than 6 letters)
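
To make the arithmetic concrete, here is a minimal Python sketch of the formula above. The function name `lix` and its parameter names are only illustrative; nothing here is part of Solr.

```python
def lix(words: int, periods: int, long_words: int) -> float:
    """Compute LIX = A/B + (C x 100)/A from the three counts.

    words      -> A, number of words
    periods    -> B, number of periods
    long_words -> C, number of words with more than 6 letters
    """
    if words <= 0 or periods <= 0:
        # Both A and B appear as divisors, so they must be positive.
        raise ValueError("words and periods must both be greater than zero")
    return words / periods + (long_words * 100) / words


# 100 words, 5 periods, 20 long words: 100/5 + (20 * 100)/100
print(lix(words=100, periods=5, long_words=20))  # 40.0
```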

A is easy to get if the index size does not matter: I define a field
in the schema without stemming or stop word elimination and use the term
vector component. From that I can count all the words, and I can just as
easily count the long words.
The only missing component is B.

Does anybody have an idea how to get the number of "periods"?
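
For comparison, this is roughly what I would do outside Solr on the raw text, as a sketch only. It assumes a "period" is a run of `.` or `:` followed by whitespace or the end of the text, and it ignores the "capital first letter" rule, so abbreviations such as "e.g." will be over-counted. The names `count_lix_parts`, `WORD_RE` and `PERIOD_RE` are my own and purely illustrative.

```python
import re

# A word is a run of letters (Unicode-aware; digits and underscores excluded).
WORD_RE = re.compile(r"[^\W\d_]+")
# Assumption: a "period" is one or more '.' or ':' followed by whitespace
# or the end of the text. The capital-letter rule is NOT implemented.
PERIOD_RE = re.compile(r"[.:]+(?=\s|$)")


def count_lix_parts(text: str) -> tuple[int, int, int]:
    """Return (A, B, C): word count, period count, long-word count."""
    words = WORD_RE.findall(text)
    a = len(words)
    b = len(PERIOD_RE.findall(text))
    c = sum(1 for w in words if len(w) > 6)  # long = more than 6 letters
    return a, b, c


sample = "Solr indexes documents. It is fast: readability matters."
print(count_lix_parts(sample))  # (8, 3, 4)
```

What I am missing is a way to get the same B count from inside the index.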

Cheers


-- 
Roland Szűcs
CEO, Bookandwalk.hu <https://bookandwalk.hu/>
Phone: +36 1 210 81 13
Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
