Readability analysis

Quickly cleaning text and generating a readability score

Thu 03 December 2020

We can use Python to generate a readability score for our documents. Running this check across all our documentation from time to time helps us spot hard-to-read pages and improve document quality.

The example below produces a single Automated Readability Index (ARI) score for a Sphinx document. We could also apply it recursively to all our documents at once, or add it to a DevOps pipeline for documentation.
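To make the score less of a black box, here is a minimal sketch of the standard Automated Readability Index (ARI) formula: 4.71 × (characters per word) + 0.5 × (words per sentence) − 21.43. The function name and the regex-based tokenization are assumptions made for illustration; a library will split words and sentences more carefully, so its scores will differ slightly.

```python
import re


def automated_readability_index(text: str) -> float:
    """Approximate ARI using naive regex tokenization (illustrative only)."""
    # Treat runs of letters, digits, and apostrophes as words.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Treat '.', '!' and '?' as sentence terminators.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # ARI counts letters and digits only, not punctuation.
    characters = sum(ch.isalnum() for word in words for ch in word)
    return (
        4.71 * characters / len(words)
        + 0.5 * len(words) / len(sentences)
        - 21.43
    )


if __name__ == "__main__":
    print(automated_readability_index("The cat sat. The dog ran."))
```

A higher score means harder text; the value roughly tracks the US school grade needed to read it, so very short, simple sentences can even score below zero.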

Example

  # Prerequisites: Sphinx, pandoc, and the Python libraries imported below.
  # The 'html5lib' parser must be installed separately; the readability module
  # is assumed to come from the py-readability-metrics package.
  import subprocess

  from bs4 import BeautifulSoup
  from readability import Readability

  # Runs Sphinx and builds a single HTML file of all content; check=True raises if the build fails.
  subprocess.run(["sphinx-build", "-b", "singlehtml", ".", "_build/singlehtml"], check=True)

  # Opens the output file with utf8 encoding and parses the HTML; the with block closes the file.
  with open("_build/singlehtml/index.html", mode="r", encoding="utf8") as html_file:
      soup = BeautifulSoup(html_file, "html5lib")

  for h1 in soup("h1"):  # Removes all h1 elements, add other elements as needed
      h1.decompose()

  r = Readability(soup.text)  # Prepares readability statistics for the cleaned text
  ari = r.ari()  # Runs the ARI algorithm (the library needs at least 100 words of text)
  print(ari.score)  # Prints out the ARI score
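To apply the same idea to every built page rather than one file, something like the following sketch could work. The names `score_tree` and `pages_over`, the `*.html` pattern, and the threshold are all hypothetical; `score_fn` stands in for whatever scoring step you use, for example a function that wraps the BeautifulSoup cleanup and `Readability(...).ari().score` calls from the example.

```python
from pathlib import Path
from typing import Callable, Dict, List


def score_tree(
    root: str,
    score_fn: Callable[[str], float],
    pattern: str = "*.html",
) -> Dict[str, float]:
    """Apply score_fn to the contents of every matching file under root, recursively."""
    scores: Dict[str, float] = {}
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf8")
        # Key each score by its path relative to root, with forward slashes.
        scores[path.relative_to(root).as_posix()] = score_fn(text)
    return scores


def pages_over(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the names of pages whose score exceeds threshold, sorted."""
    return sorted(name for name, score in scores.items() if score > threshold)
```

In a documentation pipeline, a step could call `score_tree` on the build directory and exit with a non-zero status when `pages_over` returns anything, so that hard-to-read pages fail the build. The right threshold is a judgment call for each team.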