Readability analysis
Quickly cleaning text and generating a readability score
We can use Python to generate a readability score for our documents. Running this check across all our documentation from time to time helps us spot hard-to-read pages and improve document quality.
The example below gives us a single Automated Readability Index (ARI) score for a Sphinx document. We could also apply it recursively to all our documents at once, or add it to our documentation DevOps pipeline.
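ARI estimates the US school grade level needed to understand a text from two ratios: characters per word and words per sentence. The standard formula is below, where "characters" counts letters and digits. Libraries may split words and sentences differently, so scores can vary slightly between tools. The worked line uses a two-sentence sample, "The cat sat. The dog ran." (18 characters, 6 words, 2 sentences).

```latex
\mathrm{ARI} = 4.71\,\frac{\text{characters}}{\text{words}} + 0.5\,\frac{\text{words}}{\text{sentences}} - 21.43

% Worked example: 18 characters, 6 words, 2 sentences
\mathrm{ARI} = 4.71 \cdot \frac{18}{6} + 0.5 \cdot \frac{6}{2} - 21.43 = 14.13 + 1.5 - 21.43 = -5.80
```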
Example
# Prerequisites: Sphinx, pandoc, and the Python libraries imported below
# (pip packages: py-readability-metrics, beautifulsoup4, html5lib)
import subprocess
from bs4 import BeautifulSoup
from readability import Readability
# Run Sphinx to build a single HTML file of all content; check=True stops the script if the build fails
subprocess.run(["sphinx-build", "-b", "singlehtml", ".", "_build/singlehtml"], check=True)
# Open the output file with UTF-8 encoding and parse the HTML; the with block closes the file afterwards
with open("_build/singlehtml/index.html", mode="r", encoding="utf8") as file:
    soup = BeautifulSoup(file, "html5lib")
# Remove all h1 elements so headings are not scored; add other elements as needed
for h1 in soup("h1"):
    h1.decompose()
r = Readability(soup.get_text())  # Analyze the cleaned text
ari = r.ari()  # Compute the Automated Readability Index
print(ari.score)  # Print the ARI score
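To apply the check to each document separately, one option is to build regular per-page HTML (for example `sphinx-build -b html . _build/html`) and score every page. The sketch below assumes that build directory and, to stay dependency-free, swaps the readability library for a hand-written ARI using naive regex word and sentence splitting, so its scores will not exactly match `Readability(...).ari()`. The names `ProseExtractor`, `html_to_text`, `ari`, `score_site`, `pages_above`, and `SKIPPED_TAGS` are hypothetical.

```python
import re
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical choice: skip h1 headings (as the example above does) plus non-prose tags.
SKIPPED_TAGS = {"h1", "script", "style"}


class ProseExtractor(HTMLParser):
    """Collect text nodes, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks: list[str] = []

    def handle_starttag(self, tag: str, attrs) -> None:
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data: str) -> None:
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the page text with the contents of skipped tags removed."""
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def ari(text: str) -> float:
    """Standard ARI formula with naive regex tokenization (an assumption)."""
    words = re.findall(r"\w+(?:'\w+)*", text)
    # A "sentence" is any run of text between . ! ? that contains a word character
    sentences = [s for s in re.split(r"[.!?]+", text) if re.search(r"\w", s)]
    if not words or not sentences:
        raise ValueError("no words or sentences to score")
    characters = sum(ch.isalnum() for word in words for ch in word)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def score_site(build_dir: str) -> dict[str, float]:
    """Return {relative page path: ARI score} for every HTML page under build_dir."""
    root = Path(build_dir)
    scores: dict[str, float] = {}
    for page in sorted(root.rglob("*.html")):
        text = html_to_text(page.read_text(encoding="utf8"))
        try:
            scores[str(page.relative_to(root))] = ari(text)
        except ValueError:
            continue  # skip pages with no scorable prose
    return scores


def pages_above(scores: dict[str, float], max_score: float) -> list[str]:
    """Pages whose ARI exceeds max_score, hardest to read first."""
    over = [page for page, score in scores.items() if score > max_score]
    return sorted(over, key=scores.get, reverse=True)
```

For example, a pipeline step could call `score_site("_build/html")` after the build and fail when `pages_above(scores, 12.0)` returns any pages; the 12.0 threshold is a placeholder to tune for your audience. Sphinx themes repeat sidebar and footer text on every page, so that text is counted too unless you extend the extractor to skip it.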