Organization-level content analysis

We can use regular expressions (regex) to identify stylistic errors or assess the impact of content changes.

Because we develop documentation as code, we can use lightweight programs to search across many documents at once. GitLab and GitHub can do this from within their own environments; this page covers searching locally on our own machines. This works because we write and store our content as plain text, not in a proprietary format.

Being able to run organization-wide document searches saves time and effort when we need to identify how a product or company update affects existing documentation. For example:

  • The company name changes and we need to update legal names across all our documentation. With regex we can do this for all documents in one pass.
  • We updated some product specifications and need to identify which existing documents mention the affected information.
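
As a rough illustration of the second case, the sketch below walks a local documentation folder and lists every line that matches a pattern. It is a minimal example, not our actual tooling: the function name `find_mentions`, the file suffixes, and the sample pattern are all assumptions made for the example.

```python
import re
from pathlib import Path


def find_mentions(root, pattern, suffixes=(".md", ".txt", ".rst")):
    """Return (path, line_number, line) for each line under root matching pattern.

    The suffix list is an assumption; adjust it to the plain-text formats in use.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything that is not a plain-text document.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical folder and specification value, for illustration only.
    for path, number, line in find_mentions("docs", r"\b40\s?kg\b"):
        print(f"{path}:{number}: {line}")
```

The result is a list of locations to review, which is usually what we want before deciding whether a change can be applied automatically.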

We can also use regex within a single document to identify types of language we want to avoid, for example overuse of the passive voice in instructional documentation.

Example

We produced the output below by running the following regex on this website:

\b((is|was|are|were|has|have|had) (\w*ed|shown|taken|understood|chosen|come|found|gotten|known|made|thought|seen|been|gone)|will be|got|(had|made) (a|an|the)|should|shall)\b

The output shows the types of language we want to avoid and where they occur. We can run the same search across several documents at once. We can also change what we search for, or find sentences or words longer than a certain number of characters.
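
For readers who prefer a script to an editor search, here is a minimal Python sketch of the same idea. It reuses the regex from the example above unchanged. The helper names `flag_phrases` and `long_sentences`, the 120-character default, and the simple sentence splitter are assumptions for illustration.

```python
import re

# The pattern from the example above, split across lines for readability only.
FLAGGED = re.compile(
    r"\b((is|was|are|were|has|have|had) "
    r"(\w*ed|shown|taken|understood|chosen|come|found|gotten|known|made|thought|seen|been|gone)"
    r"|will be|got|(had|made) (a|an|the)|should|shall)\b"
)


def flag_phrases(text):
    """Return (line_number, matched_phrase) for every match in text."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in FLAGGED.finditer(line):
            found.append((number, match.group(0)))
    return found


def long_sentences(text, max_chars=120):
    """Return sentences longer than max_chars.

    Sentences are split naively on whitespace that follows '.', '!' or '?',
    so abbreviations such as 'e.g.' will cause false splits.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if len(s.strip()) > max_chars]


if __name__ == "__main__":
    sample = "The file was deleted by the user.\nClick Save.\nYou should restart."
    for number, phrase in flag_phrases(sample):
        print(f"line {number}: {phrase}")
```

The pattern is case-sensitive as written, so a flagged word that starts a sentence with a capital letter is not reported unless we compile it with `re.IGNORECASE`.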

We can also use a similar find-and-replace function to identify text strings and update them. Because we use continuous integration pipelines to distribute documentation automatically, we can make wide-scale updates without having to open each document, change the text, regenerate it, and redistribute it.
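
A bulk find and replace along these lines could be scripted as in the sketch below. It is an illustration under stated assumptions rather than a description of our pipeline: `replace_in_docs`, the suffix list, and the company names are hypothetical, and the function defaults to a dry run so that nothing is written until we have reviewed the counts.

```python
import re
from pathlib import Path


def replace_in_docs(root, pattern, replacement, suffixes=(".md", ".txt"), dry_run=True):
    """Replace pattern in every matching file under root.

    Returns a dict mapping each affected path to its number of replacements.
    With dry_run=True (the default) files are only inspected, never written.
    """
    regex = re.compile(pattern)
    changed = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        new_text, count = regex.subn(replacement, text)
        if count:
            changed[str(path)] = count
            if not dry_run:
                path.write_text(new_text, encoding="utf-8")
    return changed


if __name__ == "__main__":
    # Hypothetical old and new legal names, for illustration only.
    report = replace_in_docs("docs", r"\bAcme Ltd\b", "Acme Group", dry_run=True)
    for path, count in report.items():
        print(f"{path}: {count} replacement(s)")
```

Once the dry-run report looks right, running again with `dry_run=False` writes the changes, and committing them lets the pipeline regenerate and redistribute the documents. Note that `re` treats backslashes in the replacement string as escapes or group references.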

See also

Atom find and replace