Source data to customer content flow

Using Python to extract data from engineering systems, clean it, and parse it into user documentation.

I recently finished a project where we had CSV tables of 30,000+ lines that we needed to put into a searchable, readable format for customers. After some trials, I found that parsing the information into a non-indexed HTML format that relies on the web browser's text search function addressed this issue. Generating 30,000+ indexable locations in each file burdens both document generation and file access.

We used the Python pandas and texttable packages to generate the 30,000+ readable text tables from the content. We also created a single basic table that included the key searchable content and a link to the relevant individual table.
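To illustrate the single overview table, here is a minimal sketch that builds one flat HTML page with a link per row and leaves searching to the browser. It is not our production script: the function name build_overview_html, the tables/ link directory, the fields_<name>.txt file naming, and the choice of name and help as the key columns are all assumptions made for the example.

```python
import html
from urllib.parse import quote


def build_overview_html(rows, link_dir="tables"):
    """Return one flat HTML page: a table of key fields plus a link per row.

    rows is an iterable of dicts with 'name' and 'help' keys (assumed columns).
    """
    body = []
    for row in rows:
        # Hypothetical naming scheme for the per-row table files.
        href = html.escape(quote(f"{link_dir}/fields_{row['name']}.txt"))
        body.append(
            f"<tr><td>{html.escape(row['name'])}</td>"
            f"<td>{html.escape(row['help'])}</td>"
            f'<td><a href="{href}">details</a></td></tr>'
        )
    # No anchors or search index are generated; the page is plain markup.
    return (
        "<!DOCTYPE html>\n<html><body>\n<table>\n"
        "<tr><th>Name</th><th>Description</th><th>Table</th></tr>\n"
        + "\n".join(body)
        + "\n</table>\n</body></html>\n"
    )
```

Because the page carries no per-row anchors, its size grows only with the text it contains, and customers find entries with the browser's built-in text search.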

After importing the source CSV files from engineering, we did the following, at a glance:

  1. Cleaned the data fields with regular expressions.
  2. Created a separate text file for each line, including a description and a sub-table of key content, as shown below.
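The two steps above can be sketched roughly as follows. This is a simplified illustration, not our actual script: it uses the standard-library csv module in place of pandas to stay dependency-free, the cleaning rules in clean_field are invented for the example, and the fields_<name>.txt naming is an assumption. The column names (range, name, type, reset, help) follow the snippet in the Example section.

```python
import csv
import io
import re
from pathlib import Path
from typing import List


def clean_field(value: str) -> str:
    """Apply example cleaning rules (assumed, not the real ones)."""
    # Replace control and non-ASCII characters with a space, keeping newlines.
    value = re.sub(r"[^\x20-\x7E\n]", " ", value)
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", value).strip()


def write_row_files(csv_text: str, out_dir: Path) -> List[Path]:
    """Write one text file per CSV row and return the paths created."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row = {key: clean_field(val or "") for key, val in row.items()}
        # Keep only filename-safe characters in the (hypothetical) file name.
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", row["name"])
        path = out_dir / f"fields_{safe_name}.txt"
        lines = [row["help"], ""]  # description first, then key content
        lines += [f"{key}: {row[key]}" for key in ("range", "type", "reset")]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Calling write_row_files(csv_text, Path("_static")) would produce one small file per register row. In the real pipeline, the key content was rendered with texttable rather than as plain key: value lines.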

Our pipeline built the generated text files on the fly. This allowed us to import and parse the source data as often as we needed without adding a huge amount of data to the actual document repository.

Example

# generateMemoryMapRegisters.py snippet

import texttable  # table generation from https://pypi.org/project/texttable/

# Assumes df (the pandas DataFrame read from the source CSV) and index
# (the label of the current register row) are defined earlier in the script.
tableObj = texttable.Texttable(max_width=118)
# Set columns
tableObj.header(["Range", "Name", "Type", "Reset", "Description"])
# Add the register's individual fields to the table
for _, row in df.loc[[index]].iterrows():
    description = str(row['help']) + "\r\n\r\n" + str(row['map'])
    tableObj.add_row(
        [str(row['range']), str(row['name']), str(row['type']), str(row['reset']), description]
    )
# Display table
print(tableObj.draw())

# .gitlab-ci.yml pipeline example

fields:
  stage: build
  script:
    - python generateMemoryMapOverview.py
    - python generateMemoryMapRegisters.py
  artifacts:
    paths:
      - _static/fields_*
      - docs/tables/*.CSV_overview

See also