Source data to customer content flow
Using Python to extract data from engineering systems, clean it, and parse it into user documentation.
I recently finished a project where we had CSV tables of 30,000+ lines that needed to be published in a searchable, readable format for customers. After some trials, I found that parsing the information into non-indexable HTML that relies on the web browser's text search function addressed the issue: generating 30,000+ indexable locations in each file burdens both document generation and access to the files themselves.
We used the Python pandas and texttable libraries to generate the 30,000+ readable text tables from the content. We also created a single basic overview table that included the key searchable content and a link to each individual table.
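The overview table can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column names (`name`, `range`) and the `_static/fields_*` link pattern are assumptions based on the pipeline artifacts shown later.

```python
# Hypothetical sketch: build the single overview table with links to the
# per-register text files. Column names and link paths are assumptions.
import pandas as pd

df = pd.DataFrame({
    "name": ["CTRL", "STATUS"],
    "range": ["0x0000", "0x0004"],
})

# One row per register: key searchable content plus a link to its table.
rows = "\n".join(
    f'<tr><td>{r["range"]}</td>'
    f'<td><a href="_static/fields_{r["name"]}.txt">{r["name"]}</a></td></tr>'
    for _, r in df.iterrows()
)
overview = (
    "<table>\n<tr><th>Range</th><th>Name</th></tr>\n" + rows + "\n</table>"
)
print(overview)
```

Because the overview is the only indexable page, customers search it with the browser's built-in find, then follow the link to the full per-register table.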
After importing the source CSV files from engineering, in brief, we:
- Applied regex cleaning of the data fields.
- Created a single text file for each line, containing a description and a sub-table of key content, as shown below.
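The regex cleaning step can be sketched as below. The patterns and the `help` column name are illustrative assumptions, not the project's actual rules.

```python
# Hypothetical sketch of the regex cleaning step using pandas string
# methods; the patterns and column names here are illustrative only.
import pandas as pd

df = pd.DataFrame({"help": ["Enable  bit\x00", "Status   flag "]})

df["help"] = (
    df["help"]
    .str.replace(r"[\x00-\x1f]", "", regex=True)  # drop control characters
    .str.replace(r"\s+", " ", regex=True)         # collapse runs of whitespace
    .str.strip()                                  # trim leading/trailing space
)
print(df["help"].tolist())  # → ['Enable bit', 'Status flag']
```

Cleaning in the DataFrame keeps the whole pipeline in pandas, so the same frame feeds both the overview and the per-register tables.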
Our pipeline built the generated text files on the fly. This let us import and parse source data as often as we needed to, without adding a huge amount of data to the actual document repository.
Example
# generateMemoryMapRegisters.py snippet
# table generation with https://pypi.org/project/texttable/
import texttable

tableObj = texttable.Texttable(max_width=118)
# Set the column headers
tableObj.header(["Range", "Name", "Type", "Reset", "Description"])
# Iterate the register's individual fields as table rows
for i, row in df.loc[[index]].iterrows():
    description = str(row['help']) + "\n\n" + str(row['map'])
    tableObj.add_row(
        [str(row['range']), str(row['name']), str(row['type']), str(row['reset']), description]
    )
# Display the table
print(tableObj.draw())
# .gitlab-ci.yml pipeline example
fields:
  stage: build
  script:
    - python generateMemoryMapOverview.py
    - python generateMemoryMapRegisters.py
  artifacts:
    paths:
      - _static/fields_*
      - docs/tables/*.CSV_overview