Enforcing terminology
Companies often need to enforce specific language due to regulatory needs or industry standards.
Using Vale.sh during our document release pipeline helps us enforce any type of language standard on our content. We define any number of rulesets as single YAML files that then run regex tests over content. We can use this to develop our own company's style guideline and ensure our language and terminology is correct.
Example one
For example, USB trademarks are often confused with actual adjectives. The below rules enforced during our release pipeline alert us to possible errors. The swap
key looks for instances using regex. In this case, the USB specification is "High Speed", which allows for high-speed data transfer..
---
# Terms.yml
# Error: semiconductor.Terms
#
# Terms that contributors must use correctly.
#
extends: substitution
message: "Use '%s' instead of '%s'."
ignorecase: false
level: error
swap:
# bad: good
'super speed|(?<![-|\/])super-speed|Super speed|Super-Speed|Super-speed': SuperSpeed(+)|SS(+)
'high speed|High speed|highspeed|Highspeed': High Speed (Referring to USB spec.)| high-speed/HS (adjective)
'(?<![at]) low speed|Low speed|lowspeed|Lowspeed': Low Speed (Referring to USB spec.)| low-speed/LS (adjective)
'(?<![\.|@|\/|\-|_|\[|:|: |`|"|<])(?:usb 4|USB 4)|usb-4|USB-4': USB4 (® in first instance)
Example two
We can define exactly how to refer to bits and bytes to avoid ambiguity. The tokens
key runs regex searches over all text for matches. For example, we want to write 8-bit and not 8b or 8 b or 8B etc etc.
---
# BitsBytes.yml
# Error: semiconductor.BitsBytes
#
extends: existence
message: "Use correct terminology for bits and bytes in ('%s')."
description: |
Units:
bit = bit
byte = B
Metric prefixes:
Kilo = k
Mega = M
Giga = G
Tera = T
Examples:
8-bit, 8 bits, 8 Mbit, 8 Gbit/s
8 B, 8 kB, 8 MB/s, 9 TB
ignorecase: false
level: error
tokens:
- (?<![Version 1.|Microstrip |Stripline])(?<!`)(?<!\/)\d+\s*(?:Bits?|b|bit)(?:\[0-9]+b|)(?!\]_)(?!`)(?!\/)
- ([0-9]+\-[bB]its)
- \b[0-9]+ (([kK]|[mM]|[gG])b)\b
- ([0-9]+ [bB]yte?s)
exceptions:
- '(150|160|010|0|00|001)b'