Replace accidental non-ASCII characters with ASCII and add a deterministic CI check plus review rule to prohibit them #7919
Pull request overview
This PR removes accidentally introduced non-ASCII characters from comments/docstrings across several C++ and Python files, and updates the repository’s Copilot review guidance to help prevent similar Unicode punctuation/ligatures from being introduced in future changes.
Changes:
- Replace non-ASCII punctuation/ligatures in comments and docstrings with ASCII equivalents across tests and core sources.
- Add a Copilot review rule to flag newly introduced non-ASCII characters and request ASCII replacements.
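The ASCII replacements listed above can be sketched as a small translation table. This is only an illustration and not code from this PR: `ASCII_EQUIVALENTS` and `to_ascii` are hypothetical names, and the mapping covers just the characters this PR mentions (em/en dashes, the right arrow, and the fi ligature).

```python
# Hypothetical helper, not part of the PR: maps the non-ASCII characters
# mentioned in this PR to their ASCII equivalents.
ASCII_EQUIVALENTS = {
    "\u2014": "-",   # em dash
    "\u2013": "-",   # en dash
    "\u2192": "->",  # rightwards arrow
    "\ufb01": "fi",  # fi ligature
}

# str.maketrans accepts a dict of single characters mapped to replacement strings.
_TABLE = str.maketrans(ASCII_EQUIVALENTS)


def to_ascii(text: str) -> str:
    """Replace the known non-ASCII characters; leave everything else untouched."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "// de\ufb01ne the range a \u2192 b \u2014 inclusive"
    print(to_ascii(sample))
```

The sample uses `\u` escapes so that the sketch itself stays pure ASCII. Characters outside the table pass through unchanged, so a helper like this would complement a detection step rather than replace it.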
Custom instructions used:
- .github/copilot-instructions.md
- .github/instructions/reviewing.instructions.md
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/recovery.py | Replaces non-ASCII punctuation in a docstring/comment with ASCII. |
| tests/e2e_operations.py | Replaces non-ASCII punctuation in test comments with ASCII. |
| tests/e2e_logging.py | Replaces non-ASCII punctuation in test comments with ASCII. |
| src/tasks/test/ordered_tasks.cpp | Replaces non-ASCII punctuation in a test comment with ASCII. |
| src/kv/untyped_map_handle.cpp | Replaces non-ASCII punctuation in an explanatory comment with ASCII. |
| src/kv/untyped_map_diff.cpp | Replaces non-ASCII punctuation in an explanatory comment with ASCII. |
| src/host/files_cleanup_timer.h | Replaces non-ASCII punctuation in a comment with ASCII. |
| src/ds/champ_map.h | Replaces an fi ligature in a comment with ASCII fi. |
| src/crypto/test/cbor.cpp | Replaces non-ASCII punctuation in a test comment with ASCII. |
| src/crypto/test/cbor_fuzz.cpp | Replaces non-ASCII punctuation in a comment with ASCII. |
| src/cose/cose_rs_ffi.h | Replaces non-ASCII punctuation in doc comments with ASCII. |
| .github/copilot-instructions.md | Adds a review guideline to flag non-ASCII characters introduced in code/comments/strings. |
@copilot in this PR add another parallel ci checks subjob for relevant source files as we do for formatting, which will deterministically check for those as well. Review instructions can remain, it's ok as an extra check in case some symbols slip into the files uncovered by the hard check.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Added
@copilot can you rename the script from non-ascii-checks.sh to ascii-checks.sh please?
Renamed
@copilot there are other types of source files in the repo for which we want this check too, for example .js, .ts, .yaml, .json, .jinja, .tla etc. Make a full list of them, cover them all in the script correctly, and don't mention all of them in the documentation, just "source files". Only the script itself should capture the full list of file suffixes. If there are suffixes you choose to rule out, justify why in comment!
Expanded
A few non-ASCII symbols (em/en dashes, arrows, an fi ligature) recently slipped into comments and docstrings via copy-paste / AI-generated text. This replaces them with plain ASCII, adds a deterministic CI check to catch future occurrences, and updates the review instructions as an additional soft check.

Replacements (comments/docstrings only, no logic change)
- Em/en dashes (—/–) become -, arrows (→) become ->, and the ﬁ ligature becomes fi.
- cose_rs_ffi.h, crypto/test/cbor.cpp, crypto/test/cbor_fuzz.cpp, ds/champ_map.h, host/files_cleanup_timer.h, kv/untyped_map_diff.cpp, kv/untyped_map_handle.cpp, tasks/test/ordered_tasks.cpp
- tests/e2e_logging.py, tests/e2e_operations.py, tests/recovery.py
- ✓ in the scripts/test-buckets-checks.sh status message replaced with ASCII.

Deterministic CI check
- scripts/ascii-checks.sh scans git-tracked source files and fails on any non-ASCII character, reporting the offending file and line numbers.
- EXTENSIONS list: C/C++ (c/cc/cpp/h/hpp), templated .in, py, js/cjs/mjs/ts, rs, tla and TLA model cfg, cmake/toml/ini, json/yml/yaml/cddl, jinja/hbs, css/html/svg, and sh. Deliberately excluded suffixes are justified in a comment: prose documentation (md/rst/txt) where non-ASCII is legitimate, and binary/generated/vendored data (committed/cose/pem/png/pdf/ico/lock/csv, numbered raft scenario fixtures, everything under 3rdparty/).
- Runs from scripts/ci-checks.sh as a parallel subjob (Non-ASCII characters), concurrently alongside the existing formatting/lint checks.
- Documented in .github/skills/formatting-and-linting.md, which refers to the coverage simply as "source files" rather than enumerating suffixes.

Review instructions
- Added a rule to .github/copilot-instructions.md instructing the reviewer to flag non-ASCII in source/comments/docstrings/strings and request ASCII equivalents. Retained as an extra soft check for files not covered by the hard CI check.

Deliberately retained
Intentional non-ASCII used for terminal rendering or as symbolic values is left untouched and excluded from the CI check via an explicit allowlist (each entry annotated with its rationale): box-drawing/block glyphs in tests/infra/basicperf.py, the overline in python/src/ccf/ledger_viz.py, ∞/±/→ display labels in scripts/compare_bencher_ab.py, the decorative arrow in js/ccf-app/doc/theme/partials/analytics.hbs, the symbolic state glyphs in tla/consensus/MCAliases.tla, and all vendored 3rdparty/ code. The review rule notes this exception.
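For readers who want a feel for how such a check operates, here is a minimal Python sketch of the same idea: filter tracked files by suffix, skip allowlisted paths, and report the line numbers that contain non-ASCII bytes. It is not the PR's scripts/ascii-checks.sh. The suffix set and allowlist below are only a subset copied from the description above, and the function names (`is_checked`, `non_ascii_lines`, `check_repo`) are hypothetical.

```python
# Illustrative sketch only; the actual check in this PR is a shell script.
import subprocess
from pathlib import Path
from typing import List

# Subset of the suffixes named in the PR description (assumed leading dots).
EXTENSIONS = {".c", ".cc", ".cpp", ".h", ".hpp", ".py", ".js", ".ts", ".rs", ".tla", ".sh"}

# Subset of the allowlist named in the PR description, with the stated rationale.
ALLOWLIST = {
    "tests/infra/basicperf.py",       # box-drawing/block glyphs for terminal output
    "python/src/ccf/ledger_viz.py",   # overline
    "scripts/compare_bencher_ab.py",  # display labels
}
EXCLUDED_PREFIXES = ("3rdparty/",)    # vendored code


def is_checked(path: str) -> bool:
    """Decide whether a tracked path is subject to the ASCII check."""
    if path in ALLOWLIST or path.startswith(EXCLUDED_PREFIXES):
        return False
    return Path(path).suffix in EXTENSIONS


def non_ascii_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines containing any byte >= 0x80."""
    return [n for n, line in enumerate(data.splitlines(), start=1) if not line.isascii()]


def check_repo() -> int:
    """Scan git-tracked files; must run inside a git checkout. Returns an exit code."""
    out = subprocess.run(["git", "ls-files", "-z"], check=True, capture_output=True).stdout
    failed = False
    for path in filter(None, out.decode("utf-8", "surrogateescape").split("\0")):
        if not is_checked(path) or not Path(path).is_file():
            continue
        lines = non_ascii_lines(Path(path).read_bytes())
        if lines:
            failed = True
            print(f"{path}: non-ASCII on line(s) {', '.join(map(str, lines))}")
    return 1 if failed else 0


if __name__ == "__main__":
    # Small self-contained demonstration that does not need a repository.
    print(non_ascii_lines(b"ok line\nbad \xe2\x80\x94 dash\n"))
```

Working on raw bytes avoids any dependence on file encoding: every non-ASCII character in UTF-8 contains at least one byte with the high bit set. A script entry point could call `sys.exit(check_repo())` so that CI fails on a non-zero return.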