Skip to content

Replace accidental non-ASCII characters with ASCII and add a deterministic CI check plus review rule to prohibit them#7919

Merged
achamayou merged 7 commits into
mainfrom
copilot/replace-non-ascii-symbols
Jun 4, 2026
Merged

Replace accidental non-ASCII characters with ASCII and add a deterministic CI check plus review rule to prohibit them#7919
achamayou merged 7 commits into
mainfrom
copilot/replace-non-ascii-symbols

Conversation

Copilot AI commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

A few non-ASCII symbols (em/en dashes, arrows, an fi ligature) recently slipped into comments and docstrings via copy-paste / AI-generated text. This replaces them with plain ASCII, adds a deterministic CI check to catch future occurrences, and updates the review instructions as an additional soft check.

Replacements (comments/docstrings only, no logic change)

  • / -, ->, fi
  • C++: cose_rs_ffi.h, crypto/test/cbor.cpp, crypto/test/cbor_fuzz.cpp, ds/champ_map.h, host/files_cleanup_timer.h, kv/untyped_map_diff.cpp, kv/untyped_map_handle.cpp, tasks/test/ordered_tasks.cpp
  • Python: tests/e2e_logging.py, tests/e2e_operations.py, tests/recovery.py
  • A stray in scripts/test-buckets-checks.sh status message replaced with ASCII
  • clang-format / black / prettier re-run; all clean

Deterministic CI check

  • New scripts/ascii-checks.sh scans git-tracked source files and fails on any non-ASCII character, reporting the offending file and line numbers.
  • The full list of covered source suffixes lives solely in the script via a single EXTENSIONS list: C/C++ (c/cc/cpp/h/hpp), templated .in, py, js/cjs/mjs/ts, rs, tla and TLA model cfg, cmake/toml/ini, json/yml/yaml/cddl, jinja/hbs, css/html/svg, and sh. Deliberately excluded suffixes are justified in a comment: prose documentation (md/rst/txt) where non-ASCII is legitimate, and binary/generated/vendored data (committed/cose/pem/png/pdf/ico/lock/csv, numbered raft scenario fixtures, everything under 3rdparty/).
  • Wired into scripts/ci-checks.sh as a parallel subjob (Non-ASCII characters), running concurrently alongside the existing formatting/lint checks.
  • Documented in .github/skills/formatting-and-linting.md, which refers to the coverage simply as "source files" rather than enumerating suffixes.

Review instructions

  • New bullet in the Reviews section of .github/copilot-instructions.md instructing the reviewer to flag non-ASCII in source/comments/docstrings/strings and request ASCII equivalents. Retained as an extra soft check for files not covered by the hard CI check.

Deliberately retained

Intentional non-ASCII used for terminal rendering or as symbolic values is left untouched and excluded from the CI check via an explicit allowlist (each annotated with its rationale): box-drawing/block glyphs in tests/infra/basicperf.py, overline in python/src/ccf/ledger_viz.py, /±/ display labels in scripts/compare_bencher_ab.py, the decorative arrow in js/ccf-app/doc/theme/partials/analytics.hbs, the symbolic state glyphs in tla/consensus/MCAliases.tla, and all vendored 3rdparty/ code. The review rule notes this exception.

Copilot AI changed the title Replace accidental non-ASCII characters with ASCII and prohibit non-ASCII in review instructions Replace accidental non-ASCII characters with ASCII and add review rule to prohibit them Jun 4, 2026
Copilot AI requested a review from maxtropets June 4, 2026 11:06
@maxtropets maxtropets marked this pull request as ready for review June 4, 2026 11:07
@maxtropets maxtropets requested a review from a team as a code owner June 4, 2026 11:07
Copilot AI review requested due to automatic review settings June 4, 2026 11:07

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes accidentally introduced non-ASCII characters from comments/docstrings across several C++ and Python files, and updates the repository’s Copilot review guidance to help prevent similar Unicode punctuation/ligatures from being introduced in future changes.

Changes:

  • Replace non-ASCII punctuation/ligatures in comments and docstrings with ASCII equivalents across tests and core sources.
  • Add a Copilot review rule to flag newly introduced non-ASCII characters and request ASCII replacements.

Custom instructions used:

  • .github/copilot-instructions.md
  • .github/instructions/reviewing.instructions.md

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/recovery.py Replaces non-ASCII punctuation in a docstring/comment with ASCII.
tests/e2e_operations.py Replaces non-ASCII punctuation in test comments with ASCII.
tests/e2e_logging.py Replaces non-ASCII punctuation in test comments with ASCII.
src/tasks/test/ordered_tasks.cpp Replaces non-ASCII punctuation in a test comment with ASCII.
src/kv/untyped_map_handle.cpp Replaces non-ASCII punctuation in an explanatory comment with ASCII.
src/kv/untyped_map_diff.cpp Replaces non-ASCII punctuation in an explanatory comment with ASCII.
src/host/files_cleanup_timer.h Replaces non-ASCII punctuation in a comment with ASCII.
src/ds/champ_map.h Replaces an fi ligature in a comment with ASCII fi.
src/crypto/test/cbor.cpp Replaces non-ASCII punctuation in a test comment with ASCII.
src/crypto/test/cbor_fuzz.cpp Replaces non-ASCII punctuation in a comment with ASCII.
src/cose/cose_rs_ffi.h Replaces non-ASCII punctuation in doc comments with ASCII.
.github/copilot-instructions.md Adds a review guideline to flag non-ASCII characters introduced in code/comments/strings.

Comment thread .github/copilot-instructions.md Outdated
@maxtropets

Copy link
Copy Markdown
Collaborator

@copilot in this PR add another parallel ci checks subjob for relevant source files as we do for formatting, which will deterministically check for those as well. Review instructions can remain, it's ok as an extra check in case some symbols slip into the files uncovered by the hard check.

maxtropets and others added 2 commits June 4, 2026 04:19
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

@copilot in this PR add another parallel ci checks subjob for relevant source files as we do for formatting, which will deterministically check for those as well. Review instructions c...

Added scripts/non-ascii-checks.sh, a deterministic check that fails on non-ASCII characters in C/C++ and Python sources (.h/.hpp/.c/.cpp/.py), excluding 3rdparty/ and an allowlist of intentionally non-ASCII files (ledger_viz.py, compare_bencher_ab.py, basicperf.py). It runs as a parallel subjob in scripts/ci-checks.sh alongside the other format checks, and is documented in .github/skills/formatting-and-linting.md. The Copilot review rule is kept as an extra soft check. (ebe6f67)

Copilot AI changed the title Replace accidental non-ASCII characters with ASCII and add review rule to prohibit them Replace accidental non-ASCII characters with ASCII and add a deterministic CI check plus review rule to prohibit them Jun 4, 2026
Copilot AI requested a review from maxtropets June 4, 2026 11:24
Comment thread scripts/ascii-checks.sh Outdated
Comment thread .github/skills/formatting-and-linting.md Outdated
Comment thread .github/skills/formatting-and-linting.md Outdated
@achamayou

Copy link
Copy Markdown
Member

@copilot can you rename the script from non-ascii-checks.sh to ascii-checks.sh please?

Copilot AI commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

@copilot can you rename the script from non-ascii-checks.sh to ascii-checks.sh please?

Renamed scripts/non-ascii-checks.sh to scripts/ascii-checks.sh and updated all references in scripts/ci-checks.sh and .github/skills/formatting-and-linting.md.

Copilot AI requested a review from achamayou June 4, 2026 12:34
@achamayou

Copy link
Copy Markdown
Member

@copilot there are other types of source files in the repo for which we want this check too, for example .js, .ts, .yaml, .json, .jinja, .tla etc. Make a full list of them, cover them all in the script correctly, and don't mention all of them in the documentation, just "source files". Only the script itself should capture the full list of file suffixes.

If there are suffixes you choose to rule out, justify why in comment!

Copilot AI commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

@copilot there are other types of source files in the repo for which we want this check too, for example .js, .ts, .yaml, .json, .jinja, .tla etc. Make a full list of them, cover them all in the scrip...

Expanded scripts/ascii-checks.sh to cover all source file types via a single EXTENSIONS list (C/C++ c/cc/cpp/h/hpp, templated .in, py, js/cjs/mjs/ts, rs, tla + TLA model cfg, cmake/toml/ini, json/yml/yaml/cddl, jinja/hbs, css/html/svg, sh). Deliberately excluded suffixes are justified in a comment: prose docs (md/rst/txt) where non-ASCII is legitimate, and binary/generated/vendored data (committed/cose/pem/png/pdf/ico/lock/csv, numbered raft scenario fixtures, 3rdparty/). The documentation now just says "source files". Files with intentional non-ASCII (analytics.hbs decorative arrow, MCAliases.tla symbolic state glyphs) are allowlisted, and a stray in test-buckets-checks.sh was replaced with ASCII. (commit pushed)

@achamayou achamayou merged commit e871ae0 into main Jun 4, 2026
19 checks passed
@achamayou achamayou deleted the copilot/replace-non-ascii-symbols branch June 4, 2026 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants