Skip to content

detect uint64 overflow that wraps past min_safe in parse_int_string#393

Closed
sahvx655-wq wants to merge 1 commit into
fastfloat:mainfrom
sahvx655-wq:int-overflow-multiwrap
Closed

detect uint64 overflow that wraps past min_safe in parse_int_string#393
sahvx655-wq wants to merge 1 commit into
fastfloat:mainfrom
sahvx655-wq:int-overflow-multiwrap

Conversation

@sahvx655-wq

Copy link
Copy Markdown
Contributor

parse_int_string only confirms a uint64_t overflow at the max_digits_u64(base) boundary by comparing the wrapped accumulator against min_safe_u64(base). I hit this while differentially fuzzing the integer path against std::from_chars: in the default base 10, an input like 30000000000000000000 comes back as a successful parse holding a truncated 11553255926290448384 rather than result_out_of_range. The min_safe test assumes the value wraps at most once, but a 20-digit base-10 number reaches about 5.4x 2^64, so the accumulator can wrap a full multiple of 2^64 and land back above min_safe and slip through. Every base whose max-length range exceeds 2^64 is affected (most non-power-of-two bases), and only uint64_t, since narrower types are still caught by the trailing range check; the practical risk is a caller seeing ec == success and trusting a silently wrapped value.

At digit_count == max_digits the wrapped accumulator no longer holds enough information to decide, so the boundary is re-run over the parsed digits with a checked multiply-add and rejected as soon as it would pass 2^64. The cheap min_safe comparison stays in front as a fast reject and the exact loop only fires at the digit-count boundary, so the hot path is unchanged. The regression test sits beside the existing out-of-range base cases in tests/fast_int.cpp and covers bases 3 to 36 (2, 4 and 16 already fit in a single 2^64 span); it fails before this change and passes after.

@lemire

lemire commented Jun 14, 2026

Copy link
Copy Markdown
Member

The issue is real, but we can be more efficient, see #394

@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

Agreed, the leading-digit band check in #394 is the better call. It decides in O(1) from the leading digit plus the accumulator that's already been computed, whereas my version re-walked the parsed digits with a checked multiply-add at the boundary, which is redundant work on a path that's already known to be at max_digits length. The correctness hinges on each leading-digit band [d*ms, (d+1)ms) being narrower than 2^64, so within the dmax band the value can only straddle 2^64 once and a single threshold against dmaxms cleanly separates a wrapped accumulator from a genuine in-range value. That's a tidier statement of the same root cause I hit while fuzzing against std::from_chars, where a whole-multiple-of-2^64 wrap landed the accumulator back above min_safe and slipped through as a successful parse.

Your PR keeps the multiwrap regression cases and extends them with the oracle sweep, so the inputs I found are still covered. Happy to close this one in favour of #394 to keep the history tidy, just say the word.

@lemire

lemire commented Jun 14, 2026

Copy link
Copy Markdown
Member

I am closing this PR.

@lemire lemire closed this Jun 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants