[SPARK-57442][SQL] Add opt-in nullability check to logical plan integrity validation by cloud-fan · Pull Request #56502 · apache/spark

cloud-fan · 2026-06-14T17:53:28Z

What changes were proposed in this pull request?

This PR adds an opt-in nullability check to logical plan integrity validation.

A new internal config spark.sql.planChangeValidation.nullability (default false) gates the check. It is a sub-flag of the existing spark.sql.planChangeValidation.
A new LogicalPlanIntegrity.validateNullability walks the plan and flags any operator that references one of its child's output attributes with a narrower nullability than the child produces — i.e. declaring the attribute nullable = false while the child produces it as nullable = true. It is wired into validateOptimizedPlan's existing check chain.
References declared wider (nullable over a non-nullable child) are safe and not flagged; references whose ExprId is not produced by any child (e.g. outer references) are ignored.

Why are the changes needed?

When a rule synthesizes operators, every reference to a child output must carry the nullability the child actually produces. A classic miss is an outer-join rewrite that references the null-widened side with its original non-nullable attributes — for example SPARK-56395 in RewriteNearestByJoin, where the synthesized Aggregate referenced the right side (widened to nullable by the synthetic LEFT OUTER join) with its original non-nullable metadata.

Such a mismatch is a latent plan-integrity defect: the rows are unchanged, so it is invisible to result-equivalence reasoning, to validateSchemaOutput (which compares types asNullable), and to green CI. Yet a later optimization that trusts the non-nullable declaration can drop a null check and produce wrong results. There is currently no integrity check that catches this class of defect; this adds one.

The check is disabled by default — even though spark.sql.planChangeValidation defaults to on in test runs — because not all existing rules maintain precise nullability today, and enabling it globally would surface pre-existing benign mismatches. Suites that have been audited (or rules that have been fixed) can opt in to guard against regressions.

Does this PR introduce any user-facing change?

No. The config is internal and disabled by default; the check only runs under plan-change validation when explicitly enabled.

How was this patch tested?

New unit test in LogicalPlanIntegritySuite that, with the config enabled, asserts:

a reference declared non-nullable over a nullable child is flagged (and the message names the offending attribute);
a faithful reference (same nullability as the child) passes;
a wider reference (nullable over a non-nullable child) passes;

and that the check is a no-op when the config is disabled.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

This pull request and its description were written by Isaac.

…rity validation Co-authored-by: Isaac

MaxGekk

Thanks for adding this — the approach is sound: opt-in via an internal sub-flag, narrowing-only, wider-is-safe, and a strictly local parent-vs-child comparison. The implementation matches the design, and I confirmed outer references are correctly ignored (since OuterReference is a LeafExpression, collect never descends into the wrapped attribute). The .version("4.3.0") is correct per the branch-versioning policy.

A few non-blocking items before merge:

Doc comments are slightly behind. validateOptimizedPlan's "what we check" bullet list doesn't mention this check (or the existing validateAggregateExpressions).
Scope is worth stating explicitly in the scaladoc: only top-level attribute nullability is compared (nested struct/array/map fields are not), subquery plans aren't traversed, and the check is optimizer-only. These are consistent with the existing checks, just undocumented.
Test coverage currently exercises only the simplest shape. I'd add a multi-child join case (the actual SPARK-56395 scenario), a reference nested in a compound expression, an offending node below the root, the ignored-ExprId path with the config ON, and the "any-nullable-wins" duplicate-ExprId merge.

Details inline. None of these block the core change; they mostly tighten the docs and harden the test against regressions.

MaxGekk · 2026-06-15T13:48:54Z

  }

+  /**
+   * This method validates that no operator references one of its child's output attributes with a


Could we make the scope limitations explicit in this scaladoc? The check only compares top-level AttributeReference.nullable — nested nullability (struct fields, array elements, map values) is not validated, so a child producing struct<a: long /* nullable */> referenced as struct<a: long /* non-null */> keeps the same top-level nullable and passes. This mirrors validateSchemaOutput (which compares asNullable), so it's consistent, but since that's exactly the "latent nullability defect" class this check targets, it's worth calling out. Same for subquery plans, which aren't reached by plan.collect (they live inside expressions, not as children).

MaxGekk · 2026-06-15T13:48:54Z

+    }
+    plan.collect {
+      case p if p.childrenResolved =>
+        val childNullable = p.children.flatMap(_.output).groupBy(_.exprId).map {


Minor: worth a one-line note that this is a strictly local parent-vs-immediate-child check over p.expressions. A narrowing introduced at a pure pass-through boundary (an operator that narrows an attribute it only forwards via output, without re-referencing it in expressions) won't be flagged at that node, and the parent then treats the already-narrowed value as the child's truth. The targeted SPARK-56395 shape is caught because the synthesized references appear in expressions, but making the scoping explicit helps future readers.

MaxGekk · 2026-06-15T13:48:54Z

      .orElse(LogicalPlanIntegrity.validateSchemaOutput(previousPlan, currentPlan))
      .orElse(LogicalPlanIntegrity.validateNoDanglingReferences(currentPlan))
      .orElse(LogicalPlanIntegrity.validateAggregateExpressions(currentPlan))
+      .orElse(LogicalPlanIntegrity.validateNullability(currentPlan))


Two things while we're touching this chain:

The validateOptimizedPlan scaladoc bullet list (just above this method) is now out of date — it lists "is still resolved / special expressions / unique IDs / same schema / no dangling references" but omits both validateAggregateExpressions (pre-existing) and the new validateNullability. Could we add a bullet, e.g. - references each child output attribute with consistent (or wider) nullability, and one for the aggregate check too?

Confirming intended scope: this runs only from validateOptimizedPlan, so nullability defects introduced during analysis (not optimization) aren't caught. Assuming that's deliberate — a one-liner in the config doc would make it clear.

MaxGekk · 2026-06-15T13:48:54Z

    assert(checkIfSameExprIdNotReused(t.select(Alias(a + b, "ab")(exprId = b.exprId))).isDefined)
  }
+
+  test("Checks if an attribute is referenced as non-nullable over a nullable child") {


Good start, but this only exercises the simplest shape (one-level Project over a LocalRelation, offending node at the root). Could we extend coverage to the cases that exercise the actual logic and the motivating bug?

Multi-child join shape — an Aggregate/operator over a (synthetic LEFT OUTER) Join referencing the null-widened side as non-nullable. This is the SPARK-56395 scenario and the only one that exercises groupBy(_.exprId) across multiple children.

Offending reference nested in a compound expression (e.g. narrowedRef + 1L, or wrapped in an Alias/function) to confirm the inner _.collect finds it, not just bare top-level references.

Offending node below the root, to confirm the plan.collect traversal rather than root-only inspection.

ExprId not produced by any child, config ON (dangling/outer-reference exprId declared non-nullable) → asserts it's ignored. Currently the "ignored" path is only covered implicitly via the config-disabled assertion.

Duplicate ExprId across children, "any nullable wins" — the behavior the inline comment specifically claims, currently untested.

…overage Address review feedback: spell out the scope limitations of validateNullability (top-level only, strictly local, subqueries not reached), update the validateOptimizedPlan and config docs to cover the aggregate and nullability checks and the optimization-only scope, and add tests for the multi-child outer-join shape, compound expressions, below-root traversal, ignored non-child references, and the duplicate-ExprId "any nullable wins" rule. Co-authored-by: Isaac

Co-authored-by: Isaac

cloud-fan · 2026-06-16T21:08:49Z

the docker test failure is unrelated, thanks, merging to master/4.x!

…rity validation ### What changes were proposed in this pull request? This PR adds an opt-in nullability check to logical plan integrity validation. - A new internal config `spark.sql.planChangeValidation.nullability` (default `false`) gates the check. It is a sub-flag of the existing `spark.sql.planChangeValidation`. - A new `LogicalPlanIntegrity.validateNullability` walks the plan and flags any operator that references one of its child's output attributes with a **narrower** nullability than the child produces — i.e. declaring the attribute `nullable = false` while the child produces it as `nullable = true`. It is wired into `validateOptimizedPlan`'s existing check chain. - References declared **wider** (nullable over a non-nullable child) are safe and not flagged; references whose `ExprId` is not produced by any child (e.g. outer references) are ignored. ### Why are the changes needed? When a rule synthesizes operators, every reference to a child output must carry the nullability the child actually produces. A classic miss is an outer-join rewrite that references the null-widened side with its original non-nullable attributes — for example SPARK-56395 in `RewriteNearestByJoin`, where the synthesized `Aggregate` referenced the right side (widened to nullable by the synthetic `LEFT OUTER` join) with its original non-nullable metadata. Such a mismatch is a latent plan-integrity defect: the rows are unchanged, so it is invisible to result-equivalence reasoning, to `validateSchemaOutput` (which compares types `asNullable`), and to green CI. Yet a later optimization that trusts the non-nullable declaration can drop a null check and produce wrong results. There is currently no integrity check that catches this class of defect; this adds one. The check is disabled by default — even though `spark.sql.planChangeValidation` defaults to on in test runs — because not all existing rules maintain precise nullability today, and enabling it globally would surface pre-existing benign mismatches. Suites that have been audited (or rules that have been fixed) can opt in to guard against regressions. ### Does this PR introduce _any_ user-facing change? No. The config is internal and disabled by default; the check only runs under plan-change validation when explicitly enabled. ### How was this patch tested? New unit test in `LogicalPlanIntegritySuite` that, with the config enabled, asserts: - a reference declared non-nullable over a nullable child is flagged (and the message names the offending attribute); - a faithful reference (same nullability as the child) passes; - a wider reference (nullable over a non-nullable child) passes; and that the check is a no-op when the config is disabled. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.8) This pull request and its description were written by Isaac. Closes #56502 from cloud-fan/SPARK-oss-nullability-plan-validation. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 5f1f8da) Signed-off-by: Wenchen Fan <wenchen@databricks.com>

[SPARK-57442][SQL] Add opt-in nullability check to logical plan integ…

f0617a0

…rity validation Co-authored-by: Isaac

cloud-fan force-pushed the SPARK-oss-nullability-plan-validation branch from af32b9c to f0617a0 Compare June 14, 2026 23:10

gengliangwang approved these changes Jun 15, 2026

View reviewed changes

MaxGekk reviewed Jun 15, 2026

View reviewed changes

cloud-fan added 2 commits June 15, 2026 20:08

[SPARK-57442][SQL] Fix inaccurate multi-child nullability test comment

0750c15

Co-authored-by: Isaac

cloud-fan closed this in 5f1f8da Jun 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-57442][SQL] Add opt-in nullability check to logical plan integrity validation#56502

[SPARK-57442][SQL] Add opt-in nullability check to logical plan integrity validation#56502
cloud-fan wants to merge 3 commits into
apache:masterfrom
cloud-fan:SPARK-oss-nullability-plan-validation

cloud-fan commented Jun 14, 2026

Uh oh!

MaxGekk left a comment

Uh oh!

MaxGekk Jun 15, 2026

Uh oh!

MaxGekk Jun 15, 2026

Uh oh!

MaxGekk Jun 15, 2026

Uh oh!

MaxGekk Jun 15, 2026

Uh oh!

cloud-fan commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

cloud-fan commented Jun 14, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants