Skip to content

General boundary_flux primitive: Consistent Boundary Flux for any solver (heat flux / Nusselt, traction)#294

Merged
lmoresi merged 3 commits into
developmentfrom
feature/boundary-flux
Jul 2, 2026
Merged

General boundary_flux primitive: Consistent Boundary Flux for any solver (heat flux / Nusselt, traction)#294
lmoresi merged 3 commits into
developmentfrom
feature/boundary-flux

Conversation

@lmoresi

@lmoresi lmoresi commented Jul 1, 2026

Copy link
Copy Markdown
Member

What

This set of functionality goes hand in hand with the rotated bc toolkit. Once we have exact boundary conditions, we can use the consistent boundary flux methods to obtain (wait for it) consistent fluxes at the boundaries. This is the most accurate was to determine topography, heat flux, etc. It is not compatible with the penalty approaches (very low accuracy).

Adds solver.boundary_flux(boundary, mass="lumped", remove_mean=False, normal=None) and
solver.boundary_flux_field(boundary, field, scale=1.0, ...) to SolverBaseClass, so the
Consistent Boundary Flux (CBF, Gresho et al.) recovery is a general capability of any
solver
, not a Stokes-only / topography-specific one.

The essential-BC reaction — the assembled interior FEM residual at a Dirichlet node — is the
consistent nodal flux; de-smearing it with the boundary mass gives a pointwise surface flux:

  • scalar diffusion / advection-diffusion → surface heat flux -k ∂T/∂n (boundary mean = Nusselt number)
  • Stokes → boundary traction σ·n̂ (σ_nn with normal)

Design

  • SolverBaseClass._assemble_volume_reaction(): field-structure-agnostic (Stokes self.fields,
    else the single Unknowns.u) volume FEM residual (PETSc's rock-solid volume integration).
  • utilities/boundary_flux.py: the equation-independent parts — boundary-node gathering, the
    lumped/consistent boundary-mass de-smear (remove_mean flag: keep the mean for a physical flux
    = Nusselt, remove it for a gauge-free field = dynamic topography), and the field write.

boundary_flux is the primitive; boundary_normal_traction / dynamic_topography (on the
rotated-free-slip branch #293) become thin readings of it. remove_mean defaults False — the
mean flux is physical.

Parallel correctness (the hard part)

A partition that CUTS the flux boundary needs the nodal reaction assembled across ranks. Rather
than hand-roll localToGlobal(ADD) (which was wrong at cut nodes), the DM has overlap=0 so PETSc's
volume residual gives each rank its OWNED-cell partial; the complete reaction is assembled by
summing the partials across ranks by coordinate — the same rock-solid gather used for the
boundary mass. Verified against the rock-solid volume-integral identity
total_flux = ∫_Ω ∇T·∇w, which is byte-identical at np=1/2/4.

Validation

Harmonic Poisson with a known analytic flux: serial corr 1.0000, relL2 2e-4 (converging
8e-4→2e-4→4e-4 at res 24/48/96), mean flux = Nusselt 2/sinh π; BdIntegral of the flux field
byte-identical np=1/2/4.

  • tests/test_1019_boundary_flux.py (serial)
  • tests/parallel/test_1065_boundary_flux_parallel.py — passes at -n 2 and -n 4 (boundary cut)

Follow-up

Once this lands, #293 rebases boundary_normal_traction / dynamic_topography onto boundary_flux.

Underworld development team with AI support from Claude Code

lmoresi added 2 commits July 1, 2026 22:01
…lux)

Add solver.boundary_flux(boundary, mass="lumped", remove_mean=False, normal=None) and
solver.boundary_flux_field(...) to SolverBaseClass, so the Consistent Boundary Flux
recovery is a general capability of any solver, not a Stokes-only / topography-specific
one. The essential-BC reaction (the assembled interior FEM residual at a Dirichlet node)
is the consistent nodal flux; de-smearing it with the boundary mass gives a pointwise
surface flux:
  - scalar diffusion / advection-diffusion -> surface heat flux -k dT/dn (mean = Nusselt)
  - Stokes                                  -> boundary traction sigma.n (sigma_nn)

New pieces:
- SolverBaseClass._assemble_volume_reaction(): field-structure-agnostic (self.fields for
  Stokes, else the single Unknowns.u), globally assembled volume FEM residual.
- utilities/boundary_flux.py: equation-independent boundary-node gathering, the lumped/
  consistent boundary-mass de-smear (remove_mean flag: keep the mean for a physical flux
  = Nusselt, remove it for a gauge-free field = dynamic topography), and the field write.

Naming: boundary_flux is the primitive; boundary_normal_traction / dynamic_topography
(on the rotated-free-slip branch) become thin readings of it (traction; mean-removed
topography). remove_mean defaults False here — the mean flux is physical.

Validated on a harmonic Poisson (analytic flux): serial corr 1.0000, relL2 2e-4,
converging 0.0008->0.0002->0.0004 at res 24/48/96; mean flux == Nusselt (2/sinh pi);
np=2 machine-identical. test_1019 (serial).

KNOWN LIMITATION (inline TODO): a partition that CUTS a flux boundary (np=4 on a box)
has a ~few-% localized error at the cut node — the parallel assembly of the volume FEM
residual there is not yet exact. The solve is partition-independent (verified to 1e-12);
serial + boundary-uncut partitions are machine-exact. Parallel test gated to np=2 until
the cut-node assembly is fixed. Not yet PR'd for that reason.

Underworld development team with AI support from Claude Code
…ut boundary nodes

The boundary flux was wrong (~7% localized) when a partition CUT the flux boundary (np=4
on a box). Root cause: _assemble_volume_reaction hand-rolled the assembly with
localToGlobal(ADD) on the DMPlexSNESComputeResidualFEM output, which does not reproduce
the true reaction at a cut node. Confirmed against the rock-solid VOLUME integral
identity total_flux = integral(grad T . grad w): that is byte-identical at np=1/2/4,
while the ADD-assembled reaction diverged at np=4.

Fix (no hand-rolled global assembly): the DM has overlap=0, so PETSc's volume residual
gives each rank only its OWNED cells' partial contribution at a shared boundary node.
Return the raw local residual and assemble the complete reaction by SUMMING each rank's
partial across ranks by coordinate — the same rock-solid coordinate gather already used
for the boundary mass. This reproduces the volume integral exactly at cut nodes.

Validated (harmonic Poisson, analytic flux): surface heat flux corr 1.0000, relL2 2e-4,
and BdIntegral of the flux field byte-identical at np=1/2/4. New parallel test
test_1065 passes at -n 2 AND -n 4 (boundary cut); serial test_1019 still passes.

Underworld development team with AI support from Claude Code
Copilot AI review requested due to automatic review settings July 1, 2026 21:49

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a solver-agnostic Consistent Boundary Flux (CBF) primitive to Underworld3 so any solver can recover boundary flux/traction from essential-BC reactions, with serial + MPI regression tests to validate accuracy and partition-independence.

Changes:

  • Introduces SolverBaseClass.boundary_flux() and SolverBaseClass.boundary_flux_field() backed by a new _assemble_volume_reaction() helper.
  • Adds equation-independent boundary-node gathering + mass de-smearing + field hand-off in utilities/boundary_flux.py.
  • Adds serial and parallel tests validating heat-flux recovery accuracy and MPI partition-independence.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
tests/test_1019_boundary_flux.py Serial accuracy test for scalar diffusion heat flux + mean flux preservation (Nusselt).
tests/parallel/test_1065_boundary_flux_parallel.py MPI regression to ensure partition-independent flux recovery when the boundary is cut by the partition.
src/underworld3/utilities/boundary_flux.py New utility implementing boundary node extraction, (lumped/consistent) boundary-mass de-smear, and field write-back.
src/underworld3/cython/petsc_generic_snes_solvers.pyx Adds _assemble_volume_reaction() plus new public solver methods boundary_flux / boundary_flux_field.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +208 to +210
dim = solver.mesh.dim
xs, flux = boundary_flux(solver, boundary, mass=mass, remove_mean=remove_mean, normal=normal)
fmap = {_key(x, dim): scale * float(f) for x, f in zip(np.asarray(xs), np.asarray(flux).ravel())}
Comment on lines +10 to +14
This module holds the parts that do not depend on the equation: the boundary-node
gathering, the boundary-mass de-smear (lumped / consistent), and the field hand-off.
The equation-specific bit — extracting the nodal reaction — is the solver method
``_assemble_volume_reaction`` (globally assembled, so shared boundary nodes are complete).

Comment on lines +160 to +168
else:
M = np.zeros((n, n))
Me = np.array([[4., 2, -1], [2, 16, 2], [-1, 2, 4]])
for (ka, km, kb), h in uniq.items():
tri = [gi[ka], gi[km], gi[kb]]; Mh = (h / 30.0) * Me
for ii in range(3):
for jj in range(3):
M[tri[ii], tri[jj]] += Mh[ii, jj]
sig = np.linalg.solve(M, Rg)
Comment on lines +1829 to +1833
~0. General across scalar / vector / Stokes solvers: the current solution is
gathered from ``self.fields`` when present (Stokes) else the single
``Unknowns.u`` field, and the per-rank cell residual is summed across ranks
(``localToGlobal`` ADD) so boundary nodes shared across a partition cut carry the
complete flux.
Comment on lines +39 to +42
"""DMPlex points carrying `field_id` DOFs on `boundary`, with their coordinates.
Parallel-safe: a rank owning no part of the boundary gets a NULL stratum IS
(guarded); ghost nodes are included (their reaction is completed by the global
assembly in ``_assemble_volume_reaction``)."""
Comment on lines +49 to +51
fS, fE = dm.getHeightStratum(1)
bval = [b.value for b in solver.mesh.boundaries if b.name == boundary][0]
sis = dm.getStratumIS(boundary, bval)
Comment on lines +86 to +88
# accumulate area-weighted facet normals to the closure nodes
bval = [b.value for b in solver.mesh.boundaries if b.name == boundary][0]
sis = dm.getStratumIS(boundary, bval)
Comment on lines +129 to +131
nodeR = {_key(x, dim): float(r) for x, r in zip(xs, R)}
bval = [b.value for b in solver.mesh.boundaries if b.name == boundary][0]
sis = dm.getStratumIS(boundary, bval)
Comment on lines +1823 to +1824
"""Globally-assembled FEM VOLUME residual (no boundary terms) in the DM-local
layout, as a numpy array.
…ail-fast, docs

- Use the consolidated "UW_Boundaries" DM label (distinguished by value) for boundary
  strata instead of a per-boundary label named like the boundary. The named per-boundary
  labels do not survive mesh adaptation (only UW_Boundaries is rebuilt), so the old
  lookup could return an empty stratum and silently give empty flux on adapted meshes.
  New _boundary_stratum_is() helper; raises a clear error for an unknown boundary name
  (was an unhelpful IndexError). Fixes all three lookup sites.
- boundary_flux_field: fail fast when the target is a scalar MeshVariable but the flux is
  a vector (a vector solver with normal=None), instead of silently pairing a flattened
  vector with the nodes.
- Correct docstrings (module, _boundary_field_nodes, _assemble_volume_reaction) to
  describe the actual parallel path: the reaction is the RAW per-rank volume residual,
  assembled at cut nodes by coordinate-summing in _desmear (not a global localToGlobal).
- Note the consistent-mass de-smear is a dense O(n^3) solve; lumped (default) is O(n).

Serial test_1019 and parallel test_1065 (-n 4) still pass.

Underworld development team with AI support from Claude Code
@lmoresi

lmoresi commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

Updated after copilot reviews

@lmoresi lmoresi merged commit ae64702 into development Jul 2, 2026
1 check passed
lmoresi added a commit that referenced this pull request Jul 2, 2026
 Copilot review

Rebased onto development (which now has #290 + the merged boundary_flux primitive, #294).
The rotated σ_nn recovery is a rotated-frame reading of the same Consistent Boundary Flux,
so collapse the duplication and answer the Copilot review:

- Copilot pyx:7597 (nonlinear/preamble): the rotated path runs the standard pre-solve
  preamble (DM time, auxiliary vector, _update_constants) before delegating, and now
  GUARDS the unsupported cases — it is a single LINEAR solve, so picard!=0 / warm-start
  raise NotImplementedError rather than silently returning a single-linearisation answer.
- Copilot rotated_bc:65 & :577 (boundary label): use the consolidated "UW_Boundaries"
  label via the shared boundary_flux._boundary_stratum_is (survives mesh adaptation; clear
  error for unknown names) — the per-boundary label lookups are gone.
- Unification (my own review note): delete rotated_bc._recover_sigma_nn_2d and reuse
  boundary_flux._desmear (σ_nn = de-smear of −reaction, mean-removed); the field hand-off
  reuses boundary_flux.write_boundary_scalar_field. Net −133 lines in rotated_bc.

_desmear gains partial_reaction: rotated passes False (its reaction is the ASSEMBLED
operator Q(A·u−b), complete at every node → OVERWRITE across ranks) vs boundary_flux's
default True (raw per-rank residual → SUM). This is the fix for the np=4 double-count.

Verified: test_1018 serial 6/6; test_1064 parallel 5/5 at -n 2 and -n 4.

Underworld development team with AI support from Claude Code
lmoresi added a commit that referenced this pull request Jul 2, 2026
…e-surface hand-off (#293)

* feat(stokes): rotated strong free-slip BC with σ_nn reaction (increment 1)

Add `add_rotated_freeslip_bc(boundary, normal=None)` on the Stokes saddle: rotate
the boundary velocity DOFs into a per-node (normal, tangential) frame and impose
the rotated normal component as an exact Dirichlet constraint (v·n̂=0 to machine
precision), with the constraint reaction giving the consistent boundary normal
traction σ_nn (`boundary_normal_traction`) — no augmented-Lagrangian splitting.

New module `utilities/rotated_bc.py` (pure Python; solve() delegates when rotated
BCs are registered, no-op otherwise):
- dimension-general SVD constraint frame: a node's accumulated face normals span a
  rank-r subspace; the r rows spanning it are constrained, the (dim-r) tangential
  complement is free. Handles 2D face/corner, 3D face/edge/corner, and regional
  spherical boxes (curved caps + planar sides + their edges/corners).
- per-boundary normal source: geometric facet normal (computeCellGeometryFVM,
  2D+3D) or an analytic sympy normal (exact radial X/|X| on caps) or a constant.
- rigid-rotation gauge auto-removed only when it is a genuine null space of the
  constrained problem (circular/spherical free-slip), never on straight walls.

Validated (test_1018): box rotated free-slip on 4 walls reproduces native
essential free-slip to 4e-6; annulus per-node radial free-slip gives machine-zero
radial leakage with the gauge removed. Direct LU solve here; geometric-FMG path
(the two custom_mg velocity-block injections) is the next increment.

Also fixes the stale set_custom_mg docstring (documents the Stokes velocity-block
path, which is wired via inject_custom_mg).

Underworld development team with AI support from Claude Code

* feat(stokes): rotated free-slip solved by geometric FMG (iterative by default)

Replace the direct-LU rotated solve with a self-contained fieldsplit-Schur KSP on
the rotated operator (LU is almost never the right option; kept only behind
solver._rotated_use_lu). A plain rotated Mat carries no DM field info, so UW3's
DM-coupled fieldsplit cannot split it — the split is built from explicit
velocity/pressure index sets, and the velocity sub-PC is driven by:
- geometric FMG on the custom (PR#290) prolongation, rotated P̂ = Q_v·P via
  setMGInterpolation (needs no DM), when a hierarchy is registered
  (set_custom_fmg); the rotated block A_vv = Q_v A_vv Q_vᵀ is formed from the
  rotated operator automatically and only the FINE prolongation is rotated
  (Galerkin coarse operators auto-correct);
- GAMG otherwise.
selfp Schur + jacobi pressure PC → the SolCx velocity block converges in ~6 outer
iterations. The coupled Stokes null space (constant pressure ⊕ rotated rigid
rotation) is attached to the rotated operator.

Two fixes so the strong BC is EXACT under an iterative solve (independent of
tolerance):
- zero the RHS at constrained rows (zeroRowsColumns leaves b untouched, so a
  nonzero b there leaks straight into the solution);
- zero the (fully decoupled) constrained DOFs post-solve — an identity constraint
  row only drives its residual below tolerance in a Krylov solve, so the DOF is
  ~tol, not 0, until pinned exactly here.
Null-space vectors are also zeroed at the constrained rows for compatibility.

test_1018 gains a geometric-FMG velocity-block case; annulus leakage is now
machine-zero at the default (loose) tolerance. No regressions in the custom-mg
Stokes suite.

Underworld development team with AI support from Claude Code

* fix(rotated-bc): parallel-safety pass (distributed Q, collective-safe, guards)

Move the rotated free-slip construction off serial-only constructs so it no
longer segfaults at np>1:
- build_rotation now assembles a DISTRIBUTED PETSc Mat with the operator's row
  layout, each rank setting only its owned rows (the per-node dim×dim block is
  node-local — a node's velocity components live on one DMPlex point owned by one
  rank — so Q is block-diagonal with no off-rank columns), instead of a global
  scipy matrix replicated on every rank;
- assemble the Jacobian before build_rotation so its parallel layout is final;
- post-solve zeroing of the constrained DOFs and the null-space-vector zeroing use
  ownership-relative local indices, not global indices into a local array;
- the rigid-rotation gauge is removed on the GLOBAL vector with PETSc dots (a local
  nodal sum double-counts shared nodes); new _rigid_rotation_global helper;
- _rotation_is_nullspace uses collective PETSc norms and no longer early-returns on
  a per-rank empty normal_rows (that desynced the collective and could deadlock);
- guard dm.getStratumIS against a NULL IS on ranks that own no part of a boundary
  (the actual np>1 segfault: getIndices() on a null IS);
- evaluate an analytic (sympy) boundary normal via a single lambdify, not per-node
  .subs() (orders of magnitude faster; also removes a per-rank serialisation).

Serial (np=1) behaviour unchanged: test_1018 still passes. The core rotated solve
runs correctly and fast at np=2 (verified by staged bisection). A hang in the full
solve() wrapper at np>1 is still under investigation and tracked separately.

Underworld development team with AI support from Claude Code

* fix(rotated-bc): parallel np>1 wrapper "hang" was a global-into-local index crash

The rotated free-slip s.solve() appeared to hang at np=2/4 (one rank ~100% CPU,
the other idle). It was not a hang: the RHS constrained-row zeroing indexed the
LOCAL vector slice with GLOBAL row indices,

    ba = bhat.getArray(); ba[normal_rows] = 0.0     # normal_rows are GLOBAL

which is in-range on rank 0 (ownership starts at 0, so global == local) but
overflows on every other rank (ownership starts at rstart > 0). Rank 1 therefore
raised an IndexError and left the collective while rank 0 proceeded into the
iterative solve and blocked in the next collective (getGlobalVec) — the classic
asymmetric-crash-that-looks-like-a-hang. This is the same global-into-local bug
class already fixed for the post-solve zeroing; line 222 (and the LU-path pin
write) were missed.

Fixes, in rotated_bc.py:
- solve_rotated_freeslip: zero bhat at the constrained rows using ownership-
  relative local indices (g - rstart), matching the post-solve zeroing; same for
  the opt-in LU pressure-pin write.
- _solve_rotated_iterative: give each solve a UNIQUE PETSc options prefix and drop
  the keys afterwards, so sequential rotated solves (multiple solvers / time steps)
  do not share or accumulate global-options state.
- solve_rotated_freeslip: after the field scatter, refresh the enhanced-variable
  gvec cache and clear canonical-data / mark the mesh lvec stale, mirroring the
  normal solve so downstream reads (var.data/array, checkpoint, stats) aren't stale.

Validated np={1,2,4}: box velocity L2 matches serial to ~1e-10, annulus velocity L2
to ~1e-8 (iterative-solver tolerance), radial leakage byte-identical. New
tests/parallel/test_1064_rotated_freeslip_parallel.py (box + annulus, MPI-safe
Integral/BdIntegral diagnostics) passes at -n 2 and -n 4; serial test_1018 still 3/3.

Underworld development team with AI support from Claude Code

* test(rotated-bc): cover custom-FMG rotated annulus free-slip in parallel

Add test_rotated_freeslip_annulus_fmg_partition_independent to the parallel suite:
rotated radial free-slip on a nested annulus hierarchy (coarse -> refine -> refine)
whose velocity block is CUSTOM GEOMETRIC FMG via set_custom_fmg (no GAMG, no direct
solve). Asserts the solve converges and that velocity L2 + radial leakage reproduce
the serial reference in parallel. Verified: 3 passed at -n 2 and -n 4.

This closes the coverage gap on the full stack FMG x rotated x annulus x np>1
(previously only GAMG rotated box/annulus and serial FMG box were tested).

Underworld development team with AI support from Claude Code

* feat(rotated-bc): parallel-safe, corner-correct sigma_nn (CBF topography)

boundary_normal_traction was serial-only and rang at corners. Two changes:

1. Corner-correct recovery. Read the CARTESIAN nodal reaction r_c = A u - b and use
   R_i = n_hat_i . r_c(node_i) as the sigma_nn nodal load, instead of the rotated
   frame's "normal row". At a node shared with another rotated-free-slip boundary the
   rotated frame's first SVD row is a MIX of both walls' normals, so reading it gave a
   wrong stress component and large corner spikes; n_hat . r_c is the true normal
   traction for this boundary. Whole-boundary SolCx relL2 vs analytic sigma_yy drops
   from ~0.43 to 0.042 at res48 (interior unchanged ~0.046), converging to 0.030 at
   res96, corr 0.999 -- matching the standalone strong-Dirichlet reaction method.

2. Parallel safety. The reaction is scattered to a local vector (ghosts included) and
   read by LOCAL section offset (no global-into-local overflow -- the same crash class
   as the solve-path hang). The consistent P2 line mass is assembled GLOBALLY by a
   coordinate-keyed allgather of the boundary elements (shared facets de-duplicated),
   so every rank solves the identical 1D system and the mean-removal gauge is global;
   each rank returns sigma_nn at its own local nodes. Verified partition-independent:
   res48 relL2/corr byte-identical at np=1/2/4.

Underworld development team with AI support from Claude Code

* test(rotated-bc): cover sigma_nn recovery accuracy (serial) + partition-independence

- test_1018: sigma_nn from boundary_normal_traction on the SolCx top boundary matches
  the analytic sigma_yy (whole boundary relL2 < 0.08, corr > 0.99) -- guards the
  Cartesian-reaction + n_hat projection (corner-correct) + consistent P2 line mass.
- test_1064: the same recovery is partition-independent -- whole-boundary relL2/|corr|
  (gathered + de-duplicated + broadcast) match the serial reference at np=2 and np=4,
  and stay accurate. Guards the parallel reaction read + allgather'd consistent mass.

Verified: test_1018 4 passed serial; test_1064 4 passed at -n 2 and -n 4.

Underworld development team with AI support from Claude Code

* feat(rotated-bc): lumped-mass sigma_nn de-smear (monotone, no Gibbs overshoot)

The consistent P2 line mass overshoots where the boundary traction jumps (e.g. across
a viscosity contrast) — a Gibbs wiggle that, for a free surface, injects a spurious
surface-velocity pulse. The lever is mass LUMPING, not element order: the lumped
(diagonal, row-sum) boundary mass is an M-matrix, so its de-smear cannot overshoot.

boundary_normal_traction gains mass={"lumped"(default),"consistent"}:
- "lumped": sigma = -R / m_lumped (m_lumped = row sums h*[1/6,2/3,1/6] of the P2 line
  mass). Monotone at discontinuities, a purely local division (no global mass solve),
  and marginally MORE accurate on SolCx than consistent (whole-boundary relL2 0.0399 vs
  0.0419 at res48, converging 0.0555 -> 0.0399 -> 0.0284 at res 24/48/96).
- "consistent": the previous full-mass solve, kept for smooth tractions.

Prototype (P1-consistent/P1-lumped/P2-lumped) showed P2-lumped is best: it keeps the P2
node density AND is monotone. Assembled from the same coordinate-keyed element allgather,
so both modes stay partition-independent (byte-identical np=1/2/4).

Tests: test_1018 adds a total-variation no-overshoot guard (lumped TV ~ analytic TV, and
< consistent TV); test_1064 golden updated to the lumped default. test_1018 5/5 serial;
test_1064 4/4 at -n 2 and -n 4.

Underworld development team with AI support from Claude Code

* feat(rotated-bc): dynamic_topography surface field — free-surface hand-off

Expose the rotated-free-slip sigma_nn to the free-surface machinery as a MeshVariable.
solver.dynamic_topography(boundary, field, buoyancy_scale=1, mass="lumped") writes
h = -(sigma_nn - mean)/(rho g) onto a scalar surface field (P1 recommended) at the
boundary nodes, from the last solve's constraint reaction. The 3-number topography
integrator drives node motion from a surface field, so this is the form it consumes:
the field is usable symbolically (BdIntegral) and the interior nodes are untouched.

Parallel gotcha fixed: the field must be written ONCE from a local numpy copy, not
per-node. A per-element write to var.data fires the variable write-callback each time,
and the number of boundary nodes differs per rank (a rank may own none of the boundary),
so per-element writes desync the callback's collective and deadlock. Verified: the field
BdIntegral over the boundary is byte-identical at np=1/2/4.

Tests: test_1018 (field reproduces analytic SolCx topography at the top vertices,
corr>0.99, and is BdIntegral-usable); test_1064 (BdIntegral L2 partition-independent).
test_1018 6/6 serial; test_1064 5/5 at -n 2 and -n 4.

Underworld development team with AI support from Claude Code

* refactor(rotated-bc): unify σ_nn onto shared boundary_flux; address #293 Copilot review

Rebased onto development (which now has #290 + the merged boundary_flux primitive, #294).
The rotated σ_nn recovery is a rotated-frame reading of the same Consistent Boundary Flux,
so collapse the duplication and answer the Copilot review:

- Copilot pyx:7597 (nonlinear/preamble): the rotated path runs the standard pre-solve
  preamble (DM time, auxiliary vector, _update_constants) before delegating, and now
  GUARDS the unsupported cases — it is a single LINEAR solve, so picard!=0 / warm-start
  raise NotImplementedError rather than silently returning a single-linearisation answer.
- Copilot rotated_bc:65 & :577 (boundary label): use the consolidated "UW_Boundaries"
  label via the shared boundary_flux._boundary_stratum_is (survives mesh adaptation; clear
  error for unknown names) — the per-boundary label lookups are gone.
- Unification (my own review note): delete rotated_bc._recover_sigma_nn_2d and reuse
  boundary_flux._desmear (σ_nn = de-smear of −reaction, mean-removed); the field hand-off
  reuses boundary_flux.write_boundary_scalar_field. Net −133 lines in rotated_bc.

_desmear gains partial_reaction: rotated passes False (its reaction is the ASSEMBLED
operator Q(A·u−b), complete at every node → OVERWRITE across ranks) vs boundary_flux's
default True (raw per-rank residual → SUM). This is the fix for the np=4 double-count.

Verified: test_1018 serial 6/6; test_1064 parallel 5/5 at -n 2 and -n 4.

Underworld development team with AI support from Claude Code

* fix(rotated-bc): address #293 re-review — DM vector-pool leaks + stale docstring

Copilot re-review (7 comments) — resource-management + docs:

- DM getGlobalVec() pool leaks: vectors that PERSIST beyond solve_rotated_freeslip
  (returned in the info dict: U, the LU-branch Uhat; the pressure null-space vector pv)
  now use dm.createGlobalVec(); the borrowed temporaries (U0, F0) are returned with
  restoreGlobalVec(); the transient rigid-rotation vector from _rigid_rotation_global is
  restored at every call site (gauge removal, _rotated_nullspace, _rotation_is_nullspace),
  and the transient duplicates in _rotation_is_nullspace are destroyed. Prevents pool
  exhaustion / memory growth over repeated (time-stepping) solves.
- Module docstring corrected: the solve is iterative fieldsplit-Schur (custom-FMG / GAMG)
  by default with LU opt-in, and σ_nn reuses the shared boundary_flux de-smear — not the
  "direct LU, FMG later" of the original increment-1 note.

Deferred (noted on the PR): building Q's identity via a per-row Mat.setValue loop is a
one-time-per-solve setup that only matters at very large scale; left as a follow-up to
avoid perturbing the well-tested Q construction.

Verified: test_1018 serial 6/6; test_1064 parallel 5/5 at -n 2 and -n 4.

Underworld development team with AI support from Claude Code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants