General boundary_flux primitive: Consistent Boundary Flux for any solver (heat flux / Nusselt, traction) by lmoresi · Pull Request #294 · underworldcode/underworld3

lmoresi · 2026-07-01T21:49:55Z

What

This set of functionality goes hand in hand with the rotated bc toolkit. Once we have exact boundary conditions, we can use the consistent boundary flux methods to obtain (wait for it) consistent fluxes at the boundaries. This is the most accurate was to determine topography, heat flux, etc. It is not compatible with the penalty approaches (very low accuracy).

Adds solver.boundary_flux(boundary, mass="lumped", remove_mean=False, normal=None) and
solver.boundary_flux_field(boundary, field, scale=1.0, ...) to SolverBaseClass, so the
Consistent Boundary Flux (CBF, Gresho et al.) recovery is a general capability of any
solver, not a Stokes-only / topography-specific one.

The essential-BC reaction — the assembled interior FEM residual at a Dirichlet node — is the
consistent nodal flux; de-smearing it with the boundary mass gives a pointwise surface flux:

scalar diffusion / advection-diffusion → surface heat flux -k ∂T/∂n (boundary mean = Nusselt number)
Stokes → boundary traction σ·n̂ (σ_nn with normal)

Design

SolverBaseClass._assemble_volume_reaction(): field-structure-agnostic (Stokes self.fields,
else the single Unknowns.u) volume FEM residual (PETSc's rock-solid volume integration).
utilities/boundary_flux.py: the equation-independent parts — boundary-node gathering, the
lumped/consistent boundary-mass de-smear (remove_mean flag: keep the mean for a physical flux
= Nusselt, remove it for a gauge-free field = dynamic topography), and the field write.

boundary_flux is the primitive; boundary_normal_traction / dynamic_topography (on the
rotated-free-slip branch #293) become thin readings of it. remove_mean defaults False — the
mean flux is physical.

Parallel correctness (the hard part)

A partition that CUTS the flux boundary needs the nodal reaction assembled across ranks. Rather
than hand-roll localToGlobal(ADD) (which was wrong at cut nodes), the DM has overlap=0 so PETSc's
volume residual gives each rank its OWNED-cell partial; the complete reaction is assembled by
summing the partials across ranks by coordinate — the same rock-solid gather used for the
boundary mass. Verified against the rock-solid volume-integral identity
total_flux = ∫_Ω ∇T·∇w, which is byte-identical at np=1/2/4.

Validation

Harmonic Poisson with a known analytic flux: serial corr 1.0000, relL2 2e-4 (converging
8e-4→2e-4→4e-4 at res 24/48/96), mean flux = Nusselt 2/sinh π; BdIntegral of the flux field
byte-identical np=1/2/4.

tests/test_1019_boundary_flux.py (serial)
tests/parallel/test_1065_boundary_flux_parallel.py — passes at -n 2 and -n 4 (boundary cut)

Follow-up

Once this lands, #293 rebases boundary_normal_traction / dynamic_topography onto boundary_flux.

Underworld development team with AI support from Claude Code

…lux) Add solver.boundary_flux(boundary, mass="lumped", remove_mean=False, normal=None) and solver.boundary_flux_field(...) to SolverBaseClass, so the Consistent Boundary Flux recovery is a general capability of any solver, not a Stokes-only / topography-specific one. The essential-BC reaction (the assembled interior FEM residual at a Dirichlet node) is the consistent nodal flux; de-smearing it with the boundary mass gives a pointwise surface flux: - scalar diffusion / advection-diffusion -> surface heat flux -k dT/dn (mean = Nusselt) - Stokes -> boundary traction sigma.n (sigma_nn) New pieces: - SolverBaseClass._assemble_volume_reaction(): field-structure-agnostic (self.fields for Stokes, else the single Unknowns.u), globally assembled volume FEM residual. - utilities/boundary_flux.py: equation-independent boundary-node gathering, the lumped/ consistent boundary-mass de-smear (remove_mean flag: keep the mean for a physical flux = Nusselt, remove it for a gauge-free field = dynamic topography), and the field write. Naming: boundary_flux is the primitive; boundary_normal_traction / dynamic_topography (on the rotated-free-slip branch) become thin readings of it (traction; mean-removed topography). remove_mean defaults False here — the mean flux is physical. Validated on a harmonic Poisson (analytic flux): serial corr 1.0000, relL2 2e-4, converging 0.0008->0.0002->0.0004 at res 24/48/96; mean flux == Nusselt (2/sinh pi); np=2 machine-identical. test_1019 (serial). KNOWN LIMITATION (inline TODO): a partition that CUTS a flux boundary (np=4 on a box) has a ~few-% localized error at the cut node — the parallel assembly of the volume FEM residual there is not yet exact. The solve is partition-independent (verified to 1e-12); serial + boundary-uncut partitions are machine-exact. Parallel test gated to np=2 until the cut-node assembly is fixed. Not yet PR'd for that reason. Underworld development team with AI support from Claude Code

…ut boundary nodes The boundary flux was wrong (~7% localized) when a partition CUT the flux boundary (np=4 on a box). Root cause: _assemble_volume_reaction hand-rolled the assembly with localToGlobal(ADD) on the DMPlexSNESComputeResidualFEM output, which does not reproduce the true reaction at a cut node. Confirmed against the rock-solid VOLUME integral identity total_flux = integral(grad T . grad w): that is byte-identical at np=1/2/4, while the ADD-assembled reaction diverged at np=4. Fix (no hand-rolled global assembly): the DM has overlap=0, so PETSc's volume residual gives each rank only its OWNED cells' partial contribution at a shared boundary node. Return the raw local residual and assemble the complete reaction by SUMMING each rank's partial across ranks by coordinate — the same rock-solid coordinate gather already used for the boundary mass. This reproduces the volume integral exactly at cut nodes. Validated (harmonic Poisson, analytic flux): surface heat flux corr 1.0000, relL2 2e-4, and BdIntegral of the flux field byte-identical at np=1/2/4. New parallel test test_1065 passes at -n 2 AND -n 4 (boundary cut); serial test_1019 still passes. Underworld development team with AI support from Claude Code

Copilot

Pull request overview

Adds a solver-agnostic Consistent Boundary Flux (CBF) primitive to Underworld3 so any solver can recover boundary flux/traction from essential-BC reactions, with serial + MPI regression tests to validate accuracy and partition-independence.

Changes:

Introduces SolverBaseClass.boundary_flux() and SolverBaseClass.boundary_flux_field() backed by a new _assemble_volume_reaction() helper.
Adds equation-independent boundary-node gathering + mass de-smearing + field hand-off in utilities/boundary_flux.py.
Adds serial and parallel tests validating heat-flux recovery accuracy and MPI partition-independence.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File	Description
tests/test_1019_boundary_flux.py	Serial accuracy test for scalar diffusion heat flux + mean flux preservation (Nusselt).
tests/parallel/test_1065_boundary_flux_parallel.py	MPI regression to ensure partition-independent flux recovery when the boundary is cut by the partition.
src/underworld3/utilities/boundary_flux.py	New utility implementing boundary node extraction, (lumped/consistent) boundary-mass de-smear, and field write-back.
src/underworld3/cython/petsc_generic_snes_solvers.pyx	Adds `_assemble_volume_reaction()` plus new public solver methods `boundary_flux` / `boundary_flux_field`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    dim = solver.mesh.dim
+    xs, flux = boundary_flux(solver, boundary, mass=mass, remove_mean=remove_mean, normal=normal)
+    fmap = {_key(x, dim): scale * float(f) for x, f in zip(np.asarray(xs), np.asarray(flux).ravel())}


+This module holds the parts that do not depend on the equation: the boundary-node
+gathering, the boundary-mass de-smear (lumped / consistent), and the field hand-off.
+The equation-specific bit — extracting the nodal reaction — is the solver method
+``_assemble_volume_reaction`` (globally assembled, so shared boundary nodes are complete).
+


+    else:
+        M = np.zeros((n, n))
+        Me = np.array([[4., 2, -1], [2, 16, 2], [-1, 2, 4]])
+        for (ka, km, kb), h in uniq.items():
+            tri = [gi[ka], gi[km], gi[kb]]; Mh = (h / 30.0) * Me
+            for ii in range(3):
+                for jj in range(3):
+                    M[tri[ii], tri[jj]] += Mh[ii, jj]
+        sig = np.linalg.solve(M, Rg)


+        ~0. General across scalar / vector / Stokes solvers: the current solution is
+        gathered from ``self.fields`` when present (Stokes) else the single
+        ``Unknowns.u`` field, and the per-rank cell residual is summed across ranks
+        (``localToGlobal`` ADD) so boundary nodes shared across a partition cut carry the
+        complete flux.


+    """DMPlex points carrying `field_id` DOFs on `boundary`, with their coordinates.
+    Parallel-safe: a rank owning no part of the boundary gets a NULL stratum IS
+    (guarded); ghost nodes are included (their reaction is completed by the global
+    assembly in ``_assemble_volume_reaction``)."""


+    fS, fE = dm.getHeightStratum(1)
+    bval = [b.value for b in solver.mesh.boundaries if b.name == boundary][0]
+    sis = dm.getStratumIS(boundary, bval)


+        # accumulate area-weighted facet normals to the closure nodes
+        bval = [b.value for b in solver.mesh.boundaries if b.name == boundary][0]
+        sis = dm.getStratumIS(boundary, bval)


+    nodeR = {_key(x, dim): float(r) for x, r in zip(xs, R)}
+    bval = [b.value for b in solver.mesh.boundaries if b.name == boundary][0]
+    sis = dm.getStratumIS(boundary, bval)


+        """Globally-assembled FEM VOLUME residual (no boundary terms) in the DM-local
+        layout, as a numpy array.


…ail-fast, docs - Use the consolidated "UW_Boundaries" DM label (distinguished by value) for boundary strata instead of a per-boundary label named like the boundary. The named per-boundary labels do not survive mesh adaptation (only UW_Boundaries is rebuilt), so the old lookup could return an empty stratum and silently give empty flux on adapted meshes. New _boundary_stratum_is() helper; raises a clear error for an unknown boundary name (was an unhelpful IndexError). Fixes all three lookup sites. - boundary_flux_field: fail fast when the target is a scalar MeshVariable but the flux is a vector (a vector solver with normal=None), instead of silently pairing a flattened vector with the nodes. - Correct docstrings (module, _boundary_field_nodes, _assemble_volume_reaction) to describe the actual parallel path: the reaction is the RAW per-rank volume residual, assembled at cut nodes by coordinate-summing in _desmear (not a global localToGlobal). - Note the consistent-mass de-smear is a dense O(n^3) solve; lumped (default) is O(n). Serial test_1019 and parallel test_1065 (-n 4) still pass. Underworld development team with AI support from Claude Code

lmoresi · 2026-07-02T02:03:45Z

Updated after copilot reviews

Copilot review Rebased onto development (which now has #290 + the merged boundary_flux primitive, #294). The rotated σ_nn recovery is a rotated-frame reading of the same Consistent Boundary Flux, so collapse the duplication and answer the Copilot review: - Copilot pyx:7597 (nonlinear/preamble): the rotated path runs the standard pre-solve preamble (DM time, auxiliary vector, _update_constants) before delegating, and now GUARDS the unsupported cases — it is a single LINEAR solve, so picard!=0 / warm-start raise NotImplementedError rather than silently returning a single-linearisation answer. - Copilot rotated_bc:65 & :577 (boundary label): use the consolidated "UW_Boundaries" label via the shared boundary_flux._boundary_stratum_is (survives mesh adaptation; clear error for unknown names) — the per-boundary label lookups are gone. - Unification (my own review note): delete rotated_bc._recover_sigma_nn_2d and reuse boundary_flux._desmear (σ_nn = de-smear of −reaction, mean-removed); the field hand-off reuses boundary_flux.write_boundary_scalar_field. Net −133 lines in rotated_bc. _desmear gains partial_reaction: rotated passes False (its reaction is the ASSEMBLED operator Q(A·u−b), complete at every node → OVERWRITE across ranks) vs boundary_flux's default True (raw per-rank residual → SUM). This is the fix for the np=4 double-count. Verified: test_1018 serial 6/6; test_1064 parallel 5/5 at -n 2 and -n 4. Underworld development team with AI support from Claude Code

…e-surface hand-off (#293) * feat(stokes): rotated strong free-slip BC with σ_nn reaction (increment 1) Add `add_rotated_freeslip_bc(boundary, normal=None)` on the Stokes saddle: rotate the boundary velocity DOFs into a per-node (normal, tangential) frame and impose the rotated normal component as an exact Dirichlet constraint (v·n̂=0 to machine precision), with the constraint reaction giving the consistent boundary normal traction σ_nn (`boundary_normal_traction`) — no augmented-Lagrangian splitting. New module `utilities/rotated_bc.py` (pure Python; solve() delegates when rotated BCs are registered, no-op otherwise): - dimension-general SVD constraint frame: a node's accumulated face normals span a rank-r subspace; the r rows spanning it are constrained, the (dim-r) tangential complement is free. Handles 2D face/corner, 3D face/edge/corner, and regional spherical boxes (curved caps + planar sides + their edges/corners). - per-boundary normal source: geometric facet normal (computeCellGeometryFVM, 2D+3D) or an analytic sympy normal (exact radial X/|X| on caps) or a constant. - rigid-rotation gauge auto-removed only when it is a genuine null space of the constrained problem (circular/spherical free-slip), never on straight walls. Validated (test_1018): box rotated free-slip on 4 walls reproduces native essential free-slip to 4e-6; annulus per-node radial free-slip gives machine-zero radial leakage with the gauge removed. Direct LU solve here; geometric-FMG path (the two custom_mg velocity-block injections) is the next increment. Also fixes the stale set_custom_mg docstring (documents the Stokes velocity-block path, which is wired via inject_custom_mg). Underworld development team with AI support from Claude Code * feat(stokes): rotated free-slip solved by geometric FMG (iterative by default) Replace the direct-LU rotated solve with a self-contained fieldsplit-Schur KSP on the rotated operator (LU is almost never the right option; kept only behind solver._rotated_use_lu). A plain rotated Mat carries no DM field info, so UW3's DM-coupled fieldsplit cannot split it — the split is built from explicit velocity/pressure index sets, and the velocity sub-PC is driven by: - geometric FMG on the custom (PR#290) prolongation, rotated P̂ = Q_v·P via setMGInterpolation (needs no DM), when a hierarchy is registered (set_custom_fmg); the rotated block A_vv = Q_v A_vv Q_vᵀ is formed from the rotated operator automatically and only the FINE prolongation is rotated (Galerkin coarse operators auto-correct); - GAMG otherwise. selfp Schur + jacobi pressure PC → the SolCx velocity block converges in ~6 outer iterations. The coupled Stokes null space (constant pressure ⊕ rotated rigid rotation) is attached to the rotated operator. Two fixes so the strong BC is EXACT under an iterative solve (independent of tolerance): - zero the RHS at constrained rows (zeroRowsColumns leaves b untouched, so a nonzero b there leaks straight into the solution); - zero the (fully decoupled) constrained DOFs post-solve — an identity constraint row only drives its residual below tolerance in a Krylov solve, so the DOF is ~tol, not 0, until pinned exactly here. Null-space vectors are also zeroed at the constrained rows for compatibility. test_1018 gains a geometric-FMG velocity-block case; annulus leakage is now machine-zero at the default (loose) tolerance. No regressions in the custom-mg Stokes suite. Underworld development team with AI support from Claude Code * fix(rotated-bc): parallel-safety pass (distributed Q, collective-safe, guards) Move the rotated free-slip construction off serial-only constructs so it no longer segfaults at np>1: - build_rotation now assembles a DISTRIBUTED PETSc Mat with the operator's row layout, each rank setting only its owned rows (the per-node dim×dim block is node-local — a node's velocity components live on one DMPlex point owned by one rank — so Q is block-diagonal with no off-rank columns), instead of a global scipy matrix replicated on every rank; - assemble the Jacobian before build_rotation so its parallel layout is final; - post-solve zeroing of the constrained DOFs and the null-space-vector zeroing use ownership-relative local indices, not global indices into a local array; - the rigid-rotation gauge is removed on the GLOBAL vector with PETSc dots (a local nodal sum double-counts shared nodes); new _rigid_rotation_global helper; - _rotation_is_nullspace uses collective PETSc norms and no longer early-returns on a per-rank empty normal_rows (that desynced the collective and could deadlock); - guard dm.getStratumIS against a NULL IS on ranks that own no part of a boundary (the actual np>1 segfault: getIndices() on a null IS); - evaluate an analytic (sympy) boundary normal via a single lambdify, not per-node .subs() (orders of magnitude faster; also removes a per-rank serialisation). Serial (np=1) behaviour unchanged: test_1018 still passes. The core rotated solve runs correctly and fast at np=2 (verified by staged bisection). A hang in the full solve() wrapper at np>1 is still under investigation and tracked separately. Underworld development team with AI support from Claude Code * fix(rotated-bc): parallel np>1 wrapper "hang" was a global-into-local index crash The rotated free-slip s.solve() appeared to hang at np=2/4 (one rank ~100% CPU, the other idle). It was not a hang: the RHS constrained-row zeroing indexed the LOCAL vector slice with GLOBAL row indices, ba = bhat.getArray(); ba[normal_rows] = 0.0 # normal_rows are GLOBAL which is in-range on rank 0 (ownership starts at 0, so global == local) but overflows on every other rank (ownership starts at rstart > 0). Rank 1 therefore raised an IndexError and left the collective while rank 0 proceeded into the iterative solve and blocked in the next collective (getGlobalVec) — the classic asymmetric-crash-that-looks-like-a-hang. This is the same global-into-local bug class already fixed for the post-solve zeroing; line 222 (and the LU-path pin write) were missed. Fixes, in rotated_bc.py: - solve_rotated_freeslip: zero bhat at the constrained rows using ownership- relative local indices (g - rstart), matching the post-solve zeroing; same for the opt-in LU pressure-pin write. - _solve_rotated_iterative: give each solve a UNIQUE PETSc options prefix and drop the keys afterwards, so sequential rotated solves (multiple solvers / time steps) do not share or accumulate global-options state. - solve_rotated_freeslip: after the field scatter, refresh the enhanced-variable gvec cache and clear canonical-data / mark the mesh lvec stale, mirroring the normal solve so downstream reads (var.data/array, checkpoint, stats) aren't stale. Validated np={1,2,4}: box velocity L2 matches serial to ~1e-10, annulus velocity L2 to ~1e-8 (iterative-solver tolerance), radial leakage byte-identical. New tests/parallel/test_1064_rotated_freeslip_parallel.py (box + annulus, MPI-safe Integral/BdIntegral diagnostics) passes at -n 2 and -n 4; serial test_1018 still 3/3. Underworld development team with AI support from Claude Code * test(rotated-bc): cover custom-FMG rotated annulus free-slip in parallel Add test_rotated_freeslip_annulus_fmg_partition_independent to the parallel suite: rotated radial free-slip on a nested annulus hierarchy (coarse -> refine -> refine) whose velocity block is CUSTOM GEOMETRIC FMG via set_custom_fmg (no GAMG, no direct solve). Asserts the solve converges and that velocity L2 + radial leakage reproduce the serial reference in parallel. Verified: 3 passed at -n 2 and -n 4. This closes the coverage gap on the full stack FMG x rotated x annulus x np>1 (previously only GAMG rotated box/annulus and serial FMG box were tested). Underworld development team with AI support from Claude Code * feat(rotated-bc): parallel-safe, corner-correct sigma_nn (CBF topography) boundary_normal_traction was serial-only and rang at corners. Two changes: 1. Corner-correct recovery. Read the CARTESIAN nodal reaction r_c = A u - b and use R_i = n_hat_i . r_c(node_i) as the sigma_nn nodal load, instead of the rotated frame's "normal row". At a node shared with another rotated-free-slip boundary the rotated frame's first SVD row is a MIX of both walls' normals, so reading it gave a wrong stress component and large corner spikes; n_hat . r_c is the true normal traction for this boundary. Whole-boundary SolCx relL2 vs analytic sigma_yy drops from ~0.43 to 0.042 at res48 (interior unchanged ~0.046), converging to 0.030 at res96, corr 0.999 -- matching the standalone strong-Dirichlet reaction method. 2. Parallel safety. The reaction is scattered to a local vector (ghosts included) and read by LOCAL section offset (no global-into-local overflow -- the same crash class as the solve-path hang). The consistent P2 line mass is assembled GLOBALLY by a coordinate-keyed allgather of the boundary elements (shared facets de-duplicated), so every rank solves the identical 1D system and the mean-removal gauge is global; each rank returns sigma_nn at its own local nodes. Verified partition-independent: res48 relL2/corr byte-identical at np=1/2/4. Underworld development team with AI support from Claude Code * test(rotated-bc): cover sigma_nn recovery accuracy (serial) + partition-independence - test_1018: sigma_nn from boundary_normal_traction on the SolCx top boundary matches the analytic sigma_yy (whole boundary relL2 < 0.08, corr > 0.99) -- guards the Cartesian-reaction + n_hat projection (corner-correct) + consistent P2 line mass. - test_1064: the same recovery is partition-independent -- whole-boundary relL2/|corr| (gathered + de-duplicated + broadcast) match the serial reference at np=2 and np=4, and stay accurate. Guards the parallel reaction read + allgather'd consistent mass. Verified: test_1018 4 passed serial; test_1064 4 passed at -n 2 and -n 4. Underworld development team with AI support from Claude Code * feat(rotated-bc): lumped-mass sigma_nn de-smear (monotone, no Gibbs overshoot) The consistent P2 line mass overshoots where the boundary traction jumps (e.g. across a viscosity contrast) — a Gibbs wiggle that, for a free surface, injects a spurious surface-velocity pulse. The lever is mass LUMPING, not element order: the lumped (diagonal, row-sum) boundary mass is an M-matrix, so its de-smear cannot overshoot. boundary_normal_traction gains mass={"lumped"(default),"consistent"}: - "lumped": sigma = -R / m_lumped (m_lumped = row sums h*[1/6,2/3,1/6] of the P2 line mass). Monotone at discontinuities, a purely local division (no global mass solve), and marginally MORE accurate on SolCx than consistent (whole-boundary relL2 0.0399 vs 0.0419 at res48, converging 0.0555 -> 0.0399 -> 0.0284 at res 24/48/96). - "consistent": the previous full-mass solve, kept for smooth tractions. Prototype (P1-consistent/P1-lumped/P2-lumped) showed P2-lumped is best: it keeps the P2 node density AND is monotone. Assembled from the same coordinate-keyed element allgather, so both modes stay partition-independent (byte-identical np=1/2/4). Tests: test_1018 adds a total-variation no-overshoot guard (lumped TV ~ analytic TV, and < consistent TV); test_1064 golden updated to the lumped default. test_1018 5/5 serial; test_1064 4/4 at -n 2 and -n 4. Underworld development team with AI support from Claude Code * feat(rotated-bc): dynamic_topography surface field — free-surface hand-off Expose the rotated-free-slip sigma_nn to the free-surface machinery as a MeshVariable. solver.dynamic_topography(boundary, field, buoyancy_scale=1, mass="lumped") writes h = -(sigma_nn - mean)/(rho g) onto a scalar surface field (P1 recommended) at the boundary nodes, from the last solve's constraint reaction. The 3-number topography integrator drives node motion from a surface field, so this is the form it consumes: the field is usable symbolically (BdIntegral) and the interior nodes are untouched. Parallel gotcha fixed: the field must be written ONCE from a local numpy copy, not per-node. A per-element write to var.data fires the variable write-callback each time, and the number of boundary nodes differs per rank (a rank may own none of the boundary), so per-element writes desync the callback's collective and deadlock. Verified: the field BdIntegral over the boundary is byte-identical at np=1/2/4. Tests: test_1018 (field reproduces analytic SolCx topography at the top vertices, corr>0.99, and is BdIntegral-usable); test_1064 (BdIntegral L2 partition-independent). test_1018 6/6 serial; test_1064 5/5 at -n 2 and -n 4. Underworld development team with AI support from Claude Code * refactor(rotated-bc): unify σ_nn onto shared boundary_flux; address #293 Copilot review Rebased onto development (which now has #290 + the merged boundary_flux primitive, #294). The rotated σ_nn recovery is a rotated-frame reading of the same Consistent Boundary Flux, so collapse the duplication and answer the Copilot review: - Copilot pyx:7597 (nonlinear/preamble): the rotated path runs the standard pre-solve preamble (DM time, auxiliary vector, _update_constants) before delegating, and now GUARDS the unsupported cases — it is a single LINEAR solve, so picard!=0 / warm-start raise NotImplementedError rather than silently returning a single-linearisation answer. - Copilot rotated_bc:65 & :577 (boundary label): use the consolidated "UW_Boundaries" label via the shared boundary_flux._boundary_stratum_is (survives mesh adaptation; clear error for unknown names) — the per-boundary label lookups are gone. - Unification (my own review note): delete rotated_bc._recover_sigma_nn_2d and reuse boundary_flux._desmear (σ_nn = de-smear of −reaction, mean-removed); the field hand-off reuses boundary_flux.write_boundary_scalar_field. Net −133 lines in rotated_bc. _desmear gains partial_reaction: rotated passes False (its reaction is the ASSEMBLED operator Q(A·u−b), complete at every node → OVERWRITE across ranks) vs boundary_flux's default True (raw per-rank residual → SUM). This is the fix for the np=4 double-count. Verified: test_1018 serial 6/6; test_1064 parallel 5/5 at -n 2 and -n 4. Underworld development team with AI support from Claude Code * fix(rotated-bc): address #293 re-review — DM vector-pool leaks + stale docstring Copilot re-review (7 comments) — resource-management + docs: - DM getGlobalVec() pool leaks: vectors that PERSIST beyond solve_rotated_freeslip (returned in the info dict: U, the LU-branch Uhat; the pressure null-space vector pv) now use dm.createGlobalVec(); the borrowed temporaries (U0, F0) are returned with restoreGlobalVec(); the transient rigid-rotation vector from _rigid_rotation_global is restored at every call site (gauge removal, _rotated_nullspace, _rotation_is_nullspace), and the transient duplicates in _rotation_is_nullspace are destroyed. Prevents pool exhaustion / memory growth over repeated (time-stepping) solves. - Module docstring corrected: the solve is iterative fieldsplit-Schur (custom-FMG / GAMG) by default with LU opt-in, and σ_nn reuses the shared boundary_flux de-smear — not the "direct LU, FMG later" of the original increment-1 note. Deferred (noted on the PR): building Q's identity via a per-row Mat.setValue loop is a one-time-per-solve setup that only matters at very large scale; left as a follow-up to avoid perturbing the well-tested Q construction. Verified: test_1018 serial 6/6; test_1064 parallel 5/5 at -n 2 and -n 4. Underworld development team with AI support from Claude Code

lmoresi added 2 commits July 1, 2026 22:01

Copilot AI review requested due to automatic review settings July 1, 2026 21:49

Copilot started reviewing on behalf of lmoresi July 1, 2026 21:50 View session

Copilot AI reviewed Jul 1, 2026

View reviewed changes

lmoresi merged commit ae64702 into development Jul 2, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

General boundary_flux primitive: Consistent Boundary Flux for any solver (heat flux / Nusselt, traction)#294

General boundary_flux primitive: Consistent Boundary Flux for any solver (heat flux / Nusselt, traction)#294
lmoresi merged 3 commits into
developmentfrom
feature/boundary-flux

lmoresi commented Jul 1, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

lmoresi commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		"""Globally-assembled FEM VOLUME residual (no boundary terms) in the DM-local
		layout, as a numpy array.

Uh oh!

Conversation

lmoresi commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Design

Parallel correctness (the hard part)

Validation

Follow-up

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

lmoresi commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lmoresi commented Jul 1, 2026 •

edited

Loading