Cross-Section Data Pipeline

Key Facts

Read this before modifying the cross-section pipeline.

  • 421-group microscopic XS from GENDF (GXS) files → HDF5 via data/micro_xs/

  • Isotope dataclass: sig_t, sig_c, sig_f, sig_el, sig_inel, nu, chi (421 groups)

  • Sigma-zero iteration: data/macro_xs/sigma_zeros.py — self-shielding

  • Mixture dataclass: macroscopic XS with SigS[l][g_from, g_to] convention

  • Consistency: \(\Sigma_t = \Sigma_c + \Sigma_f + \sum_{g'} \Sigma_{s,g \to g'}\)

  • load_isotope() auto-selects HDF5 or fallback .m parser

  • Verification uses synthetic XS from derivations/_xs_library.py (regions A/B/C/D), NOT this pipeline

Overview

Every solver in ORPHEUS relies on multi-group microscopic cross sections for the 12 nuclides in the 421-energy-group JEFF-3.1 library. This chapter documents the complete data pipeline from the authoritative IAEA source files to the internal Isotope dataclass:

  1. GENDF format — the IAEA distributes processed nuclear data in the GENDF (Groupwise ENDF) format, which uses fixed-width 80-column records inherited from punched-card conventions.

  2. Parsing — the gendf.py module reads GENDF files directly, bypassing the MATLAB CSV intermediary.

  3. Scattering assembly — elastic, inelastic, and thermal scattering matrices are combined with careful treatment of thermal-group boundaries.

  4. HDF5 serialisation — the parsed data is stored in compressed HDF5 files for fast loading at runtime.

  5. Loading: load_isotope() provides a uniform API that auto-selects the HDF5 backend when available.

The 12 nuclides in the library are:

Nuclide  GXS File   Temperatures (K)                        Sigma-zeros (count)
-------  ---------  --------------------------------------  -------------------
H-1      H_001.GXS  294, 350, 400, 450, 500, 550, 600, 650  1
O-16     O_016.GXS  294, 600, 900, 1200, 1500, 1800         6
B-10     B_010.GXS  294, 600, 900, 1200                     4
B-11     B_011.GXS  294, 600, 900, 1200                     4
Na-23    NA023.GXS  294, 600, 900, 1200                     4
U-235    U_235.GXS  294, 600, 900, 1200, 1500, 1800         10
U-238    U_238.GXS  294, 600, 900, 1200, 1500, 1800         10
Zr-90    ZR090.GXS  294, 600, 900, 1200                     4
Zr-91    ZR091.GXS  294, 600, 900, 1200                     4
Zr-92    ZR092.GXS  294, 600, 900, 1200                     4
Zr-94    ZR094.GXS  294, 600, 900, 1200                     4
Zr-96    ZR096.GXS  294, 600, 900, 1200                     4

The GENDF Format

Source

The GENDF (Groupwise ENDF) files are obtained from the IAEA Nuclear Data Services.

These are the 421-group JEFF-3.1 processed nuclear data files. Each file contains all reaction cross sections and transfer matrices for one nuclide at multiple temperatures.

Record Layout

Every line in a GENDF file is exactly 80 characters wide, following the ENDF-6 format [ENDF102] inherited from the punched-card era:

Columns  1-66: 6 data fields, 11 characters each
Columns 67-70: MAT number (material identifier)
Columns 71-72: MF  number (file type)
Columns 73-75: MT  number (reaction type)
Columns 76-80: line sequence number

For example, the first data record of H-1 looks like:

 1.001000+3 9.991673-1          0          1         -1          1 125 1451    1
|----11----|----11----|----11----|----11----|----11----|----11----| MAT|MF|MT |SEQ|
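The column layout above maps directly onto string slices. The following is a minimal sketch, not the code in gendf.py: the names GendfRecord and split_record are hypothetical, and it assumes every control field (MAT, MF, MT, sequence) is populated on the line.

```python
from typing import NamedTuple


class GendfRecord(NamedTuple):
    fields: tuple  # six raw 11-character data fields, whitespace stripped
    mat: int
    mf: int
    mt: int
    seq: int


def split_record(line: str) -> GendfRecord:
    """Slice one 80-column record using the fixed column layout."""
    line = line.rstrip("\n").ljust(80)
    fields = tuple(line[i:i + 11].strip() for i in range(0, 66, 11))
    return GendfRecord(
        fields=fields,
        mat=int(line[66:70]),   # columns 67-70
        mf=int(line[70:72]),    # columns 71-72
        mt=int(line[72:75]),    # columns 73-75
        seq=int(line[75:80]),   # columns 76-80
    )


# Rebuild the H-1 example record from its parts so the column widths are explicit.
data = ["1.001000+3", "9.991673-1", "0", "1", "-1", "1"]
example = "".join(f.rjust(11) for f in data) + " 125" + " 1" + "451" + "    1"
record = split_record(example)
```

The data fields are returned as raw strings on purpose, so that the compact float notation can be decoded in a separate step.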

Compact Float Notation

Data fields use a compact Fortran notation where the E in scientific notation is omitted. The exponent sign immediately follows the mantissa:

1.001000+3  →  1.001000E+3  =  1001.0
9.991673-1  →  9.991673E-1  =  0.9991673
2.407191-7  →  2.407191E-7  =  2.407191×10⁻⁷

The parser _parse_gendf_field in gendf.py handles this by inserting E before any + or - sign that follows a digit:

s = re.sub(r"(\d)([+-])", r"\1E\2", s)
return float(s)
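For reference, the two lines above can be wrapped into a self-contained function. This is a sketch rather than the actual _parse_gendf_field: the name parse_compact_float is hypothetical, and treating a blank field as zero is an assumption about how empty fields should be handled.

```python
import re

# A digit immediately followed by a sign marks the start of the exponent.
_EXPONENT = re.compile(r"(\d)([+-])")


def parse_compact_float(field: str) -> float:
    """Convert a compact field such as '9.991673-1' to a float."""
    s = field.strip()
    if not s:
        return 0.0  # assumption: a blank field is read as zero
    # Insert the missing 'E' between the mantissa and the exponent sign.
    return float(_EXPONENT.sub(r"\1E\2", s))
```

A leading sign (as in the integer field `-1`) is left untouched because it is not preceded by a digit.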

MF and MT Numbers

The MF (file) and MT (reaction) numbers identify the type of data in each section:

MF=1 — General information:

MT   Content
---  -------
451  Header: temperatures, sigma-zero base points, energy group boundaries (422 values for 421 groups)

MF=3 — Cross sections (sigma-zero dependent, one value per group):

MT   Reaction
---  --------
1    Total cross section (does not include upscattering)
18   Fission
102  Radiative capture \((n,\gamma)\)
107  \((n,\alpha)\)
452  Total \(\bar{\nu}\) (average neutrons per fission)

MF=6 — Transfer matrices (group-to-group scattering):

MT     Reaction
-----  --------
2      Elastic scattering (sigma-zero dependent)
16     \((n,2n)\) reaction
18     Fission spectrum \(\chi(g)\)
51–91  Inelastic scattering (discrete levels 51–90, continuum 91)
221    Free-gas thermal scattering
222    Thermal scattering for H bound in water (\(S(\alpha,\beta)\))

MF=3 Record Structure

Each MF=3 section begins with a header record followed by per-group data records. The structure for a section with \(N_\ell\) Legendre components and \(N_{\sigma_0}\) sigma-zero values is:

Record 1 (section header):
  [ZA, AWR, NL, N_sig0, LRFLAG, NG, MAT, MF, MT, 1]

For each group g = 1, ..., NG:
  Record (group header):
    [TEMP, 0, NL, N_sig0, NW, IG, MAT, MF, MT, line]
  Record(s) (data):
    NW = 2 × NL × N_sig0 words packed 6 per line

The first half of the NW words contains flux weights; the second half contains the cross-section values organised as:

\[a[N_\ell N_{\sigma_0} + 1 : N_\ell N_{\sigma_0} + N_{\sigma_0}] = \sigma_{x,g}(\sigma_{0,1}), \ldots, \sigma_{x,g}(\sigma_{0,N_{\sigma_0}})\]

This is the Legendre-0 component for each sigma-zero. Higher Legendre components follow in the same block.
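The slice formula above translates into 0-based indexing as follows. This is a sketch under the layout stated in this section; the function names are hypothetical and the real parser may organise the work differently.

```python
import math


def split_mf3_words(words, nl, n_sig0):
    """Separate flux weights from cross sections in one MF=3 group block."""
    half = nl * n_sig0
    if len(words) != 2 * half:
        raise ValueError(f"expected {2 * half} words, got {len(words)}")
    flux = list(words[:half])
    # 0-based equivalent of a[NL*N_sig0 + 1 : NL*N_sig0 + N_sig0] in the text
    xs_p0 = list(words[half:half + n_sig0])
    return flux, xs_p0


def data_lines(nw):
    """Number of 80-column records needed for NW words at six per line."""
    return math.ceil(nw / 6)
```

For example, a group with NL = 1 and four sigma-zeros carries NW = 8 words, which occupy two data records.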

MF=6 Record Structure

Transfer matrices in MF=6 are stored per source group in a sparse representation. For each source group \(g\):

Record (group header):
  [TEMP, 0, NG2, IG2LO, NW, IG, MAT, MF, MT, line]
Record(s) (data):
  NW words packed 6 per line

where:

  • NG2 — number of secondary positions: one flux-weight position followed by NG2 - 1 target groups with non-zero values

  • IG2LO — 1-based index of the lowest non-zero target group

  • NW — total words to read (includes flux weights)

  • IG — 1-based source group index

The data layout per source group is:

  1. Flux weights: \(N_\ell \times N_{\sigma_0}\) values (skipped)

  2. Transfer values: for each target group from IG2LO to IG2LO + NG2 - 2 (the first of the NG2 positions holds the flux weights, so NG2 - 1 target groups follow), and for each sigma-zero and Legendre order:

    for i_to = IG2LO to IG2LO + NG2 - 2:
        for i_sig0 = 1 to N_sig0:
            for i_lgn = 1 to N_lgn:
                sigma_s(IG → i_to, Legendre=i_lgn, sig0=i_sig0)
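Because the Legendre index varies fastest and the target group slowest, the loop nest above is equivalent to a single reshape. The sketch below assumes NW = NG2 × N_lgn × N_sig0 (flux block included) and uses a hypothetical function name; it is an illustration of the layout, not the parser's actual code.

```python
import numpy as np


def unpack_mf6_group(words, ng2, ig2lo, n_lgn, n_sig0):
    """Unpack one MF=6 source-group block.

    Returns (i_to, values) where i_to holds 0-based target group indices and
    values has shape (n_to, n_sig0, n_lgn), matching the loop order above.
    """
    words = np.asarray(words, dtype=float)
    n_flux = n_lgn * n_sig0          # leading flux weights, skipped
    n_to = ng2 - 1                   # remaining positions are target groups
    if words.size != n_flux * ng2:
        raise ValueError(f"expected {n_flux * ng2} words, got {words.size}")
    values = words[n_flux:].reshape(n_to, n_sig0, n_lgn)
    i_to = np.arange(ig2lo - 1, ig2lo - 1 + n_to)  # convert to 0-based
    return i_to, values
```

Indexing `values[:, k, l]` then gives the transfer values to all listed target groups for sigma-zero k and Legendre order l.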
    

Scattering Matrix Assembly

The microscopic scattering matrix \(\sigma_{\mathrm{s},\ell}^{(\sigma_0)}\) is assembled from three groups of GENDF sections (elastic, inelastic, and thermal). This is one of the most delicate parts of the pipeline.

Thermal-Group Boundary

The energy group structure uses a thermal cutoff at group index 95 (corresponding to \(E \approx 4\) eV). Below this energy, the free-atom elastic scattering model breaks down because the target atoms are bound in a lattice or molecule (thermal motion affects scattering).

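The GENDF file provides three scattering treatments: free-atom elastic scattering above the cutoff, and one of two thermal models below it: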

  • MT=2 — free-atom elastic scattering (valid above ~4 eV)

  • MT=221 — free-gas thermal scattering (all isotopes except H-1)

  • MT=222: \(S(\alpha,\beta)\) thermal scattering for H bound in water (H-1 only)

Assembly Algorithm

The scattering matrix is built in three stages; item 4 describes the resulting data structure:

  1. Elastic (MT=2): Extract the elastic scattering transfer matrix. Zero out all entries where the source group \(g \le 95\) (the thermal range), because thermal scattering replaces elastic in that range. Add \(10^{-30}\) to all values (matching the MATLAB convention to avoid exact zeros in sparse matrices).

    vals[thermal_mask] = 0.0
    vals += 1e-30
    sigS[lgn][sig0] = sparse(ifrom-1, ito-1, vals, NG, NG)
    
  2. Inelastic (MT=51–91): For each discrete inelastic level that exists, extract the transfer matrix and add it to sigS. Inelastic scattering is sigma-zero independent (same values for all sigma-zero variants), so the first sigma-zero’s data is used for all.

  3. Thermal (MT=221 or MT=222): Extract the thermal scattering kernel and add it to sigS. This replaces the zeroed elastic entries in the thermal range. Like inelastic, thermal scattering is sigma-zero independent.

  4. The final scattering matrix structure is a list of lists: sigS[legendre_order][sig0_index], each a csr_matrix(NG, NG). Three Legendre orders (P0, P1, P2) are always stored.
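The stages above can be sketched for a single (Legendre order, sigma-zero) pair as follows. This is an illustration only: it uses dense NumPy arrays for brevity where the pipeline stores csr_matrix objects, the function names are hypothetical, and each input is assumed to be a triplet (ifrom, ito, vals) with 1-based group indices.

```python
import numpy as np

THERMAL_CUTOFF = 95  # source groups g <= 95 (1-based) are thermal, per the text


def add_triplets(matrix, ifrom, ito, vals):
    """Accumulate 1-based (ifrom, ito, vals) triplets into a dense matrix."""
    rows = np.asarray(ifrom) - 1
    cols = np.asarray(ito) - 1
    np.add.at(matrix, (rows, cols), np.asarray(vals, dtype=float))


def assemble_scattering(ng, elastic, inelastic_levels, thermal):
    """Combine elastic, inelastic and thermal transfers for one (lgn, sig0)."""
    sig_s = np.zeros((ng, ng))

    # Stage 1: elastic, with thermal source groups zeroed, plus the 1e-30 offset
    ifrom, ito, vals = elastic
    vals = np.array(vals, dtype=float)
    vals[np.asarray(ifrom) <= THERMAL_CUTOFF] = 0.0
    vals += 1e-30
    add_triplets(sig_s, ifrom, ito, vals)

    # Stage 2: every inelastic level that is present
    for level in inelastic_levels:
        add_triplets(sig_s, *level)

    # Stage 3: the thermal kernel fills the zeroed thermal range
    add_triplets(sig_s, *thermal)
    return sig_s
```

Calling this once per Legendre order and sigma-zero, with the same inelastic and thermal inputs for every sigma-zero, reproduces the sigS[legendre_order][sig0_index] nesting described in item 4.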

Important

The elastic scattering data for groups \(g > 95\) (epithermal and fast) does depend on sigma-zero. Each sigma-zero variant has different elastic values at these groups. For groups \(g \le 95\), the elastic is zeroed and replaced by the sigma-zero-independent thermal kernel.

Reactions Not Included

The GENDF files for heavy isotopes (U-235, U-238) contain MF=6 entries for \((n,3n)\) (MT=17) and \((n,4n)\) (MT=37). These are not extracted, matching the original MATLAB code which only processes MT=51–91. The impact is negligible for thermal reactor applications (threshold energies are 6–15 MeV).

Total Cross Section

The total cross section \(\sigma_{\mathrm{t},g}(\sigma_0)\) is computed from the components rather than read from MF=3 MT=1:

\[\sigma_{\mathrm{t},g}(\sigma_0) = \sigma_{\mathrm{c},g}(\sigma_0) + \sigma_{\mathrm{f},g}(\sigma_0) + \sigma_{\alpha,g}(\sigma_0) + \sum_{g'} \sigma_{\mathrm{s},0,g \to g'}(\sigma_0) + \sum_{g'} \sigma_{\mathrm{2n},g \to g'} \tag{1}\]

This approach is used because:

  1. MF=3 MT=1 does not include upscattering (stated in the MATLAB source: “note that mf=3 mt=1 does not include upscatters”).

  2. Computing from components ensures self-consistency between the total and the reaction rates used by the solver.
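Equation (1) amounts to adding the P0 scattering and (n,2n) row sums to the reaction cross sections. The following sketch assumes dense inputs indexed [g_from, g_to]; the function and argument names are hypothetical, and sparse matrices would need converting (for example with `.toarray()`) before being passed in.

```python
import numpy as np


def total_xs(sig_c, sig_f, sig_alpha, sig_s_p0, sig_2n):
    """Evaluate Eq. (1) for every sigma-zero and group.

    sig_c, sig_f, sig_alpha : arrays of shape (N_sig0, NG)
    sig_s_p0                : sequence of N_sig0 dense (NG, NG) P0 matrices
    sig_2n                  : dense (NG, NG) matrix, sigma-zero independent
    Matrices are indexed [g_from, g_to], so axis=1 sums over target groups.
    """
    scatter_out = np.stack([np.asarray(m).sum(axis=1) for m in sig_s_p0])
    n2n_out = np.asarray(sig_2n).sum(axis=1)  # broadcast over sigma-zeros
    return sig_c + sig_f + sig_alpha + scatter_out + n2n_out
```

The result has shape (N_sig0, NG), the same as the tabulated reaction cross sections.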

sigT Consistency Issue (Historical)

Warning

The legacy MATLAB .m data files contain a systematic discrepancy in the stored sigT values. This section documents the issue for future reference.

The MATLAB convertCSVtoM.m script computes sigT from full-precision intermediate variables and writes it with %13.6e format (6 decimal places in scientific notation). It independently truncates all component cross sections (sigC, sigF, sigS) to the same format.
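The effect of the %13.6e format on a single value can be reproduced in Python, whose `%` operator accepts the same format specifier. This sketch only illustrates the per-value rounding; the sample values are made up for the demonstration and are not taken from the data files.

```python
def roundtrip_13_6e(x: float) -> float:
    """Write x with the %13.6e format and read it back."""
    return float("%13.6e" % x)


def relative_error(x: float) -> float:
    """Relative change introduced by one write/read cycle."""
    return abs(roundtrip_13_6e(x) - x) / abs(x)


# Illustrative values only (not from the library)
samples = [15523.0123456, 77.8765432, 2.407191234e-7]
errors = [relative_error(x) for x in samples]
```

With seven significant digits retained, each relative error stays below \(5 \times 10^{-7}\).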

When the .m file is loaded and sigT is recomputed from the stored (truncated) components, the result differs from the stored sigT by a constant offset of 10–30 barns for heavy isotopes:

Isotope       .m sigT[0,0] (b)  Recomputed (b)  Offset (b)
------------  ----------------  --------------  ----------
U-235 (294K)  15,523.0          15,504.2        18.8
U-238 (600K)  108.14            77.87           30.27
Zr-90 (600K)  (offset)          (recomputed)    8.3
H-1 (294K)    (matches)         (matches)       ~0

The offset is constant across all energy groups and all sigma-zero rows for a given isotope/temperature. Isotopes with \(N_{\sigma_0} = 1\) (like H-1) show no discrepancy.

Root cause: The full-precision sigS row sums differ from the truncated-then-resummed version. Although each individual truncation error is \(O(10^{-7})\) relative, the scattering matrices have thousands of non-zero entries per row at resonance energies, and the accumulation of truncation errors is significant.

Impact on sigma-zero iterations: The sigma-zero solver interpolates sigT(\(\sigma_0\)) from the tabulated values. Using the GENDF-computed sigT instead of the .m file’s stored sigT shifts the converged sigma-zeros, which changes the interpolated cross sections and ultimately shifts the PWR-like mixture’s \(k_\infty\) by about 0.4% (1.01771 vs 1.01357).

HDF5 Storage Format

Each nuclide is stored in a single HDF5 file (e.g., U_235.h5) with one group per temperature:

/{temp_K}K/
    @aw          : scalar (atomic weight in amu)
    @temp        : scalar (temperature in K)
    eg           : (NG+1,)    — energy group boundaries (eV)
    sig0         : (N_sig0,)  — sigma-zero base points (barns)
    sigC         : (N_sig0, NG) — radiative capture
    sigL         : (N_sig0, NG) — (n,alpha)
    sigF         : (N_sig0, NG) — fission
    sigT         : (N_sig0, NG) — total
    nubar        : (NG,) — average neutrons per fission
    chi          : (NG,) — fission spectrum (normalised to 1)
    sig2/
        row      : (nnz,) int32  — COO row indices
        col      : (nnz,) int32  — COO column indices
        data     : (nnz,) float64 — COO values
    sigS/
        L{j}_S{k}/          — Legendre order j, sigma-zero k
            row  : (nnz,)
            col  : (nnz,)
            data : (nnz,)

Dense arrays use gzip compression (level 4). Sparse matrices are stored as COO triplets to avoid scipy-specific formats.
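Rebuilding a matrix from the stored COO triplets takes only a few lines. The sketch below is not the loader's code: the function names are hypothetical, it returns a dense array for brevity, and it assumes the stored indices and the k in L{j}_S{k} are 0-based. The `group` argument can be an h5py group or any mapping of names to NumPy arrays, since both support `obj[name][...]`.

```python
import numpy as np


def coo_group_to_dense(group, ng):
    """Rebuild a dense (ng, ng) matrix from row/col/data COO triplets."""
    row = np.asarray(group["row"][...], dtype=np.int64)
    col = np.asarray(group["col"][...], dtype=np.int64)
    data = np.asarray(group["data"][...], dtype=float)
    dense = np.zeros((ng, ng))
    np.add.at(dense, (row, col), data)  # duplicate (row, col) pairs are summed
    return dense


def scattering_key(legendre, sig0_index):
    """Name of the sub-group holding one scattering matrix, e.g. 'L0_S3'."""
    return f"L{legendre}_S{sig0_index}"
```

With an open file handle `f` (an assumption about how the file is accessed), `coo_group_to_dense(f["600K"]["sigS"][scattering_key(0, 0)], 421)` would follow the layout shown above.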

File Sizes

Nuclide           Temperatures (count)  HDF5 Size (MB)
----------------  --------------------  --------------
H-1               8                     12.3
U-235             6                     50.0
U-238             6                     37.8
O-16              6                     10.8
Zr isotopes (×5)  4 each                ~11 each

Data Loading API

The load_isotope function provides a uniform API:

from data.micro_xs import load_isotope

iso = load_isotope("U_235", 600)
# iso.sigC: shape (10, 421), capture XS for 10 sigma-zeros
# iso.sigS[0][0]: csr_matrix(421, 421), P0 scattering for sigma-zero index 0
# iso.eg: shape (422,), energy group boundaries in eV

The loader reads data/micro_xs/{name}.h5 when that file exists and otherwise falls back to the legacy .m parser.
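The backend selection can be pictured as a simple file-existence check. This is a sketch of the idea only: select_backend is a hypothetical helper, and the fallback file name {name}.m is an assumption, not something the loader is documented to use.

```python
from pathlib import Path


def select_backend(name: str, data_dir) -> tuple:
    """Return ('hdf5', path) when {name}.h5 exists, else ('m', path)."""
    h5_path = Path(data_dir) / f"{name}.h5"
    if h5_path.is_file():
        return "hdf5", h5_path
    # Hypothetical fallback file name for the legacy MATLAB data
    return "m", Path(data_dir) / f"{name}.m"
```

Keeping the decision in one place means callers such as load_isotope never need to know which backend produced the data.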

Conversion Script

To regenerate the HDF5 files from the GENDF sources:

cd data/micro_xs
python convert_gxs_to_hdf5.py

This processes all 12 .GXS files and writes the corresponding .h5 files. Runtime is approximately 2–3 minutes on a modern laptop.

Validation

The HDF5 data pipeline was validated by running both homogeneous reactor cases and comparing against the MATLAB reference:

Case                                Python \(k_\infty\)  MATLAB \(k_\infty\)  Match
----------------------------------  -------------------  -------------------  -----
Aqueous (H₂O + U-235, 294K)         1.03596              1.03596              Yes
PWR-like (UO₂ + Zry + H₂O+B, 600K)  1.01357              1.01357              Yes

Per-component validation for H-1 at 294K (1 sigma-zero, simplest case):

Quantity               Max diff (GXS vs .m)  Status
---------------------  --------------------  ------
aw                     0                     Exact
eg                     0                     Exact
sigC                   0                     Exact
sigF                   0                     Exact
nubar                  0                     Exact
chi                    0                     Exact
sigS[0][0] (row sums)  0                     Exact
sigS[0][0] (nnz)       77,627 = 77,627       Exact

Per-component validation for U-235 at 294K (10 sigma-zeros):

Quantity               Max diff (GXS vs .m)    Status
---------------------  ----------------------  ----------
sigC                   0                       Exact
sigF                   0                       Exact
nubar                  0                       Exact
sigS[0][0] (row sums)  \(9.6 \times 10^{-7}\)  Negligible
sig2 (nnz)             6,067 = 6,067           Exact

References

[ENDF102]

M.A. Kellett, O. Bersillon, R.W. Mills, “The JEFF-3.1/-3.1.1 Radioactive Decay Data and Fission Yields Sub-libraries”, OECD/NEA, 2009. ENDF-6 format manual: BNL-NCS-44945 (Rev. 2012).