.. _theory-cross-section-data: ============================================== Cross-Section Data Pipeline ============================================== .. contents:: Contents :local: :depth: 3 Key Facts ========= **Read this before modifying the cross-section pipeline.** - 421-group microscopic XS from GENDF (GXS) files → HDF5 via ``data/micro_xs/`` - ``Isotope`` dataclass: sig_t, sig_c, sig_f, sig_el, sig_inel, nu, chi (421 groups) - Sigma-zero iteration: ``data/macro_xs/sigma_zeros.py`` — self-shielding - ``Mixture`` dataclass: macroscopic XS with ``SigS[l][g_from, g_to]`` convention - Consistency: :math:`\Sigma_t = \Sigma_c + \Sigma_f + \sum_{g'} \Sigma_{s,g \to g'}` - ``load_isotope()`` auto-selects HDF5 or fallback .m parser - Verification uses synthetic XS from ``derivations/_xs_library.py`` (regions A/B/C/D), NOT this pipeline Overview ======== Every solver in ORPHEUS relies on multi-group microscopic cross sections for the 12 nuclides in the 421-energy-group JEFF-3.1 library. This chapter documents the complete data pipeline from the authoritative IAEA source files to the internal ``Isotope`` dataclass: 1. **GENDF format** — the IAEA distributes processed nuclear data in the GENDF (Groupwise ENDF) format, which uses fixed-width 80-column records inherited from punched-card conventions. 2. **Parsing** — the ``gendf.py`` module reads GENDF files directly, bypassing the MATLAB CSV intermediary. 3. **Scattering assembly** — elastic, inelastic, and thermal scattering matrices are combined with careful treatment of thermal-group boundaries. 4. **HDF5 serialisation** — the parsed data is stored in compressed HDF5 files for fast loading at runtime. 5. **Loading** — ``load_isotope()`` provides a uniform API that auto-selects the HDF5 backend when available. The 12 nuclides in the library are: .. list-table:: :header-rows: 1 :widths: 15 15 15 20 * - Nuclide - GXS File - Temperatures (K) - Sigma-zeros * - H-1 - ``H_001.GXS`` - 294, 350, 400, 450, 500, 550, 600, 650 - 1 * - O-16 - ``O_016.GXS`` - 294, 600, 900, 1200, 1500, 1800 - 6 * - B-10 - ``B_010.GXS`` - 294, 600, 900, 1200 - 4 * - B-11 - ``B_011.GXS`` - 294, 600, 900, 1200 - 4 * - Na-23 - ``NA023.GXS`` - 294, 600, 900, 1200 - 4 * - U-235 - ``U_235.GXS`` - 294, 600, 900, 1200, 1500, 1800 - 10 * - U-238 - ``U_238.GXS`` - 294, 600, 900, 1200, 1500, 1800 - 10 * - Zr-90 - ``ZR090.GXS`` - 294, 600, 900, 1200 - 4 * - Zr-91 - ``ZR091.GXS`` - 294, 600, 900, 1200 - 4 * - Zr-92 - ``ZR092.GXS`` - 294, 600, 900, 1200 - 4 * - Zr-94 - ``ZR094.GXS`` - 294, 600, 900, 1200 - 4 * - Zr-96 - ``ZR096.GXS`` - 294, 600, 900, 1200 - 4 The GENDF Format ================= Source ------ The GENDF (Groupwise ENDF) files are obtained from the IAEA Nuclear Data Services: https://www-nds.iaea.org/ads/adsgendf.html These are the 421-group JEFF-3.1 processed nuclear data files. Each file contains all reaction cross sections and transfer matrices for one nuclide at multiple temperatures. Record Layout -------------- Every line in a GENDF file is exactly 80 characters wide, following the ENDF-6 format [ENDF102]_ inherited from the punched-card era: .. code-block:: text Columns 1-66: 6 data fields, 11 characters each Columns 67-70: MAT number (material identifier) Columns 71-72: MF number (file type) Columns 73-75: MT number (reaction type) Columns 76-80: line sequence number For example, the first data record of H-1 looks like: .. code-block:: text 1.001000+3 9.991673-1 0 1 -1 1 125 1451 1 |----11----|----11----|----11----|----11----|----11----|----11----| MAT|MF|MT |SEQ| Compact Float Notation ----------------------- Data fields use a compact Fortran notation where the ``E`` in scientific notation is omitted. The exponent sign immediately follows the mantissa: .. code-block:: text 1.001000+3 → 1.001000E+3 = 1001.0 9.991673-1 → 9.991673E-1 = 0.9991673 2.407191-7 → 2.407191E-7 = 2.407191×10⁻⁷ The parser ``_parse_gendf_field`` in ``gendf.py`` handles this by inserting ``E`` before any ``+`` or ``-`` sign that follows a digit: .. code-block:: python s = re.sub(r"(\d)([+-])", r"\1E\2", s) return float(s) MF and MT Numbers ------------------ The MF (file) and MT (reaction) numbers identify the type of data in each section: **MF=1 — General information:** .. list-table:: :header-rows: 1 :widths: 10 40 * - MT - Content * - 451 - Header: temperatures, sigma-zero base points, energy group boundaries (422 values for 421 groups) **MF=3 — Cross sections** (sigma-zero dependent, one value per group): .. list-table:: :header-rows: 1 :widths: 10 40 * - MT - Reaction * - 1 - Total cross section (does **not** include upscattering) * - 18 - Fission * - 102 - Radiative capture :math:`(n,\gamma)` * - 107 - :math:`(n,\alpha)` * - 452 - Total :math:`\bar{\nu}` (average neutrons per fission) **MF=6 — Transfer matrices** (group-to-group scattering): .. list-table:: :header-rows: 1 :widths: 10 40 * - MT - Reaction * - 2 - Elastic scattering (sigma-zero dependent) * - 16 - :math:`(n,2n)` reaction * - 18 - Fission spectrum :math:`\chi(g)` * - 51–91 - Discrete inelastic scattering levels * - 221 - Free-gas thermal scattering * - 222 - Thermal scattering for H bound in water (:math:`S(\alpha,\beta)`) MF=3 Record Structure ---------------------- Each MF=3 section begins with a header record followed by per-group data records. The structure for a section with :math:`N_\ell` Legendre components and :math:`N_{\sigma_0}` sigma-zero values is: .. code-block:: text Record 1 (section header): [ZA, AWR, NL, N_sig0, LRFLAG, NG, MAT, MF, MT, 1] For each group g = 1, ..., NG: Record (group header): [TEMP, 0, NL, N_sig0, NW, IG, MAT, MF, MT, line] Record(s) (data): NW = 2 × NL × N_sig0 words packed 6 per line The first half of the NW words contains flux weights; the second half contains the cross-section values organised as: .. math:: a[N_\ell N_{\sigma_0} + 1 : N_\ell N_{\sigma_0} + N_{\sigma_0}] = \sigma_{x,g}(\sigma_{0,1}), \ldots, \sigma_{x,g}(\sigma_{0,N_{\sigma_0}}) This is the Legendre-0 component for each sigma-zero. Higher Legendre components follow in the same block. MF=6 Record Structure ---------------------- Transfer matrices in MF=6 are stored per source group in a sparse representation. For each source group :math:`g`: .. code-block:: text Record (group header): [TEMP(?), 0, NG2, IG2LO, NW, IG, MAT, MF, MT, line] Record(s) (data): NW words packed 6 per line where: - ``NG2`` — number of secondary (target) groups with non-zero values - ``IG2LO`` — 1-based index of the lowest non-zero target group - ``NW`` — total words to read (includes flux weights) - ``IG`` — 1-based source group index The data layout per source group is: 1. **Flux weights**: :math:`N_\ell \times N_{\sigma_0}` values (skipped) 2. **Transfer values**: for each target group from ``IG2LO`` to ``IG2LO + NG2 - 2``, and for each sigma-zero and Legendre order: .. code-block:: text for i_to = IG2LO to IG2LO + NG2 - 2: for i_sig0 = 1 to N_sig0: for i_lgn = 1 to N_lgn: sigma_s(IG → i_to, Legendre=i_lgn, sig0=i_sig0) Scattering Matrix Assembly =========================== The scattering matrix :math:`\Sigma_{\mathrm{s},\ell}^{(\sigma_0)}` is assembled from three separate GENDF sections. This is one of the most delicate parts of the pipeline. Thermal-Group Boundary ----------------------- The energy group structure uses a thermal cutoff at group index 95 (corresponding to :math:`E \approx 4` eV). Below this energy, the free-atom elastic scattering model breaks down because the target atoms are bound in a lattice or molecule (thermal motion affects scattering). The GENDF file provides two models: - **MT=2** — free-atom elastic scattering (valid above ~4 eV) - **MT=221** — free-gas thermal scattering (all isotopes except H-1) - **MT=222** — :math:`S(\alpha,\beta)` thermal scattering for H bound in water (H-1 only) Assembly Algorithm ------------------- The scattering matrix is built in four stages: 1. **Elastic (MT=2)**: Extract the elastic scattering transfer matrix. **Zero out** all entries where the source group :math:`g \le 95` (the thermal range), because thermal scattering replaces elastic in that range. Add :math:`10^{-30}` to all values (matching the MATLAB convention to avoid exact zeros in sparse matrices). .. code-block:: python vals[thermal_mask] = 0.0 vals += 1e-30 sigS[lgn][sig0] = sparse(ifrom-1, ito-1, vals, NG, NG) 2. **Inelastic (MT=51–91)**: For each discrete inelastic level that exists, extract the transfer matrix and **add** it to sigS. Inelastic scattering is sigma-zero independent (same values for all sigma-zero variants), so the first sigma-zero's data is used for all. 3. **Thermal (MT=221 or MT=222)**: Extract the thermal scattering kernel and **add** it to sigS. This replaces the zeroed elastic entries in the thermal range. Like inelastic, thermal scattering is sigma-zero independent. 4. The final scattering matrix structure is a list of lists: ``sigS[legendre_order][sig0_index]``, each a ``csr_matrix(NG, NG)``. Three Legendre orders (P0, P1, P2) are always stored. .. important:: The elastic scattering data for groups :math:`g > 95` (epithermal and fast) **does** depend on sigma-zero. Each sigma-zero variant has different elastic values at these groups. For groups :math:`g \le 95`, the elastic is zeroed and replaced by the sigma-zero-independent thermal kernel. Reactions Not Included ----------------------- The GENDF files for heavy isotopes (U-235, U-238) contain MF=6 entries for :math:`(n,3n)` (MT=17) and :math:`(n,4n)` (MT=37). These are **not** extracted, matching the original MATLAB code which only processes MT=51–91. The impact is negligible for thermal reactor applications (threshold energies are 6–15 MeV). Total Cross Section ==================== The total cross section :math:`\Sigma_{\mathrm{t},g}(\sigma_0)` is **computed** from the components rather than read from MF=3 MT=1: .. math:: :label: sigT-computed \sigma_{\mathrm{t},g}(\sigma_0) = \sigma_{\mathrm{c},g}(\sigma_0) + \sigma_{\mathrm{f},g}(\sigma_0) + \sigma_{\alpha,g}(\sigma_0) + \sum_{g'} \sigma_{\mathrm{s},0,g \to g'}(\sigma_0) + \sum_{g'} \sigma_{\mathrm{2n},g \to g'} This approach is used because: 1. MF=3 MT=1 does **not** include upscattering (stated in the MATLAB source: *"note that mf=3 mt=1 does not include upscatters"*). 2. Computing from components ensures self-consistency between the total and the reaction rates used by the solver. .. _sigT-consistency: sigT Consistency Issue (Historical) ------------------------------------- .. warning:: The legacy MATLAB ``.m`` data files contain a systematic discrepancy in the stored ``sigT`` values. This section documents the issue for future reference. The MATLAB ``convertCSVtoM.m`` script computes ``sigT`` from full-precision intermediate variables and writes it with ``%13.6e`` format (6 decimal places in scientific notation). It independently truncates all component cross sections (sigC, sigF, sigS) to the same format. When the ``.m`` file is loaded and ``sigT`` is **recomputed** from the stored (truncated) components, the result differs from the stored ``sigT`` by a **constant offset** of 10–30 barns for heavy isotopes: .. list-table:: :header-rows: 1 :widths: 15 20 20 15 * - Isotope - .m sigT[0,0] - Recomputed - Offset * - U-235 (294K) - 15,523.0 - 15,504.2 - 18.8 * - U-238 (600K) - 108.14 - 77.87 - 30.27 * - Zr-90 (600K) - (offset) - (recomputed) - 8.3 * - H-1 (294K) - (matches) - (matches) - ~0 The offset is **constant across all energy groups and all sigma-zero rows** for a given isotope/temperature. Isotopes with :math:`N_{\sigma_0} = 1` (like H-1) show no discrepancy. **Root cause**: The full-precision ``sigS`` row sums differ from the truncated-then-resummed version. Although each individual truncation error is :math:`O(10^{-7})` relative, the scattering matrices have thousands of non-zero entries per row at resonance energies, and the accumulation of truncation errors is significant. **Impact on sigma-zero iterations**: The sigma-zero solver interpolates ``sigT(\sigma_0)`` from the tabulated values. Using the GENDF-computed ``sigT`` instead of the ``.m`` file's stored ``sigT`` shifts the converged sigma-zeros, which propagates to different interpolated cross sections and ultimately a ~0.4% shift in the PWR-like mixture's :math:`\kinf` (1.01771 vs 1.01357). HDF5 Storage Format ===================== Each element is stored in a single HDF5 file (e.g., ``U_235.h5``) with one group per temperature: .. code-block:: text /{temp_K}K/ @aw : scalar (atomic weight in amu) @temp : scalar (temperature in K) eg : (NG+1,) — energy group boundaries (eV) sig0 : (N_sig0,) — sigma-zero base points (barns) sigC : (N_sig0, NG) — radiative capture sigL : (N_sig0, NG) — (n,alpha) sigF : (N_sig0, NG) — fission sigT : (N_sig0, NG) — total nubar : (NG,) — average neutrons per fission chi : (NG,) — fission spectrum (normalised to 1) sig2/ row : (nnz,) int32 — COO row indices col : (nnz,) int32 — COO column indices data : (nnz,) float64 — COO values sigS/ L{j}_S{k}/ — Legendre order j, sigma-zero k row : (nnz,) col : (nnz,) data : (nnz,) Dense arrays use gzip compression (level 4). Sparse matrices are stored as COO triplets to avoid scipy-specific formats. File Sizes ----------- .. list-table:: :header-rows: 1 :widths: 15 15 15 * - Element - Temperatures - HDF5 Size (MB) * - H-1 - 8 - 12.3 * - U-235 - 6 - 50.0 * - U-238 - 6 - 37.8 * - O-16 - 6 - 10.8 * - Zr isotopes (×5) - 4 each - ~11 each Data Loading API ================= The ``load_isotope`` function provides a uniform API: .. code-block:: python from data.micro_xs import load_isotope iso = load_isotope("U_235", 600) # iso.sigC — shape (10, 421), capture XS for 10 sigma-zeros # iso.sigS[0][0] — csr_matrix(421, 421), P0 scattering at sig0=0 # iso.eg — shape (422,), energy group boundaries in eV The loader reads from the HDF5 files in ``data/micro_xs/{name}.h5``. Conversion Script ------------------ To regenerate the HDF5 files from the GENDF sources: .. code-block:: bash cd data/micro_xs python convert_gxs_to_hdf5.py This processes all 12 ``.GXS`` files and writes the corresponding ``.h5`` files. Runtime is approximately 2–3 minutes on a modern laptop. Validation =========== The HDF5 data pipeline was validated by running both homogeneous reactor cases and comparing against the MATLAB reference: .. list-table:: :header-rows: 1 :widths: 30 20 20 10 * - Case - Python :math:`\kinf` - MATLAB :math:`\kinf` - Match * - Aqueous (H₂O + U-235, 294K) - 1.03596 - 1.03596 - Yes * - PWR-like (UO₂ + Zry + H₂O+B, 600K) - 1.01357 - 1.01357 - Yes Per-component validation for H-1 at 294K (1 sigma-zero, simplest case): .. list-table:: :header-rows: 1 :widths: 20 20 15 * - Quantity - Max diff (GXS vs .m) - Status * - ``aw`` - 0 - Exact * - ``eg`` - 0 - Exact * - ``sigC`` - 0 - Exact * - ``sigF`` - 0 - Exact * - ``nubar`` - 0 - Exact * - ``chi`` - 0 - Exact * - ``sigS[0][0]`` (row sums) - 0 - Exact * - ``sigS[0][0]`` (nnz) - 77,627 = 77,627 - Exact Per-component validation for U-235 at 294K (10 sigma-zeros): .. list-table:: :header-rows: 1 :widths: 20 20 15 * - Quantity - Max diff (GXS vs .m) - Status * - ``sigC`` - 0 - Exact * - ``sigF`` - 0 - Exact * - ``nubar`` - 0 - Exact * - ``sigS[0][0]`` (row sums) - :math:`9.6 \times 10^{-7}` - Negligible * - ``sig2`` (nnz) - 6,067 = 6,067 - Exact .. seealso:: - :ref:`theory-homogeneous` — first consumer of the XS pipeline; demonstrates the full path from ``load_isotope()`` to :math:`k_\infty`. - :ref:`theory-verification` — verification uses :ref:`synthetic cross sections ` (regions A/B/C/D), not this pipeline. - :ref:`theory-collision-probability`, :ref:`theory-discrete-ordinates`, :ref:`theory-method-of-characteristics`, :ref:`theory-monte-carlo` — all transport solvers consume ``Mixture`` objects from this pipeline. References =========== .. [ENDF102] M.A. Kellett, O. Bersillon, R.W. Mills, "The JEFF-3.1/-3.1.1 Radioactive Decay Data and Fission Yields Sub-libraries", OECD/NEA, 2009. ENDF-6 format manual: BNL-NCS-44945 (Rev. 2012).