You are required to read and agree to the following before accessing a full-text version of an article in the IDE article repository.

The full-text document you are about to access is subject to national and international copyright laws. In most cases (though not necessarily all), personal use is permitted provided that the copyright owner is duly acknowledged and respected. Any other use typically requires explicit permission (often in writing) from the copyright owner.

For the reports in this repository, we specifically note that:

  • the use of articles under IEEE copyright is governed by the IEEE copyright policy (available at http://www.ieee.org/web/publications/rights/copyrightpolicy.html)
  • the use of articles under ACM copyright is governed by the ACM copyright policy (available at http://www.acm.org/pubs/copyright_policy/)
  • technical reports and other articles issued by Mälardalen University are free for personal use. For any other use, the explicit consent of the authors is required
  • in other cases, please contact the copyright owner for detailed information

By accepting, I agree to acknowledge and respect the rights of the copyright owner of the document I am about to access.

If you are in doubt, feel free to contact webmaster@ide.mdh.se

Mechanistic Interpretability of ReLU Neural Networks Through Piecewise-Affine Mapping

Publication Type:

Journal article

Venue:

Machine Learning

DOI:

https://doi.org/10.1007/s10994-025-06957-0


Abstract

Rectified linear unit (ReLU) based neural networks (NNs) are recognised for their remarkable accuracy. However, the decision-making processes of these networks are often complex and difficult to understand. This complexity can lead to challenges in error identification, establishing trust, and conducting thorough analyses. Existing methods often fail to provide clear insights into the actual computations occurring within each layer of these networks. To address this challenge, this study introduces a mechanistic interpretability method called ReLU Region Reasoning (Re3). This method uses the known piecewise-linear characteristics of ReLU networks to offer insights into neuron activation and accurately assess how each feature contributes to the final output and probability. Re3 effectively determines neuron activations and evaluates the contribution of each feature within a specified linear region. Experiments conducted on multiple benchmark datasets, including both tabular and image data, demonstrate that Re3 can replicate individual predictions without error, align feature importance with domain expertise, and maintain consistency with current explanatory methods, thereby avoiding the typical randomness. Analysing neurons reveals activation sparsity and identifies dominant units, thus providing clear targets for model simplification and troubleshooting. By ensuring transparency and algebraic accessibility in each stage of a ReLU-based NN’s decision process, Re3 can be a valuable practical tool for achieving precise mechanistic interpretability.
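
The full Re3 procedure is defined in the article itself, but the property the abstract relies on (that a ReLU network computes an exactly affine map y = W_eff x + b_eff within each activation region) can be sketched in a few lines of NumPy. The two-layer network, its random weights, and the function names below are illustrative assumptions, not code from the paper.

import numpy as np

# Illustrative sketch, not the paper's code: within a fixed ReLU activation
# pattern, an MLP is exactly affine, y = W_eff @ x + b_eff. We recover
# (W_eff, b_eff) for the region containing a given input and check that it
# reproduces the forward pass exactly. Weights are random placeholders.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((5, 4)), rng.standard_normal(5)),  # W1, b1
          (rng.standard_normal((3, 5)), rng.standard_normal(3))]  # W2, b2 (output layer, no ReLU)

def forward(x):
    h = x
    for i, (W, b) in enumerate(layers):
        h = W @ h + b
        if i < len(layers) - 1:      # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return h

def local_affine(x):
    # Effective (W_eff, b_eff) of the linear region containing x.
    W_eff, b_eff, h = np.eye(x.size), np.zeros(x.size), x
    for i, (W, b) in enumerate(layers):
        W_eff, b_eff, h = W @ W_eff, W @ b_eff + b, W @ h + b
        if i < len(layers) - 1:
            mask = (h > 0).astype(float)    # activation pattern at x
            W_eff = mask[:, None] * W_eff   # zero rows of inactive neurons
            b_eff = mask * b_eff
            h = np.maximum(h, 0.0)
    return W_eff, b_eff

x = rng.standard_normal(4)
W_eff, b_eff = local_affine(x)
assert np.allclose(forward(x), W_eff @ x + b_eff)  # exact replication
contributions = W_eff * x                          # per-output, per-feature
print(contributions.sum(axis=1) + b_eff)           # equals forward(x)

Because the recovered map is exact inside the region, the per-feature terms in W_eff * x, together with the effective bias b_eff, sum to the network output. Under this reading, that is what makes error-free replication of individual predictions possible and gives each input feature an algebraically exact contribution.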

Bibtex

@article{Barua7328,
author = {Arnab Barua and Mobyen Uddin Ahmed and Shahina Begum},
title = {Mechanistic Interpretability of ReLU Neural Networks Through Piecewise-Affine Mapping},
pages = {1--35},
month = {January},
year = {2026},
journal = {Machine Learning},
url = {http://www.es.mdu.se/publications/7328-}
}