J. Mach. Learn., 4 (2025), pp. 192-222.
Published online: 2025-09
[An open-access article; the PDF is free to any online user.]
We propose an optimistic estimate framework to evaluate the potential of nonlinear models in fitting target functions at overparameterization. In our framework, this potential is quantified by estimating the smallest possible sample size needed for a model to recover a target function, referred to as the optimistic sample size. Following the framework, we derive the optimistic sample sizes for matrix factorization models, deep models, and two-layer neural networks (NNs) with fully-connected or convolutional architectures. For each nonlinear model, we confirm via experiments that the target functions can be fitted at overparameterization as predicted by our analysis. Our results suggest that nonlinear models have a hierarchical inductive bias, intrinsic to their architecture, towards simple functions with smaller optimistic sample sizes. The dynamical realization of the suggested inductive bias remains an open problem for further study.
ISSN: 2790-2048
DOI: https://doi.org/10.4208/jml.231109
URL: http://global-sci.org/intro/article_detail/jml/24380.html