A Review of Meta-Learning-Based Automatic Hyperparameter Optimization Algorithms for Neural Networks

Zihan Zhao

Keywords

meta-learning, hyperparameter optimization, neural network, deep learning

Abstract

Hyperparameter settings directly determine the performance of neural networks. Neural networks have many hyperparameters, these hyperparameters interact strongly, and evaluating a single configuration requires a full training run, so tuning is costly. Traditional methods can tune hyperparameters, but every new task must start from scratch and cannot exploit historical experience. Meta-learning can learn "how to tune" from historical tasks, offering a new approach to hyperparameter optimization. This paper systematically reviews meta-learning-based automatic hyperparameter optimization methods. From the perspective of how historical experience is used, existing methods are grouped into three categories: configuration recommendation based on task similarity, search guidance based on meta-knowledge, and end-to-end optimization with learned optimizers. For each category, the paper details the core ideas, representative works, and strengths and weaknesses. The findings indicate that similarity-based recommendation methods (such as KGLasso) offer fast inference and strong interpretability and suit small- to medium-scale scenarios with relatively fixed task types; meta-knowledge-guided search methods (such as MetaLLMIX) balance effectiveness and efficiency well, usually finding good configurations within 5 to 10 evaluations; and learned-optimizer methods (such as VeLO) enable hyperparameter-free optimization and have clear advantages on large-scale tasks, although training the optimizer itself is expensive. Finally, the paper discusses key open challenges, including meta-feature design, high meta-training cost, and task distribution shift, and outlines future directions such as adaptive meta-feature learning, large-scale meta-pre-training, and online meta-learning.
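
To make the first category concrete, the sketch below illustrates the general idea behind similarity-based configuration recommendation: describe each historical task by a meta-feature vector, find the stored task most similar to the new one, and reuse its best-known configuration as a warm start. This is a minimal illustration under assumed meta-features and configurations, not the KGLasso method of [14].

```python
# Minimal sketch of similarity-based hyperparameter recommendation.
# Illustrative only: the meta-features, stored configurations, and the
# plain nearest-neighbor rule are assumptions, not the KGLasso algorithm.

import numpy as np

# Hypothetical meta-knowledge base: per historical task, a meta-feature vector
# (e.g. number of samples, number of classes, input dimensionality) and the
# best hyperparameter configuration found on that task.
meta_features = np.array([
    [60_000, 10, 784],    # task A
    [50_000, 100, 3072],  # task B
    [10_000, 2, 20],      # task C
], dtype=float)
best_configs = [
    {"lr": 1e-3, "batch_size": 128, "dropout": 0.2},  # task A
    {"lr": 3e-4, "batch_size": 256, "dropout": 0.3},  # task B
    {"lr": 1e-2, "batch_size": 32,  "dropout": 0.0},  # task C
]

def recommend(new_task_features, k=1):
    """Return the stored configurations of the k most similar historical tasks."""
    # Standardize meta-features so no single scale dominates the distance.
    mu = meta_features.mean(axis=0)
    sigma = meta_features.std(axis=0) + 1e-12
    db = (meta_features - mu) / sigma
    query = (np.asarray(new_task_features, dtype=float) - mu) / sigma
    # Euclidean distance in meta-feature space; smaller means a more similar task.
    dists = np.linalg.norm(db - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return [best_configs[i] for i in nearest]

# A new task, described only by its meta-features, is warm-started with the
# configuration of the most similar historical task (no training run needed).
print(recommend([55_000, 10, 1024], k=1))
```

The recommended configuration can either be used directly or serve as the starting point for a conventional search, which is what gives this family of methods its fast, training-free inference.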

References

  • [1] Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2022). Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5149–5169. https://doi.org/10.1109/TPAMI.2021.3079209
  • [2] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778). IEEE.
  • [3] Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (pp. 807–814). Omnipress.
  • [4] Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In G. Montavon, G. B. Orr, & K.-R. Müller (Eds.), Neural networks: Tricks of the trade (2nd ed., Vol. 7700, pp. 437–478). Springer. https://doi.org/10.1007/978-3-642-35289-8_26
  • [5] Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International Conference on Learning Representations.
  • [6] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
  • [7] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818–2826). IEEE.
  • [8] Smith, S. L., Kindermans, P.-J., Ying, C., & Le, Q. V. (2018). Don't decay the learning rate, increase the batch size. In International Conference on Learning Representations.
  • [9] Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.
  • [10] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (Vol. 25, pp. 2951–2959).
  • [11] Eggensperger, K., Feurer, M., Hutter, F., Bergstra, J., Snoek, J., Hoos, H., & Leyton-Brown, K. (2013). Towards an empirical foundation for assessing Bayesian optimization of hyperparameters. In NIPS Workshop on Bayesian Optimization in Theory and Practice.
  • [12] Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2019). Regularized evolution for image classifier architecture search. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 4780–4789. https://doi.org/10.1609/aaai.v33i01.33014780
  • [13] Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning (pp. 1126–1135). PMLR.
  • [14] Deng, L., & Xiao, M. (2024). Hyperparameter recommendation via automated meta-feature selection embedded with kernel group Lasso learning. Knowledge-Based Systems, 306, 112706. https://doi.org/10.1016/j.knosys.2024.112706
  • [15] Tiouti, M., & Bal-Ghaoui, M. (2025). MetaLLMix: An XAI aided LLM-meta-learning based approach for hyperparameters optimization. arXiv. https://arxiv.org/abs/2509.09387
  • [16] Metz, L., Harrison, J., Freeman, C. D., & Sohl-Dickstein, J. (2022). VeLO: Training versatile learned optimizers by scaling up. arXiv. https://arxiv.org/abs/2211.09760
  • [17] Thérien, B., Joseph, C. É., Knyazev, B., & Gidel, G. (2024). μLO: Compute-efficient meta-generalization of learned optimizers. In Advances in Neural Information Processing Systems.
  • [18] Moudgil, A., Knyazev, B., Lajoie, G., & Belilovsky, E. (2025). Celo: Training versatile learned optimizers on a compute diet. arXiv. https://arxiv.org/abs/2501.12670