LLM-Based Zero-Code Control for Robotic Arms: Representative Methods, Challenges, and Future Trends

Jiajun He

Keywords

Large Language Models (LLMs), robotic arms, zero-code control, natural language interaction, embodied AI

Abstract

With the rapid evolution of Large Language Models (LLMs), significant advances in natural language processing, cross-modal reasoning, and complex task planning are driving a paradigm shift in robotic manipulation. Traditional robotic arm control requires specialized programming and extensive engineering expertise, which creates a high barrier to entry. This paper systematically reviews recent LLM-based zero-code control techniques for robotic arms. These techniques let users command a robot directly in natural language, substantially reducing development cost and operational complexity. We establish a technical classification framework and analyze the performance characteristics and applicability of representative methods, including direct mapping approaches (SayCan, Code as Policies, ProgPrompt) and feedback-driven architectures (Inner Monologue). We also examine multimodal fusion techniques (RT-2, PaLM-E, VIMA, VoxPoser) and their applications in education, home services, and manufacturing. Finally, we identify key challenges, namely natural language parsing accuracy, robustness of multimodal fusion, long-horizon planning, and safety, and propose future research directions for integrating embodied intelligence with natural language interaction. This study clarifies the research trajectory of the field and offers a theoretical foundation for the broader adoption and deployment of robotic technology.
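The distinction between direct mapping approaches and feedback-driven architectures can be illustrated with a toy Python sketch. It should not be read as a reproduction of either method. The sketch assumes a SayCan-style selection step that multiplies a language-model usefulness score by an affordance (feasibility) score. That step is run once as an open-loop planner with no feedback, and again inside an Inner Monologue-style loop that writes each step's outcome back into the history as text before choosing the next skill. The skill names and the functions `toy_llm_score`, `toy_affordance`, and `make_flaky_executor` are all hypothetical. They stand in for a real LLM, learned value functions, and robot controllers.

```python
from typing import Callable, List

# Hypothetical skill library; in a real system each entry would be a learned policy.
SKILLS: List[str] = ["pick up the sponge", "go to the sink", "place the sponge", "done"]

ScoreFn = Callable[[str, List[str], str], float]
AffordanceFn = Callable[[List[str], str], float]


def toy_llm_score(instruction: str, history: List[str], skill: str) -> float:
    """Hypothetical stand-in for an LLM's estimate that `skill` is a useful next step.
    This toy ignores `instruction` and prefers the first uncompleted skill."""
    completed = [h for h in history if not h.startswith("failed:")]
    remaining = [s for s in SKILLS if s not in completed]
    return 1.0 if skill == remaining[0] else 0.1


def toy_affordance(history: List[str], skill: str) -> float:
    """Hypothetical stand-in for a value function estimating success probability.
    Placing is infeasible until a grasp has succeeded."""
    if skill == "place the sponge" and "pick up the sponge" not in history:
        return 0.0
    return 0.9


def select_skill(instruction: str, history: List[str],
                 llm_score: ScoreFn, affordance: AffordanceFn) -> str:
    """SayCan-style choice: usefulness (language) times feasibility (affordance)."""
    return max(SKILLS, key=lambda s: llm_score(instruction, history, s) * affordance(history, s))


def open_loop(instruction: str, execute: Callable[[str], bool], max_steps: int = 10) -> List[str]:
    """Direct-mapping baseline: plan the whole sequence once (assuming every
    step succeeds), then execute it without observing outcomes."""
    plan: List[str] = []
    for _ in range(max_steps):
        skill = select_skill(instruction, plan, toy_llm_score, toy_affordance)
        if skill == "done":
            break
        plan.append(skill)
    # Failed steps are silently dropped; nothing triggers a retry.
    return [s for s in plan if execute(s)]


def closed_loop(instruction: str, execute: Callable[[str], bool], max_steps: int = 10) -> List[str]:
    """Feedback-driven variant: each outcome is appended to the history as text,
    and the next skill is chosen from that updated history."""
    history: List[str] = []
    for _ in range(max_steps):
        skill = select_skill(instruction, history, toy_llm_score, toy_affordance)
        if skill == "done":
            break
        if execute(skill):
            history.append(skill)
        else:
            history.append(f"failed: {skill}")  # textual success-detector feedback
    return history


def make_flaky_executor(fail_once: str) -> Callable[[str], bool]:
    """Hypothetical controller whose first attempt at `fail_once` fails."""
    has_failed = False

    def execute(skill: str) -> bool:
        nonlocal has_failed
        if skill == fail_once and not has_failed:
            has_failed = True
            return False
        return True

    return execute


if __name__ == "__main__":
    task = "put the sponge in the sink"
    print(open_loop(task, make_flaky_executor("pick up the sponge")))
    # ['go to the sink', 'place the sponge']  (the failed grasp is never retried)
    print(closed_loop(task, make_flaky_executor("pick up the sponge")))
    # ['failed: pick up the sponge', 'pick up the sponge', 'go to the sink', 'place the sponge']
```

In the open-loop run, the failed grasp is never observed, so the robot "places" a sponge it is not holding. In the closed-loop run, the failure is visible in the textual history, and the planner retries the grasp before moving on. In the actual systems, the usefulness score comes from a language model and the feasibility score from learned value functions. Feedback comes from perception modules such as success detectors, not from hand-written stubs like these.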

References

  • [1] Ichter, B., Brohan, A., Chebotar, Y., Finn, C., Hausman, K., Herzog, A., Ho, D., Ibarz, J., Irpan, A., Jang, E., Julian, R., Kalashnikov, D., Levine, S., Lu, Y., Parada, C., Rao, K., Sermanet, P., Toshev, A. T., Vanhoucke, V., ... Fu, C. K. (2023). Do as I can, not as I say: Grounding language in robotic affordances. In K. Liu, D. Kulic, & J. Ichnowski (Eds.), Proceedings of the 6th Conference on Robot Learning (Vol. 205, pp. 287–318). PMLR.
  • [2] Liang, J., Huang, W., Xia, F., Xu, P., Hausman, K., Ichter, B., Florence, P., & Zeng, A. (2023). Code as policies: Language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA) (pp. 9493–9500). IEEE. https://doi.org/10.1109/ICRA48891.2023.10160591
  • [3] Singh, I., Blukis, V., Mousavian, A., Goyal, A., Xu, D., Tremblay, J., Fox, D., Thomason, J., & Garg, A. (2023). ProgPrompt: Generating situated robot task plans using large language models. In 2023 IEEE International Conference on Robotics and Automation (ICRA) (pp. 11523–11530). IEEE. https://doi.org/10.1109/ICRA48891.2023.10161317
  • [4] Huang, W., Xia, F., Xiao, T., Chan, H., Liang, J., Florence, P., Zeng, A., Tompson, J., Mordatch, I., Chebotar, Y., Sermanet, P., Jackson, T., Brown, N., Luu, L., Levine, S., Hausman, K., & Ichter, B. (2023). Inner Monologue: Embodied reasoning through planning with language models. In K. Liu, D. Kulic, & J. Ichnowski (Eds.), Proceedings of the 6th Conference on Robot Learning (Vol. 205, pp. 1769–1782). PMLR.
  • [5] Brohan, A., Brown, N., Carbajal, J., Chebotar, Y., Dabis, J., Finn, C., Gopalakrishnan, K., Hausman, K., Herzog, A., Hsu, J., Ibarz, J., Ichter, B., Irpan, A., Jackson, T., Jesmonth, S., Joshi, N., Julian, R., Kalashnikov, D., Kuang, Y., ... Zitkovich, B. (2023). RT-1: Robotics Transformer for real-world control at scale. In Proceedings of Robotics: Science and Systems. Robotics: Science and Systems Foundation. https://doi.org/10.15607/RSS.2023.XIX.025
  • [6] Zitkovich, B., Yu, T., Xu, S., Xu, P., Xiao, T., Xia, F., Wu, J., Wohlhart, P., Welker, S., Wahid, A., Vuong, Q., Vanhoucke, V., Tran, H., Soricut, R., Singh, A., Singh, J., Sermanet, P., Sanketi, P. R., Salazar, G., ... Han, K. (2023). RT-2: Vision-language-action models transfer web knowledge to robotic control. In J. Tan, M. Toussaint, & K. Darvish (Eds.), Proceedings of the 7th Conference on Robot Learning (Vol. 229, pp. 2165–2183). PMLR.
  • [7] Driess, D., Xia, F., Sajjadi, M. S. M., Lynch, C., Chowdhery, A., Ichter, B., Wahid, A., Tompson, J., Vuong, Q., Yu, T., Huang, W., Chebotar, Y., Sermanet, P., Duckworth, D., Levine, S., Vanhoucke, V., Hausman, K., Toussaint, M., Greff, K., ... Florence, P. (2023). PaLM-E: An embodied multimodal language model. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 8469–8488). PMLR.
  • [8] Jiang, Y., Gupta, A., Zhang, Z., Wang, G., Dou, Y., Chen, Y., Fei-Fei, L., Anandkumar, A., Zhu, Y., & Fan, L. (2023). VIMA: Robot manipulation with multimodal prompts. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, & J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 14975–15022). PMLR.
  • [9] Huang, W., Wang, C., Zhang, R., Li, Y., Wu, J., & Fei-Fei, L. (2023). VoxPoser: Composable 3D value maps for robotic manipulation with language models. In J. Tan, M. Toussaint, & K. Darvish (Eds.), Proceedings of the 7th Conference on Robot Learning (Vol. 229, pp. 540–562). PMLR.
  • [10] Shridhar, M., Manuelli, L., & Fox, D. (2022). CLIPort: What and where pathways for robotic manipulation. In A. Faust, D. Hsu, & G. Neumann (Eds.), Proceedings of the 5th Conference on Robot Learning (Vol. 164, pp. 894–906). PMLR.
  • [11] Open X-Embodiment Collaboration. (2024). Open X-Embodiment: Robotic learning datasets and RT-X models. In 2024 IEEE International Conference on Robotics and Automation (ICRA) (pp. 6892–6903). IEEE. https://doi.org/10.1109/ICRA57147.2024.10611477