Tiny Large Language Models in Embedded NVIDIA Portable Hardware: A Comparative Analysis

Marco Antonio Jinete Gómez, Robinson Jiménez-Moreno, Anny Astrid Espitia-Cubillos

Abstract

This study uses the NVIDIA Jetson Xavier platform, hardware with constrained computational resources, to evaluate the real-time performance of several small language models acting as local assistants without Internet access. The research followed a novel four-phase methodology. First, the small language models to be evaluated were selected. Second, a reproducible test protocol was designed for future studies, incorporating both quantitative and qualitative criteria: latency, power consumption, memory usage, accuracy, creativity, narrative coherence, structural adherence, and performance in Spanish. In the third phase, the selected models were embedded and executed locally on the hardware so their performance could be compared. Finally, the suitability of each model for real-time applications was analyzed, yielding a comprehensive test protocol. The findings indicate that, of the nine models evaluated, Qwen2.5 with 3 billion parameters is the optimal choice when resources allow, while Qwen2.5 with 0.5 billion parameters is a viable alternative under severe resource constraints.
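For illustration only, the sketch below shows how the quantitative part of such a protocol (per-prompt latency and generation throughput for a locally served tiny model) might be collected on the Jetson. The abstract does not state which inference runtime the authors used, so the Ollama-style endpoint at localhost:11434, the response fields, and the model tags are assumptions for this example, not the paper's actual setup.

# Minimal sketch (Python, standard library only): time one prompt against a
# locally served tiny LLM and report wall-clock latency and tokens per second.
# Assumes an Ollama-style HTTP endpoint; adjust URL and model tags to the runtime in use.
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local endpoint

def time_prompt(model: str, prompt: str) -> dict:
    """Send one non-streaming prompt and return latency plus server-side stats."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    latency_s = time.perf_counter() - start
    # Ollama reports generated tokens (eval_count) and generation time in
    # nanoseconds (eval_duration); guard against division by zero.
    tokens = body.get("eval_count", 0)
    gen_ns = body.get("eval_duration", 1)
    return {
        "model": model,
        "latency_s": round(latency_s, 2),
        "tokens": tokens,
        "tokens_per_s": round(tokens / (gen_ns / 1e9), 2),
    }

if __name__ == "__main__":
    # Illustrative tags for the tiny models named in the keywords.
    for model in ["qwen2.5:0.5b", "qwen2.5:3b", "smollm2:1.7b"]:
        print(time_prompt(model, "Resume en una frase qué es un modelo de lenguaje."))

Power and memory figures would come from separate instrumentation (for example, the Jetson's on-board monitoring tools) and are not covered by this sketch.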

 

Keywords: large language models, tiny LLMs, NVIDIA Jetson Xavier hardware, Qwen2.5 model, Phi-3.5 model, SmolLM2 model, Mistral model.

 

https://doi.org/10.55463/issn.1674-2974.52.4.2





