Comparative Analysis of Tiny Large Language Models on Embedded Portable NVIDIA Hardware
Abstract
This study uses the NVIDIA Jetson Xavier platform, embedded hardware with constrained computational resources, to evaluate the real-time performance of several small language models acting as local assistants without Internet access. The research employed a novel four-phase methodology: first, selecting the small language models to evaluate; second, designing a reproducible test protocol for future studies, incorporating both quantitative and qualitative criteria, including latency, power consumption, memory usage, accuracy, creativity, narrative coherence, structural adherence, and performance in Spanish; third, embedding and running the selected models locally on the hardware to compare their performance; and fourth, analyzing each model's suitability for real-time applications to consolidate the final test protocol. The findings indicate that, of the nine models evaluated, Qwen2.5 with 3 billion parameters is the optimal choice when resources allow, while Qwen2.5 with 0.5 billion parameters is a viable alternative under severe resource limitations.
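The quantitative side of such a protocol (per-prompt latency, memory headroom, generation throughput) can be illustrated with a short measurement loop. The sketch below is illustrative only and assumes details the abstract does not specify: that each model is served offline on the Jetson through Ollama's local HTTP API, and that the model tags and prompt are placeholders; power draw would be sampled separately, for example with the Jetson tegrastats utility.

```python
# Minimal sketch of the quantitative test protocol: per-prompt latency,
# generation throughput, and system memory delta while a locally served
# tiny LLM answers. Assumptions (not stated in the paper): the model runs
# offline on the Jetson behind Ollama's default HTTP endpoint; model tags
# and the Spanish prompt are hypothetical placeholders.
import time
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local endpoint
MODELS = ["qwen2.5:0.5b", "qwen2.5:3b"]             # hypothetical model tags
PROMPT = "Resume en una frase que es un modelo de lenguaje."

def free_memory_kb() -> int:
    """Read available system memory from /proc/meminfo (Linux/Jetson)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found")

for model in MODELS:
    mem_before = free_memory_kb()
    start = time.perf_counter()
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    latency = time.perf_counter() - start
    mem_after = free_memory_kb()
    tokens = resp.json().get("eval_count", 0)  # generated tokens (Ollama field)
    print(f"{model}: {latency:.2f}s total, "
          f"{tokens / latency:.1f} tok/s, "
          f"{(mem_before - mem_after) / 1024:.0f} MiB memory delta")
```

Repeating this loop over a fixed prompt set and averaging across runs would give the latency and memory figures the protocol calls for; the qualitative criteria (accuracy, creativity, coherence) would still require human or rubric-based scoring.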
Keywords: large language models, tiny LLM, NVIDIA Jetson Xavier hardware, Qwen2.5 model, Phi-3.5 model, SmolLM2 model, Mistral model.