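1. Executive Summary

Conventional Internet of Things (IoT) appliances depend heavily on cloud computing and commonly suffer from network latency, privacy-leak risks, and inflexible commands. This project builds a physical AI agent (in the style of a Physical AI Agent) with natural language processing (NLP) and multimodal perception (vision/voice) on a resource-constrained edge device (Raspberry Pi). Through model quantization, prompt engineering, and hybrid guardrails, the project reduced large language model (LLM) inference latency from 10 seconds to under 2 seconds. Running the system as a resident service completes a headless deployment and reaches a commercial-grade plug-and-play standard [1][2].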

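This project is intended mainly as a reference for classmates on applying the AI tools covered in class: using a ChatBot to broaden thinking, Vibe Coding to turn that thinking into software, and, after integration, NotebookLM to produce the introduction, slides, and introduction cards. Together these steps form a real-world case of deployment in the AIoT field. Because the process involved many AI applications, a single short article cannot explain all of it, for which I apologize. I also thank my classmates 施芊妤 and 阮玉映 for their contributions to the ideas and the report for this project.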

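2. System Architecture

The system uses a decoupled, microservice-style design and is divided into three subsystems to keep operation stable. The rough architecture is shown in the figure below (textual discrepancies in the hardware platform and software labels are due to the experimental-condition settings):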

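Perception Layer:
- Voice interface: A physical button on the remote control triggers a hardware interrupt on the fan, which starts the microphone array for asynchronous audio capture.
- Vision interface: Using YOLO object detection and knowledge distillation, a thermal imaging sensor is used to compute, in real time, the relative error coordinates of human targets in the environment.
Edge-AI Processing Layer:
- Automatic speech recognition (ASR): Faster-Whisper (tiny) performs offline speech-to-text [3].
- Semantic hub (LLM): A quantized Qwen2.5-0.5B model driven by Llama.cpp [9] parses natural language and maps it to standard control codes [11][12][12a].
Actuation & Feedback Layer:
- Closed-loop stepper motor control and relay-based power management.
- Built-in lightweight Flask video streaming provides an on-device Web UI for remote status monitoring (SSH).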
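
As a rough illustration of how a detected person could be turned into a motor correction, the sketch below converts a bounding box into a relative horizontal error and then into a stepper step count. The frame width, field of view, step resolution, deadband, and the `Box` type are all hypothetical values chosen for the example, not figures taken from the project, and the pixel-to-angle mapping is a simple linear approximation.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the project.
FRAME_WIDTH = 32            # pixels per row, e.g. a 32x24 thermal array
HORIZONTAL_FOV_DEG = 55.0   # assumed horizontal field of view of the lens
STEPS_PER_REV = 2048        # assumed stepper resolution (steps per revolution)
DEADBAND = 0.1              # ignore errors below 10% of the half-width


@dataclass
class Box:
    """Horizontal extent of a detected person, in pixels (hypothetical type)."""
    x_min: float
    x_max: float


def relative_error(box: Box, frame_width: int = FRAME_WIDTH) -> float:
    """Offset of the box center from the frame center, normalized to [-1, 1]."""
    center = (box.x_min + box.x_max) / 2.0
    half = frame_width / 2.0
    return (center - half) / half


def steps_to_target(error: float) -> int:
    """Convert a normalized error into signed stepper steps (0 inside the deadband)."""
    if abs(error) < DEADBAND:
        return 0
    # Linear approximation: full error (+/-1) corresponds to half the field of view.
    angle_deg = error * (HORIZONTAL_FOV_DEG / 2.0)
    return round(angle_deg / 360.0 * STEPS_PER_REV)
```

Because the error is recomputed on every frame, the loop closes through the sensor: after the motor moves, the next detection reports a smaller error until it falls inside the deadband.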

3. Core Optimizations

The core technical value of this project lies in overcoming the hardware limits of edge computing. It implements the following three optimizations [7][8][13][14]:

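3.1 Resource Optimization & Latency Reduction

Pain point: LLM inference on the edge device (this project uses a Raspberry Pi 3 with 1 GB of RAM) is extremely slow (> 10 s) and prone to out-of-memory (OOM) crashes.
Implementation:
- Adopt GGUF 4-bit quantization to sharply reduce the model's RAM footprint (< 300 MB) [10][10a][10b].
- Strictly limit the context window (n_ctx=512) and specify the thread configuration (n_threads=4) to tune CPU usage.
- Prompt compression: Trim redundant prompt text [4] and use few-shot learning to enforce a JSON output format, reducing response time to < 0.5 s.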
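
The settings above can be sketched with the llama-cpp-python bindings. This is a minimal sketch, not the project's actual code: `n_ctx=512` and `n_threads=4` come from the text, while the few-shot examples, the `cmd` JSON key, the command names, and the `max_tokens` value are hypothetical.

```python
import json

N_CTX = 512      # context window, as stated in the text
N_THREADS = 4    # CPU threads, as stated in the text

# Hypothetical few-shot examples; the real command set is not given in the text.
FEW_SHOT = [
    ("turn on the fan", {"cmd": "ON"}),
    ("stop", {"cmd": "OFF"}),
]


def build_prompt(user_text: str) -> str:
    """Build a short few-shot prompt that asks for a single JSON object."""
    lines = ["Reply with one JSON object only."]
    for text, reply in FEW_SHOT:
        lines.append(f"User: {text}")
        lines.append(f"JSON: {json.dumps(reply)}")
    lines.append(f"User: {user_text}")
    lines.append("JSON:")
    return "\n".join(lines)


def load_model(model_path: str):
    """Load a GGUF model; imported lazily so the helpers work without the library."""
    from llama_cpp import Llama
    return Llama(model_path=model_path, n_ctx=N_CTX, n_threads=N_THREADS)


def infer(llm, user_text: str) -> str:
    """Run one short, deterministic completion and return the raw text."""
    out = llm(build_prompt(user_text), max_tokens=16, stop=["\n"], temperature=0.0)
    return out["choices"][0]["text"].strip()
```

Keeping the prompt short and capping `max_tokens` both matter on a small CPU, since prompt processing and token generation each scale with length.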

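3.2 Hybrid Guardrails Framework

Pain point: An edge LLM is easily disturbed by speech-recognition noise, producing an anchoring effect (Anchor Bias) and hallucinations. It may then output dangerous hardware control codes, and it is exposed to risks such as prompt injection attacks [5][6].
Implementation: Rather than relying on the model alone, the system builds a two-layer defense of semantic parsing plus deterministic rules. When the system receives invalid noise (such as misrecognized speech or malicious voice input), the underlying Python rules forcibly intercept the abnormal intent and overwrite it with IGNORE, keeping hardware operation safe.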
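
The deterministic layer described above might look like the following sketch. Only the IGNORE fallback comes from the text; the allowed command names, the `cmd` JSON key, and the minimum transcript length are hypothetical.

```python
import json

# Hypothetical whitelist; the project's real control codes are not listed in the text.
ALLOWED = {"ON", "OFF", "SPEED_UP", "SPEED_DOWN", "FOLLOW"}
IGNORE = "IGNORE"
MIN_TRANSCRIPT_CHARS = 2  # assumed threshold for "too short to be a command"


def guard(raw_output: str) -> str:
    """Return a whitelisted command, or IGNORE for anything malformed or unknown."""
    try:
        data = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return IGNORE
    if not isinstance(data, dict):
        return IGNORE
    cmd = data.get("cmd")
    if not isinstance(cmd, str):
        return IGNORE
    cmd = cmd.strip().upper()
    return cmd if cmd in ALLOWED else IGNORE


def decide(transcript: str, raw_output: str) -> str:
    """Apply the transcript rule first, then validate the model output."""
    if len(transcript.strip()) < MIN_TRANSCRIPT_CHARS:
        return IGNORE
    return guard(raw_output)
```

The point of the design is that the model only proposes an intent; a command reaches the hardware only if it survives rules that do not depend on the model's behavior.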

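3.3 Productization & EMI Resistance

Pain point: The physical button is prone to false triggers caused by motor operation and ambient electromagnetic interference.
Implementation:
- Hardware layer: Enable the GPIO internal pull-down resistor (PUD_DOWN) and require a physical 3.3 V high level to trigger.
- Software layer: Add a double timing confirmation and bouncetime debouncing to filter high-frequency noise.
- Deployment layer: Package part of the Python main program as a Linux systemd service, giving a headless product with no screen or keyboard.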
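
The hardware and software layers above can be sketched with RPi.GPIO as follows. `PUD_DOWN` and `bouncetime` come from the text; the pin number, the bounce window, and the confirmation delay are hypothetical, and the "double timing confirmation" is interpreted here as re-reading the pin after a short delay.

```python
import time

BUTTON_PIN = 17          # hypothetical BCM pin number
BOUNCE_MS = 200          # hypothetical bouncetime window in milliseconds
CONFIRM_DELAY_S = 0.05   # hypothetical delay before the second read


def confirm_press(read_pin, delay_s=CONFIRM_DELAY_S, sleep=time.sleep) -> bool:
    """Accept a press only if the level is high now and still high after a delay."""
    if not read_pin():
        return False
    sleep(delay_s)
    return bool(read_pin())


def setup_button(on_press) -> None:
    """Configure the pin with a pull-down and a debounced rising-edge callback."""
    import RPi.GPIO as GPIO  # imported lazily; only available on a Raspberry Pi

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)

    def _callback(channel):
        # Second check in software: short EMI spikes fail the delayed re-read.
        if confirm_press(lambda: GPIO.input(channel) == GPIO.HIGH):
            on_press()

    GPIO.add_event_detect(
        BUTTON_PIN, GPIO.RISING, callback=_callback, bouncetime=BOUNCE_MS
    )
```

A spike induced by the motor typically lasts far less than the confirmation delay, so it is rejected by the second read even if it passes edge detection.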

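4. Performance Evaluation & Brief Conclusion

This project improves the feasibility of applying an offline edge LLM to smart home appliances. Without relying on any external network connection, the system achieves accurate visual tracking of people and smooth natural-language interaction. By lightening the vision model through the knowledge distillation (Teacher-Student) concept [15][16] and carefully tuning the LLM prompts, the system balances performance and accuracy within the limited compute of an ARM platform.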

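5. Future Work

Building on the existing microservice architecture, the next phase will focus on ecosystem integration:

Adopt the MQTT protocol: Turn the device into a standard IoT edge node. Long-term monitoring can supply energy data to a home hub for reporting and improvement, supporting environmental goals such as the SDGs and ESG.
Home Assistant (HA) integration: Push the fan's state (for example, the current number of people in the room and the decision state) to the HA server via asynchronous communication, enabling whole-home automation with lighting, air conditioning, and other smart home devices.
Medical applications: The device images with an infrared camera and the adafruit package. Because no visible-light module is used, this approach improves information security. Some medical institutions are also trying infrared cameras to assist surgery, which makes this a useful technical reference.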
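
For the MQTT and Home Assistant items above, a first step could be publishing the fan's state as a small JSON message. The sketch below assumes the paho-mqtt library; the topic name, payload fields, and broker host are hypothetical, since this integration is still future work.

```python
import json

STATE_TOPIC = "home/smart_fan/state"  # hypothetical topic name


def build_state(people_count: int, decision: str) -> str:
    """Serialize the fan state as a compact, key-sorted JSON string."""
    return json.dumps({"people": people_count, "decision": decision}, sort_keys=True)


def publish_state(people_count: int, decision: str, host: str = "localhost") -> None:
    """Publish one retained state message to an MQTT broker."""
    import paho.mqtt.publish as publish  # imported lazily; requires paho-mqtt

    publish.single(
        STATE_TOPIC,
        payload=build_state(people_count, decision),
        qos=1,
        retain=True,
        hostname=host,
    )
```

Retaining the message lets a subscriber that connects later read the last known state immediately.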

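6. Conclusion

Looking back on this project as I write it up, I have come to see that the software engineer of the future will no longer hold a job that is only about writing code.

In class we learned powerful web-based AI applications as cloud development aids (n8n, NotebookLM, and PowerApps, as covered in the course), which let us turn ideas into a report right away. It also reminded me that even more formidable local AI agent applications are on the way (OpenClaw, NemoClaw).

So I believe the real value of learning AI lies in building the practical ability to "define the problem", "chain automated workflows to improve efficiency", and "do optimization engineering under real-world constraints".

From brainstorming the idea last November, to solving hardware noise, to overcoming LLM hallucinations, and finally to integration and deployment, AI has been not only the brain of this smart fan but also the most indispensable mentor and partner in my development process. I hope this experience of bringing a physical AI agent to life sparks new ideas for turning inspiration into reality.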

7. Demo

https://github.com/a0935951152-droid/AI-Smart-Fan

8. References

[1] Liu, Y., Chen, W., Bai, Y., Li, G., Gao, W., & Lin, L. (2024). Aligning cyber space with physical world: A comprehensive survey on Embodied AI. arXiv preprint arXiv:2407.06886.

[2] Sun, F., Chen, R., Ji, T., et al. (2024). A comprehensive survey on embodied intelligence: Advancements, challenges, and future perspectives. CAAI Artificial Intelligence Research, 3, 9150042.

[3] Babaiasl, M., et al. (2024). Deployment of large language models to control mobile robots at the edge. arXiv preprint arXiv:2405.17670.

[4] Kwon, T., Di Palo, N., & Johns, E. (2024). Language models as zero-shot trajectory generators. IEEE Robotics and Automation Letters, 9(7).

[5] Wang, Y.-J., et al. (2023). Prompt a robot to walk with large language models. arXiv preprint arXiv:2309.09969.

[6] Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2023). Large language models are human-level prompt engineers. In Proceedings of ICLR 2023. arXiv:2211.01910.

[7] Borzunov, A., Ryabinin, M., Dettmers, T., Lhoest, Q., Saulnier, L., Diskin, M., ... & Wolf, T. (2024). A review on edge large language models: Design, execution, and applications. ACM Computing Surveys.

[8] Gurunathan, T. S., Raza, M. S., & Janakiraman, A. K. (2025). Edge LLMs for real-time contextual understanding with ground robots. In AAAI Spring Symposium.

[9] Qwen Team. (2024). Qwen2.5: A party of foundation models. Alibaba Cloud. [Technical report]. https://qwenlm.github.io/blog/qwen2.5/

[10] Gerganov, G., et al. (2023). llama.cpp: Efficient LLM inference in C/C++. GitHub Repository. https://github.com/ggml-org/llama.cpp

[10a] Iakovenko, M., & Dupont, S. (2025). Which quantization should I use? A unified evaluation of llama.cpp quantization on Llama-3.1-8B-Instruct. arXiv preprint arXiv:2601.14277.

[10b] Guo, Z., Li, Y., et al. (2024). Faster and lighter LLMs: A survey on current challenges and way forward. arXiv preprint arXiv:2402.01799.

[11] Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In Proceedings of the 40th International Conference on Machine Learning (ICML), PMLR 202:28492–28518.

[12] SYSTRAN. (2023). faster-whisper: Faster Whisper transcription with CTranslate2. GitHub Repository. https://github.com/SYSTRAN/faster-whisper

[12a] Zheng, Y., Chen, Y., Qian, B., Shi, X., Shu, Y., & Chen, J. (2025). Performance evaluation of Whisper-series speech transcription models on Raspberry Pi. In Proceedings of the 10th ACM/IEEE Symposium on Edge Computing (SEC 2025). ACM.

[13] Melexis N.V. (2019). MLX90640 far infrared thermal sensor array datasheet. Melexis. https://www.melexis.com/en/product/mlx90640/far-infrared-thermal-sensor-array

[14] Huang, J., Yang, X., Jin, H., & Nguyen, T. (2022). Detection of moving objects using thermal imaging sensors for occupancy estimation. Smart Cities, 5(1), 202–220.

[15] Lupión, M., Polo-Rodríguez, A., Ortigosa, P. M., & Medina-Quero, J. (2023). ThermalYOLO: A person detection neural network in thermal images for smart environments. In Proceedings of UCAmI 2022, Lecture Notes in Networks and Systems, vol. 594. Springer.

[16] Yildiz, T., Demirci, M. F., & Kaya, I. (2023). Human detection in thermal images using YOLOv8 for search and rescue missions. In 2023 IEEE 7th International Conference on Advances in Biomedical Engineering (ICABME).

[17] Liang, J., Huang, W., Xia, F., Xu, P., Hausman, K., Ichter, B., ... & Zeng, A. (2023). Code as policies: Language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 9493–9500.

[18] Sun, F., et al. (2024). Embodied AI: A survey on the evolution from perceptive to behavioral intelligence. SmartBot – Wiley Online Library. https://doi.org/10.1002/smb2.70003

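Edge AI Smart Following Fan: Deployment and Optimization of an Integrated Offline Voice, Natural Language Processing, and Infrared Tracking System (A Practical Study of AI Applications)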