OPTIMIZING LARGE LANGUAGE MODELS FOR LIGHT AUTOMATION: FINE-TUNING FOR CODE-MIXED NEPALI-ENGLISH COMMAND INTERPRETATION
This repository contains the implementation, dataset, fine-tuning scripts, and evaluation metrics for the thesis:
"OPTIMIZING LARGE LANGUAGE MODELS FOR LIGHT AUTOMATION: FINE-TUNING FOR CODE-MIXED NEPALI-ENGLISH COMMAND INTERPRETATION "
This research explores how a small LLM (Llama 3.2 1B) can be fine-tuned and optimized to generate structured YAML automation commands from code-mixed Nepali-English input, enabling efficient command processing in low-resource environments.
The project involves:
✅ Synthetic dataset generation using the OpenAI API.
✅ Fine-tuning Llama 3.2 (1B) with LoRA & QLoRA for efficiency.
✅ Optimized training using Unsloth on Lightning.AI cloud GPUs.
✅ Structured YAML output generation for automation tasks (see the example below).
✅ Benchmarking accuracy (Exact Match & Partial Match) and inference efficiency.
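For illustration, a single command/output pair might look like the following. The field names (`intent`, `room`, `device`, `action`) and the example sentence are assumptions made for this sketch, not the exact schema used in the thesis.

```python
# Hypothetical example of a code-mixed command and the structured YAML the
# fine-tuned model is expected to emit. Field names are illustrative only.
import yaml

command = "Bedroom ko batti off gara hai"  # "please turn off the bedroom light"
expected_output = {
    "intent": "light_control",
    "room": "bedroom",
    "device": "light",
    "action": "turn_off",
}
print(yaml.safe_dump(expected_output, sort_keys=False))
```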
- A synthetic dataset of ~11,000 code-mixed Nepali-English automation commands was generated using the OpenAI API (a generation and splitting sketch follows this list).
- Data balancing techniques ensured equal representation of intents, rooms, and actions.
- The dataset was split into 70% training, 15% validation, and 15% test sets, maintaining class balance.
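A minimal sketch of the dataset workflow is shown below, assuming the OpenAI chat completions API and a stratified 70/15/15 split with scikit-learn. The prompt wording, model name, and helper names are illustrative, not the exact ones used for the thesis dataset.

```python
# Sketch: generate code-mixed command/YAML pairs, then split 70/15/15 while
# preserving class balance (stratified on intent).
from openai import OpenAI
from sklearn.model_selection import train_test_split

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_examples(intent: str, room: str, n: int = 5) -> str:
    """Ask the API for n code-mixed Nepali-English commands for one intent/room."""
    prompt = (
        f"Generate {n} code-mixed Nepali-English smart-home commands for the "
        f"intent '{intent}' in the '{room}', each paired with its YAML output."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def split_dataset(records, intents):
    """70% train, 15% validation, 15% test, stratified by intent."""
    train, rest, _, rest_y = train_test_split(
        records, intents, test_size=0.30, stratify=intents, random_state=42)
    val, test, _, _ = train_test_split(
        rest, rest_y, test_size=0.50, stratify=rest_y, random_state=42)
    return train, val, test
```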
The model was fine-tuned using LoRA and QLoRA for efficient, low-memory adaptation (a configuration sketch follows this list):
- Base Model: Llama 3.2 (1B)
- Training Framework: Unsloth (optimized for speed & efficiency)
- Hardware: Lightning.AI cloud GPUs (L40S, A10, T4)
- Training Strategy: LoRA applied to transformer layers for low-memory adaptation
- Optimizer: AdamW with learning rate tuning using Optuna
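A minimal configuration sketch with Unsloth and TRL is shown below. The LoRA rank, target modules, learning rate, and file names are assumptions for illustration rather than the tuned values reported in the thesis.

```python
# Sketch: 4-bit (QLoRA-style) base model + LoRA adapters trained with SFTTrainer.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # instruct variant of the base model
    max_seq_length=1024,
    load_in_4bit=True,                           # quantized base weights for QLoRA
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16, lora_dropout=0.0,       # illustrative LoRA settings
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

train_ds = load_dataset("json", data_files="train.jsonl")["train"]  # prompt/YAML pairs

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,
    dataset_text_field="text",                   # pre-formatted instruction + YAML
    args=TrainingArguments(
        per_device_train_batch_size=8,
        learning_rate=2e-4,                      # in the thesis, tuned with Optuna
        num_train_epochs=3,
        output_dir="outputs",
    ),
)
trainer.train()
```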
Training Performance:
- The instruct-tuned model exhibited faster convergence and better validation loss than the base model.
- YAML outputs from the fine-tuned model were markedly more correct than those from the pre-trained model.
1️⃣ Accuracy Metrics:
- Exact Match Accuracy: measures whether the generated YAML exactly matches the expected response.
- Partial Match Accuracy: measures the fraction of key-value pairs in the generated YAML that are correct (both metrics are sketched below).
- Results: 85% exact match and 87% partial match accuracy.
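A sketch of how the two metrics can be computed is shown below, assuming both prediction and reference parse to flat YAML mappings; the helper names are illustrative.

```python
# Sketch: exact match compares whole parsed YAML documents; partial match
# counts how many reference key-value pairs the prediction reproduces.
import yaml

def exact_match(pred: str, ref: str) -> bool:
    """True only if the two YAML documents parse to identical structures."""
    return yaml.safe_load(pred) == yaml.safe_load(ref)

def partial_match(pred: str, ref: str) -> float:
    """Fraction of reference key-value pairs correctly present in the prediction."""
    p = yaml.safe_load(pred) or {}
    r = yaml.safe_load(ref) or {}
    if not r:
        return 0.0
    return sum(1 for k, v in r.items() if p.get(k) == v) / len(r)
```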
2️⃣ Inference Efficiency:
- Benchmarking was performed on CPUs (Ryzen 7 7700 and Ryzen 5 7530U).
- Latency and RAM usage were tracked to assess real-world feasibility (see the benchmarking sketch below).
- Inference latency was low (about 0.2 s per command).
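The benchmarking loop can be sketched as below, where `generate_yaml` is a placeholder for the inference call around the quantized model; the warm-up count and measurement details are assumptions.

```python
# Sketch: time each inference on CPU and record the process's resident memory.
import statistics
import time

import psutil

def benchmark(commands, generate_yaml, warmup=3):
    for cmd in commands[:warmup]:            # warm-up runs, not timed
        generate_yaml(cmd)
    latencies = []
    for cmd in commands:
        start = time.perf_counter()
        generate_yaml(cmd)
        latencies.append(time.perf_counter() - start)
    rss_mb = psutil.Process().memory_info().rss / 2**20
    return statistics.mean(latencies), rss_mb
```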
- Fine-tuned Llama 3.2 (1B) can generate structured YAML outputs with high accuracy, making it viable for structured automation.
- The instruct-tuned model outperformed the fine-tuned base model in both validation-loss stability and accuracy.
- Quantization reduced the model size, making deployment on low-resource hardware feasible.
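For deployment, Unsloth can export the fine-tuned model to a quantized GGUF file for CPU inference (e.g., via llama.cpp); the output directory and quantization method below are assumptions, not necessarily those used in the thesis.

```python
# Sketch: export the fine-tuned model as a 4-bit quantized GGUF file.
model.save_pretrained_gguf(
    "llama32-1b-yaml-gguf",          # illustrative output directory
    tokenizer,
    quantization_method="q4_k_m",    # illustrative quantization level
)
```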