## Article Body

Matt Mireles has released an open-source tool that fine-tunes Google's Gemma 3 and Gemma 3n models on Apple Silicon, even as the same models become accessible on iPhone through Google's AI Edge Gallery app. The Gemma Multimodal Fine-Tuner handles text, images, and audio using LoRA adaptation, and it is the only Apple-Silicon-native solution that supports audio fine-tuning.
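To make the LoRA approach concrete, here is a minimal sketch of the core idea, not taken from the tool's code: the pretrained weight matrix stays frozen, and training only updates a pair of small low-rank matrices whose product is added to it. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 64, 64, 8   # rank << d_in keeps trainable params small

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection (init 0)

def lora_forward(x):
    # Base output plus a low-rank correction. Because B starts at zero,
    # the adapted model initially matches the pretrained one exactly.
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # identical before training

# Only A and B are trained: rank*(d_in + d_out) parameters
# versus d_in*d_out for full fine-tuning of W.
print(rank * (d_in + d_out), "trainable vs", d_in * d_out, "full")
```

This parameter saving is what makes fine-tuning feasible on consumer Apple Silicon: the frozen base weights can stay quantized while only the small adapter matrices receive gradients.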
Use cases include domain-specific speech recognition for medical or legal work, where off-the-shelf models mishear jargon (local transcription tools like Ownscribe are one example), as well as computer vision tasks such as analyzing receipts or spotting manufacturing defects. You could also train UI agents that understand screenshots. The entire pipeline runs locally, so sensitive data and model weights never leave your machine. The code is on GitHub.