DevSecOps-5090 — GPU Training Pipeline on Kubernetes
Challenge
Fine-tuning large language models typically requires expensive cloud GPU instances or a complex local setup. The goal is a production-ready, self-hosted training pipeline that:
- Runs on local RTX 5090 (32GB VRAM)
- Deploys via Kubernetes (k3s homelab)
- Uses pre-built images (no runtime pip installs)
- Supports QLoRA for memory efficiency
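As a back-of-the-envelope check on why 4-bit QLoRA makes a 32 GB card viable, the base-weight footprint can be estimated from parameter count and precision. This is an illustrative sketch (the 7B parameter figure is an assumption, and it ignores activations, LoRA adapters, and optimizer state, which add on top):

```python
def weight_gib(params_b: float, bits: int) -> float:
    """Approximate base-weight footprint in GiB for a model with
    `params_b` billion parameters stored at `bits` bits per weight."""
    return params_b * 1e9 * bits / 8 / 2**30

# Illustrative 7B model: fp16 weights vs 4-bit (NF4) quantized weights
fp16 = weight_gib(7, 16)  # ≈ 13.0 GiB before any optimizer/activation memory
nf4 = weight_gib(7, 4)    # ≈ 3.3 GiB, leaving VRAM headroom on a 32 GB card
print(f"fp16: {fp16:.1f} GiB, 4-bit: {nf4:.1f} GiB")
```

The 4× reduction on the frozen base weights is what frees room for gradients and optimizer state on the trainable LoRA parameters.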
Solution Architecture
Pipeline Overview
┌─────────────────────────────────────────────────────────────┐
│ K3s Cluster (Homelab) │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Training Pod (GPU) │ │
│ │ ┌─────────────────┐ ┌─────────────────────────┐ │ │
│ │ │ Init Containers │ │ Main Container │ │ │
│ │ │ - Verify GPU │ │ - qwen_qlora_trainer.py │ │ │
│ │ │ - Check deps │ │ - HuggingFace ecosystem │ │ │
│ │ │ - Mount PVCs │ │ - bitsandbytes (4-bit) │ │ │
│ │ └─────────────────┘ └─────────────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ Sidecar │ │ │
│ │ │ metrics-exporter │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Volumes: │
│ ├── /mnt/models (RO) - Model cache │
│ ├── /mnt/data (RO) - Training data │
│ ├── /mnt/checkpoints (RW) - Output checkpoints │
│ └── /mnt/training-logs (RW) - Logs + TensorBoard │
└─────────────────────────────────────────────────────────────┘
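A minimal sketch of how an init container could validate this volume layout before the main container starts. The mount table mirrors the diagram above; the helper name and the `root` parameter are assumptions for illustration, not code from the actual init containers:

```python
import os
from pathlib import Path

# Expected mounts from the pod spec: path -> must be writable?
MOUNTS = {
    "/mnt/models": False,        # model cache (RO)
    "/mnt/data": False,          # training data (RO)
    "/mnt/checkpoints": True,    # output checkpoints (RW)
    "/mnt/training-logs": True,  # logs + TensorBoard (RW)
}

def check_mounts(mounts: dict, root: str = "/") -> list:
    """Return a list of problems; an empty list means the layout looks sane.
    `root` lets the check run against a test directory tree."""
    problems = []
    for rel, writable in mounts.items():
        p = Path(root) / rel.lstrip("/")
        if not p.is_dir():
            problems.append(f"missing: {p}")
        elif writable and not os.access(p, os.W_OK):
            problems.append(f"not writable: {p}")
    return problems
```

Failing fast in an init container keeps a misconfigured PVC from surfacing as a cryptic crash minutes into training.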
Pre-Built Training Image
FROM pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime
# Pre-install all dependencies (no runtime downloads)
RUN pip install --no-cache-dir \
transformers==4.47.0 \
peft==0.14.0 \
trl==0.13.0 \
bitsandbytes==0.45.0 \
datasets==3.2.0 \
accelerate==1.2.1 \
safetensors \
sentencepiece \
protobuf
Result: an 11.4 GB image with ~2 min pod startup (vs. 15+ min with runtime pip installs)
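One way to guard those pins at container start is a smoke test that compares the installed versions against the Dockerfile layer. This is a sketch of that idea (the helper and the injectable `installed` mapping are assumptions, not part of the actual image):

```python
from importlib import metadata

# Pins mirrored from the Dockerfile layer above
EXPECTED = {
    "transformers": "4.47.0",
    "peft": "0.14.0",
    "trl": "0.13.0",
    "bitsandbytes": "0.45.0",
    "datasets": "3.2.0",
    "accelerate": "1.2.1",
}

def pin_mismatches(expected: dict, installed: dict = None) -> dict:
    """Return {package: found_version} for every pin that doesn't match.
    `installed` may be injected for testing; by default it is read from
    importlib.metadata in the running environment."""
    if installed is None:
        installed = {}
        for name in expected:
            try:
                installed[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                installed[name] = "<missing>"
    return {name: installed.get(name, "<missing>")
            for name, want in expected.items()
            if installed.get(name, "<missing>") != want}
```

An empty result means the image matches its build recipe; anything else can abort the pod before GPU time is wasted on a drifted environment.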
LLM Fine-tuning
QLoRA
GPU
Kubernetes
Self-Hosted
ML Infrastructure