AI/ML

Agentic AI Governance — Control Plane for AI Agents

Challenge

Autonomous AI agents are executing tool calls — database queries, API requests, file operations — with minimal human oversight. Enterprises deploying agents face regulatory requirements (EU AI Act Art. 12-14, Singapore MGF) for human oversight, audit trails, and authorization controls. Existing agent frameworks (LangChain, AutoGPT, CrewAI) have no built-in governance layer.

Build a control-plane overlay that intercepts, authorizes, logs, and audits every tool call made by an AI agent — without modifying the agent or tool code.
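
A minimal sketch of the interception idea, assuming a decorator-based wrapper (the names governed, ALLOWED_TOOLS, and the JSONL audit sink are illustrative, not the project's actual API):

import functools, json, time

AUDIT_LOG = "audit.jsonl"                  # assumed local audit sink
ALLOWED_TOOLS = {"db_query", "http_get"}   # assumed authorization policy

def governed(tool_fn):
    """Wrap a tool callable: authorize, audit, then execute."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        record = {"tool": tool_fn.__name__, "args": repr(args), "ts": time.time()}
        allowed = tool_fn.__name__ in ALLOWED_TOOLS
        record["decision"] = "allowed" if allowed else "denied"
        with open(AUDIT_LOG, "a") as f:    # append-only audit trail
            f.write(json.dumps(record) + "\n")
        if not allowed:
            raise PermissionError(f"tool call denied: {tool_fn.__name__}")
        return tool_fn(*args, **kwargs)
    return wrapper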

Tags: AI Governance · EU AI Act · Agent Security · OWASP LLM · Human-in-the-Loop · Audit Trail

EU AI Act Doc Generator — Automated Compliance Artifacts

Challenge

The EU AI Act (Regulation (EU) 2024/1689) requires AI system providers to produce extensive documentation: risk classification per Annex III, technical documentation per Annex IV (model cards, risk assessments, data governance records, human oversight procedures), and conformity assessment evidence. Manual documentation is time-consuming, inconsistent, and difficult to maintain over the 10-year retention period mandated by Art. 18.

Build a platform that automates the generation of EU AI Act-compliant documentation artifacts.
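
As a sketch of the generation approach, one Annex IV-style artifact can be rendered from structured metadata with a template engine (Jinja2 here; the field names are assumptions, not the Act's exact schema):

from jinja2 import Template

MODEL_CARD = Template("""\
# Model Card: {{ name }}
Risk class (Annex III): {{ risk_class }}
Intended purpose: {{ purpose }}
Human oversight: {{ oversight }}
""")

print(MODEL_CARD.render(
    name="credit-scoring-v3",              # example system
    risk_class="high-risk",
    purpose="Creditworthiness assessment",
    oversight="Manual review of all declined applications",
))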

Tags: EU AI Act · AI Documentation · Model Cards · Risk Assessment · Conformity Assessment · Regulatory Compliance

Jobs Finder — AI-Powered Job Search Pipeline

Challenge

Job searching across multiple platforms is time-consuming. Telegram channels post hundreds of vacancies daily, LinkedIn requires manual browsing, and HH.ru (Russia’s Indeed) needs separate attention. Manually reviewing all postings and assessing fit is inefficient.

Build a pipeline that:

  • Aggregates jobs from Telegram, LinkedIn, and HH.ru
  • Ranks every posting against your CV using AI
  • Runs entirely locally (no cloud API costs, no data leakage)
  • Supports multiple job search profiles (DevSecOps, Data Analyst, PM)

Solution Architecture

Pipeline Flow

┌─────────────────────────────────────────────────────────────┐
│                    Data Sources                              │
├──────────────┬──────────────────┬───────────────────────────┤
│   Telegram   │     LinkedIn     │          HH.ru            │
│ (13 channels)│ (keyword matrix) │    (public REST API)      │
└──────┬───────┴────────┬─────────┴───────────────┬───────────┘
       │                │                          │
       └────────────────┼──────────────────────────┘
                        ▼
              ┌─────────────────────┐
              │    Deduplication    │
              │  (cross-source)     │
              └──────────┬──────────┘
                         │
                         ▼
              ┌─────────────────────┐
              │    AI Ranking       │
              │ Ollama qwen3.5:35b  │
              │(batch mode: 10/call)│
              └──────────┬──────────┘
                         │
                         ▼
              ┌─────────────────────┐
              │    Filter & Report  │
              │  (threshold: 78+)   │
              └─────────────────────┘
                         │
                         ▼
              outputs/<date>/ranked/report.md
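
The cross-source deduplication box above can be as simple as keying each posting by a normalized (title, company) pair; a sketch, with the exact fields being assumptions:

import hashlib, re

def dedup_key(job: dict) -> str:
    norm = lambda s: re.sub(r"\W+", "", (s or "").lower())  # strip punctuation/case
    raw = f'{norm(job.get("title"))}|{norm(job.get("company"))}'
    return hashlib.sha256(raw.encode()).hexdigest()

def deduplicate(jobs: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for job in jobs:
        if (k := dedup_key(job)) not in seen:
            seen.add(k)
            unique.append(job)
    return unique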

Multi-Profile Support

# profiles/devsecops.yaml
cv_path: "info/CV/devsecops/gantman_2026_compact.md"
linkedin:
  keywords: ["DevSecOps", "Security Architect", "SRE", "CISO"]
  geo: ["Worldwide", "European Union", "Israel"]
telegram:
  channels: ["devops_jobs", "sre_jobs", "security_jobs"]
filters:
  min_salary_usd: 9000
  rank_threshold: 78
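
A profile in this shape loads with a few lines of PyYAML (a sketch; the path matches the example above):

import yaml

with open("profiles/devsecops.yaml") as f:
    profile = yaml.safe_load(f)

print(profile["filters"]["rank_threshold"])  # 78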

Key Features

1. Telegram Channel Scraping

  • Public channel scraping (no bot token needed)
  • 13 default DevOps/Security channels
  • Configurable per profile
  • Rate-limited with polite delays

2. LinkedIn Scraping

  • Selenium with persistent Chrome profile
  • Keyword × geo matrix search
  • Handles pagination (up to 40 pages)
  • Extracts: title, company, salary, location, description

3. HH.ru Integration

  • Public REST API (no auth; queried as sketched below)
  • Area-based search (Russia, Belarus, Georgia, Israel)
  • Experience level filtering
  • Structured vacancy data
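
A sketch of that query against the public endpoint at api.hh.ru (parameter values are examples; area IDs come from the /areas directory):

import requests

resp = requests.get(
    "https://api.hh.ru/vacancies",
    params={"text": "DevSecOps", "area": 113, "per_page": 100},  # 113 = Russia
    timeout=30,
)
resp.raise_for_status()
vacancies = resp.json()["items"]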

4. AI Ranking (Batch Mode)

import json
from ollama import AsyncClient

# Batch processing: 10 jobs per LLM call
prompt = f"""
CV: {cv_markdown}

Jobs: {json.dumps(batch_of_10_jobs)}

Score each job 0-100 for fit. Return JSON:
{{"results": [{{"job_id": "...", "score": 85, "evidence_confidence": 0.9}}]}}
"""

# Amortizes model-load cost across the batch
response = await AsyncClient().generate(model="qwen3.5:35b", prompt=prompt)

5. State Management

-- ~/.cache/jobs-finder/state.sqlite3
CREATE TABLE seen_jobs (
    job_id TEXT PRIMARY KEY,
    source TEXT,
    first_seen TIMESTAMP,
    score INTEGER,
    evidence_confidence REAL
);
  • Cross-run deduplication (see the check sketched below)
  • Score history tracking
  • Prevents re-ranking known jobs
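
A sketch of that check against the table above (path expansion is the only extra step):

import os, sqlite3

conn = sqlite3.connect(os.path.expanduser("~/.cache/jobs-finder/state.sqlite3"))

def is_new(job_id: str) -> bool:
    """True if this job has not been ranked in any previous run."""
    row = conn.execute("SELECT 1 FROM seen_jobs WHERE job_id = ?", (job_id,)).fetchone()
    return row is None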

Configuration

Environment Variables

# Ollama
JOBS_OLLAMA_HOST=http://192.168.2.2:11434
JOBS_OLLAMA_MODEL=qwen3.5:35b
JOBS_OLLAMA_NUM_CTX=16384

# Filters
JOBS_MIN_SALARY_USD=9000
JOBS_RANK_THRESHOLD=78
JOBS_REPORT_TOP_N=10
JOBS_MAX_POSTING_AGE_DAYS=21

# Sources
JOBS_TELEGRAM_MAX_MESSAGES=500
JOBS_LINKEDIN_HEADLESS=false
JOBS_HH_MAX_PAGES_PER_AREA=20

Usage

Run Full Pipeline

# Using profile
python -m jobs_finder pipeline --profile devsecops -v

# Skip specific sources
python -m jobs_finder pipeline --skip-linkedin --skip-hh -v

# Force re-rank everything
python -m jobs_finder pipeline --no-state

Output

outputs/2026-05-15/
├── telegram.jsonl      # Raw scraped posts
├── linkedin.jsonl      # Raw scraped jobs
├── hh.jsonl            # Raw vacancies
└── ranked/
    ├── ranked.jsonl   # All jobs with scores
    └── report.md      # Top-N markdown report

Results & Benefits

Efficiency Gains

Manual Search:
├── Telegram: ~30 min/day reading channels
├── LinkedIn: ~45 min/day browsing + filtering
├── HH.ru: ~20 min/day
└── Total: ~95 min/day

Automated Pipeline:
├── Scraping: ~5 min (background)
├── Ranking: ~10 min (batch AI)
├── Review: ~10 min (top 10 only)
└── Total: ~25 min/day

Savings: 70 min/day, 70%+ reduction

Privacy Benefits

  • No job data sent to cloud APIs
  • CV stays local
  • LinkedIn profile not exposed to third-party services
  • Full control over data retention

Architecture Decisions

  • Ollama batch mode: 10 jobs per call amortizes model-load latency
  • SQLite state DB: Simple, portable, no server needed
  • Chrome persistent profile: Maintains LinkedIn login across runs
  • Multi-profile YAML: Easy to switch between job searches
  • Confidence scoring: Filter out low-evidence matches (< 0.5)

Tags: Job Search · Web Scraping · AI Ranking · Local LLM · Automation · Privacy-First

DevSecOps-5090 — GPU Training Pipeline on Kubernetes

Challenge

Fine-tuning Large Language Models typically requires expensive cloud GPU instances or complex local setups. The goal is a production-ready, self-hosted training pipeline that:

  • Runs on local RTX 5090 (32GB VRAM)
  • Deploys via Kubernetes (k3s homelab)
  • Uses pre-built images (no runtime pip installs)
  • Supports QLoRA for memory efficiency (see the sketch after this list)
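
A hedged sketch of that QLoRA setup (4-bit NF4 base plus LoRA adapters) with the pinned transformers/peft/bitsandbytes stack; the model name and LoRA hyperparameters are examples, not the project's exact values:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights via bitsandbytes
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",             # example base model
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)         # only adapter weights train
model.print_trainable_parameters()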

Solution Architecture

Pipeline Overview

┌─────────────────────────────────────────────────────────────┐
│                    K3s Cluster (Homelab)                     │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Training Pod (GPU)                      │   │
│  │  ┌─────────────────┐  ┌─────────────────────────┐   │   │
│  │  │ Init Containers │  │    Main Container       │   │   │
│  │  │ - Verify GPU    │  │ - qwen_qlora_trainer.py │   │   │
│  │  │ - Check deps    │  │ - HuggingFace ecosystem │   │   │
│  │  │ - Mount PVCs    │  │ - bitsandbytes (4-bit)  │   │   │
│  │  └─────────────────┘  └─────────────────────────┘   │   │
│  │                                                      │   │
│  │  ┌─────────────────────────────────────────────┐    │   │
│  │  │              Sidecar                         │    │   │
│  │  │         metrics-exporter                     │    │   │
│  │  └─────────────────────────────────────────────┘    │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Volumes:                                                   │
│  ├── /mnt/models      (RO) - Model cache                   │
│  ├── /mnt/data        (RO) - Training data                 │
│  ├── /mnt/checkpoints (RW) - Output checkpoints            │
│  └── /mnt/training-logs (RW) - Logs + TensorBoard          │
└─────────────────────────────────────────────────────────────┘

Pre-Built Training Image

FROM pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime

# Pre-install all dependencies (no runtime downloads)
RUN pip install --no-cache-dir \
    transformers==4.47.0 \
    peft==0.14.0 \
    trl==0.13.0 \
    bitsandbytes==0.45.0 \
    datasets==3.2.0 \
    accelerate==1.2.1 \
    safetensors \
    sentencepiece \
    protobuf

Result: 11.4 GB image, ~2 min boot (vs. 15+ min with runtime pip)

Tags: LLM Fine-tuning · QLoRA · GPU · Kubernetes · Self-Hosted · ML Infrastructure

Universal Knowledge Extractor — LLM Training Data Pipeline

Challenge

Fine-tuning LLMs on domain-specific knowledge requires structured, high-quality instruction-response pairs. Manual curation doesn’t scale, and raw content from code repos, docs, and social media needs significant preprocessing before it’s usable for training.

Build a pipeline that:

  • Extracts knowledge from diverse sources (code, docs, Telegram, LinkedIn)
  • Automatically discovers content taxonomy
  • Produces ChatML-formatted JSONL ready for Axolotl/LLaMA-Factory (one record shown below)
  • Runs fully offline with local Ollama
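
One output record in the ChatML-style "messages" layout that Axolotl and LLaMA-Factory both accept (the content here is an invented example):

import json

record = {
    "messages": [
        {"role": "system", "content": "You are a DevSecOps assistant."},
        {"role": "user", "content": "How do we rotate k3s cluster certificates?"},
        {"role": "assistant", "content": "Back up /var/lib/rancher/k3s/server/tls, then ..."},
    ]
}
print(json.dumps(record, ensure_ascii=False))  # one JSONL line per record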

Solution Architecture

Pipeline Flow

┌─────────────────────────────────────────────────────────────────┐
│                      Content Sources                             │
├────────────┬────────────┬────────────┬──────────────────────────┤
│ Filesystem │   GitHub   │  Telegram  │        LinkedIn          │
│  (.md, .py)│  (repos)   │ (channels) │       (exports)          │
└─────┬──────┴─────┬──────┴─────┬──────┴────────────┬─────────────┘
      │            │            │                   │
      └────────────┴────────────┴───────────────────┘
                            │
                            ▼
              ┌──────────────────────────┐
              │    Taxonomy Discovery    │
              │  (Unsupervised ML)       │
              └────────────┬─────────────┘
                           │
                           ▼
              ┌──────────────────────────┐
              │   LLM Extraction         │
              │  (Ollama qwen3.5:35b)    │
              │  → Instruction-Response  │
              └────────────┬─────────────┘
                           │
                           ▼
              ┌──────────────────────────┐
              │   Quality Assurance      │
              │  - Schema validation     │
              │  - Token length check    │
              │  - Language detection    │
              │  - Credential scanning   │
              │  - Semantic dedup        │
              └────────────┬─────────────┘
                           │
                           ▼
              ┌──────────────────────────┐
              │   Data Augmentation      │
              │  - Paraphrase            │
              │  - Reformulation         │
              └────────────┬─────────────┘
                           │
                           ▼
         output/training-data.jsonl (ChatML)

Multi-Source Support

Source       Adapter    Content Types
Filesystem   fs         Markdown, Python, YAML, JSON
GitHub       github     Repos, READMEs, code, issues
Telegram     telegram   Channel messages, media captions
LinkedIn     linkedin   Profile exports, posts, articles

Key Features

1. Automatic Taxonomy Discovery

# No manual topic lists required
taxonomy:
  method: unsupervised
  algorithm: kmeans  # or hdbscan
  num_clusters: auto  # discovers optimal count

Uses embedding-based clustering to automatically categorize content into topics.
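
A sketch of that clustering step, assuming sentence-transformers embeddings and scikit-learn (the project's "num_clusters: auto" would pick k, e.g. by silhouette score; a fixed k is used here for brevity):

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

vectors = SentenceTransformer("all-MiniLM-L6-v2").encode(documents)  # documents: list[str]

labels = KMeans(n_clusters=8, n_init="auto").fit_predict(vectors)
for doc, topic in zip(documents, labels):
    print(topic, doc[:60])                  # topic id per content chunk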

Tags: LLM Fine-tuning · Training Data · ChatML · Ollama · Data Pipeline · NLP

Esla — AI-Powered Real Estate Platform

Challenge

Build a complete real estate platform for the Belarusian market that enables property search via natural language (Russian/Belarusian), provides automated property valuations, connects buyers with verified realtors, and runs entirely as a Telegram Mini-App — all without recurring cloud API costs.

Solution Architecture

Platform Overview

A modular monolith deployed via Docker Compose with 22 services covering the full real estate lifecycle: search, valuation, listings, realtor marketplace, and admin operations.

Tags: Real Estate · NLP · AVM · Telegram Mini-App · Local LLM · Zero API Costs

Ollama Decomposition Agent — Intelligent LLM Orchestration

Challenge

Large Language Models have context window limits, and their latency grows with prompt size. When analyzing large codebases, documents, or complex multi-part questions, single-shot prompts either exceed context limits or produce slow, unfocused responses.

Build an agent that:

  • Intelligently splits large prompts into semantic sub-tasks
  • Executes sub-tasks in parallel for speed
  • Synthesizes results into coherent responses
  • Runs entirely on local Ollama (zero API costs)

Solution Architecture

Workflow

User Prompt (large)
    ↓
[Analyze] - Count tokens, identify boundaries
    ↓
[Decide] - Decompose or single call?
    ├─→ Small (<6K tokens) → Single Ollama call → Return
    └─→ Large (>6K tokens) → Decomposition
            ↓
        [Split] - Semantic decomposition (headings, lists, paragraphs)
            ↓
        [Execute] - Parallel sub-task execution (3 concurrent)
            ↓
        [Aggregate] - Result synthesis (auto-strategy selection)
            ↓
        [Return] - Final coherent response
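
The size-based routing at the top of this workflow is a one-liner with tiktoken (the same library the TokenManager below relies on; the 6K threshold comes from the diagram):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def route(prompt: str, threshold: int = 6000) -> str:
    """Decide between a single Ollama call and decomposition."""
    return "single-call" if len(enc.encode(prompt)) < threshold else "decompose"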

Components

Component                  Responsibility
OllamaDecompositionAgent   Main orchestrator, workflow management
TokenManager               Tiktoken counting, chunking with overlap
PromptSplitter             Semantic decomposition at natural boundaries
OllamaClient               Async HTTP with retry/backoff
ResultAggregator           Multi-strategy synthesis

Key Features

1. Intelligent Prompt Decomposition

# Identifies natural breakpoints
sections = splitter.split(prompt)
# → [heading1_content, list_items, paragraph_block, ...]

# Preserves shared context across sub-tasks
subtasks = [f"{shared_context}\n\n{section}" for section in sections]

2. Parallel Execution with Controlled Concurrency

sem = asyncio.Semaphore(3)  # Max 3 concurrent; each sub-task acquires it
async def limited(call):
    async with sem: return await call
results = await asyncio.gather(*(limited(c) for c in subtask_calls))

3. Multi-Strategy Aggregation

Sub-task Count   Strategy       Method
1-3              Concatenate    Join all + final synthesis
4-10             Sequential     Summarize each, then synthesize
10+              Hierarchical   Tree-based summarization
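
The auto-strategy selection implied by this table reduces to a small dispatcher (a sketch; the names are illustrative):

def pick_strategy(n_subtasks: int) -> str:
    if n_subtasks <= 3:
        return "concatenate"    # join all + final synthesis
    if n_subtasks <= 10:
        return "sequential"     # summarize each, then synthesize
    return "hierarchical"       # tree-based summarization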

4. Performance Optimizations (v2.0)

Optimization                   Improvement
Expert identification toggle   25-30% latency reduction
Token count caching            40-60% cache hits, 60-180ms savings
Async file I/O                 ~160ms improvement
Parallel token counting        140ms+ savings on large prompts
Result caching (LRU)           8-12s+ savings on repeated prompts

Performance Characteristics

Execution Times (DeepSeek-R1:32b)

Prompt Size   Strategy       Sub-tasks   Duration   Tokens
< 6K          Single call    1           5-15s      ~6K
6K-12K        Semantic       2-3         15-25s     ~12K
12K-24K       Semantic       4-5         25-40s     ~24K
24K+          Hierarchical   6-10        40-90s     30K+

Cost Comparison

Approach           Cost per 100K tokens
OpenAI GPT-4       ~$3.00
Anthropic Claude   ~$2.40
Local Ollama       $0.00

Usage

Python API

from ml.agents import OllamaDecompositionAgent, AgentConfig

config = AgentConfig(
    ollama_host="192.168.2.2:11434",
    ollama_model="deepseek-r1:32b",
    max_parallel_tasks=3,
    aggregation_strategy="auto"
)

agent = OllamaDecompositionAgent(config)
result = await agent.process(large_prompt)

print(f"Response: {result.final_response}")
print(f"Tokens: {result.total_tokens_used}")
print(f"Duration: {result.total_duration_seconds:.2f}s")

CLI Tool

# Simple prompt
python ml/agents/examples/cli_tool.py "Your prompt here"

# Load from file
python ml/agents/examples/cli_tool.py @large-document.txt

# Custom configuration
python ml/agents/examples/cli_tool.py @prompt.txt \
    --model deepseek-r1:32b \
    --max-tokens 16384 \
    --parallel 4 \
    --aggregation-strategy hierarchical

Configuration

Core Settings

AgentConfig(
    # Ollama
    ollama_host="192.168.2.2:11434",
    ollama_model="deepseek-r1:32b",
    
    # Context Management
    max_context_tokens=8192,
    response_reserve_tokens=2048,
    chunk_overlap_tokens=200,
    
    # Execution
    max_parallel_tasks=3,
    timeout_seconds=300,
    
    # Performance (v2.0+)
    enable_token_count_cache=True,
    enable_async_file_io=True,
    enable_result_caching=False,
)

Results & Benefits

Technical Outcomes

Performance:
├── Latency reduction: 30-50% (with optimizations)
├── Cache hit rate: 40-60%
├── Backward compatibility: 100%
└── API costs: $0

Use Cases

  1. Security Audits: Analyze large codebases in parallel
  2. Document Analysis: Process long reports with coherent synthesis
  3. Code Review: Multi-file reviews with context preservation
  4. Research: Complex multi-part questions with structured responses

Architecture Decisions

  • Tiktoken over custom counting: OpenAI-standard accuracy, battle-tested
  • Semantic over fixed-size splitting: Preserves meaning, better coherence
  • Async over threading: Better I/O performance, cleaner code
  • LRU caching over persistent: Session-scoped, no stale data issues

Tags: LLM Agent · Prompt Engineering · Local AI · Zero API Costs · Parallel Processing