AI/ML

Agentic AI Governance — Control Plane for AI Agents

Challenge

Autonomous AI agents are executing tool calls — database queries, API requests, file operations — with minimal human oversight. Enterprises deploying agents face regulatory requirements (EU AI Act Art. 12-14, Singapore MGF) for human oversight, audit trails, and authorization controls. Existing agent frameworks (LangChain, AutoGPT, CrewAI) have no built-in governance layer.

Build a control-plane overlay that intercepts, authorizes, logs, and audits every tool call made by an AI agent — without modifying the agent or tool code.
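
A minimal sketch of the interception idea, assuming a decorator-based wrapper (the names governed, ALLOWED_TOOLS, and the JSONL audit sink are illustrative, not the project's actual API):

import functools, json, time

AUDIT_LOG = "audit.jsonl"                  # assumed local audit sink
ALLOWED_TOOLS = {"db_query", "http_get"}   # assumed authorization policy

def governed(tool_fn):
    """Wrap a tool callable: authorize, audit, then execute."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        record = {"tool": tool_fn.__name__, "args": repr(args), "ts": time.time()}
        allowed = tool_fn.__name__ in ALLOWED_TOOLS
        record["decision"] = "allowed" if allowed else "denied"
        with open(AUDIT_LOG, "a") as f:    # append-only audit trail
            f.write(json.dumps(record) + "\n")
        if not allowed:
            raise PermissionError(f"tool call denied: {tool_fn.__name__}")
        return tool_fn(*args, **kwargs)
    return wrapper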

Tags: AI Governance · EU AI Act · Agent Security · OWASP LLM · Human-in-the-Loop · Audit Trail

EU AI Act Doc Generator — Automated Compliance Artifacts

Challenge

The EU AI Act (Regulation (EU) 2024/1689) requires AI system providers to produce extensive documentation: risk classification per Annex III, technical documentation per Annex IV (model cards, risk assessments, data governance records, human oversight procedures), and conformity assessment evidence. Manual documentation is time-consuming, inconsistent, and difficult to maintain over the 10-year retention period mandated by Art. 18.

Build a platform that automates the generation of EU AI Act-compliant documentation artifacts.
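
As a sketch of the generation approach, one Annex IV-style artifact can be rendered from structured metadata with a template engine (Jinja2 here; the field names are assumptions, not the Act's exact schema):

from jinja2 import Template

MODEL_CARD = Template("""\
# Model Card: {{ name }}
Risk class (Annex III): {{ risk_class }}
Intended purpose: {{ purpose }}
Human oversight: {{ oversight }}
""")

print(MODEL_CARD.render(
    name="credit-scoring-v3",              # example system
    risk_class="high-risk",
    purpose="Creditworthiness assessment",
    oversight="Manual review of all declined applications",
))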

Tags: EU AI Act · AI Documentation · Model Cards · Risk Assessment · Conformity Assessment · Regulatory Compliance

Jobs Finder — AI-Powered Job Search Pipeline

Challenge

Job searching across multiple platforms is time-consuming. Telegram channels post hundreds of vacancies daily, LinkedIn requires manual browsing, and HH.ru (Russia’s Indeed) needs separate attention. Manually reviewing all postings and assessing fit is inefficient.

Build a pipeline that:

  • Aggregates jobs from Telegram, LinkedIn, and HH.ru
  • Ranks every posting against your CV using AI
  • Runs entirely locally (no cloud API costs, no data leakage)
  • Supports multiple job search profiles (DevSecOps, Data Analyst, PM)

Solution Architecture

Pipeline Flow

┌─────────────────────────────────────────────────────────────┐
│                    Data Sources                              │
├──────────────┬──────────────────┬───────────────────────────┤
│   Telegram   │     LinkedIn     │          HH.ru            │
│ (13 channels)│ (keyword matrix) │    (public REST API)      │
└──────┬───────┴────────┬─────────┴───────────────┬───────────┘
       │                │                          │
       └────────────────┼──────────────────────────┘
                        ▼
              ┌─────────────────────┐
              │    Deduplication    │
              │  (cross-source)     │
              └──────────┬──────────┘
                         │
                         ▼
              ┌─────────────────────┐
              │    AI Ranking       │
              │ Ollama qwen3.5:35b  │
              │(batch mode: 10/call)│
              └──────────┬──────────┘
                         │
                         ▼
              ┌─────────────────────┐
              │    Filter & Report  │
              │  (threshold: 78+)   │
              └─────────────────────┘
                         │
                         ▼
              outputs/<date>/ranked/report.md
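
The cross-source deduplication box above can be as simple as keying each posting by a normalized (title, company) pair; a sketch, with the exact fields being assumptions:

import hashlib, re

def dedup_key(job: dict) -> str:
    norm = lambda s: re.sub(r"\W+", "", (s or "").lower())  # strip punctuation/case
    raw = f'{norm(job.get("title"))}|{norm(job.get("company"))}'
    return hashlib.sha256(raw.encode()).hexdigest()

def deduplicate(jobs: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for job in jobs:
        if (k := dedup_key(job)) not in seen:
            seen.add(k)
            unique.append(job)
    return unique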

Multi-Profile Support

# profiles/devsecops.yaml
cv_path: "info/CV/devsecops/gantman_2026_compact.md"
linkedin:
  keywords: ["DevSecOps", "Security Architect", "SRE", "CISO"]
  geo: ["Worldwide", "European Union", "Israel"]
telegram:
  channels: ["devops_jobs", "sre_jobs", "security_jobs"]
filters:
  min_salary_usd: 9000
  rank_threshold: 78
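
A profile in this shape loads with a few lines of PyYAML (a sketch; the path matches the example above):

import yaml

with open("profiles/devsecops.yaml") as f:
    profile = yaml.safe_load(f)

print(profile["filters"]["rank_threshold"])  # 78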

Key Features

1. Telegram Channel Scraping

  • Public channel scraping (no bot token needed)
  • 13 default DevOps/Security channels
  • Configurable per profile
  • Rate-limited with polite delays

2. LinkedIn Scraping

  • Selenium with persistent Chrome profile
  • Keyword × geo matrix search
  • Handles pagination (up to 40 pages)
  • Extracts: title, company, salary, location, description

3. HH.ru Integration

  • Public REST API (no auth; queried as sketched below)
  • Area-based search (Russia, Belarus, Georgia, Israel)
  • Experience level filtering
  • Structured vacancy data
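
A sketch of that query against the public endpoint at api.hh.ru (parameter values are examples; area IDs come from the /areas directory):

import requests

resp = requests.get(
    "https://api.hh.ru/vacancies",
    params={"text": "DevSecOps", "area": 113, "per_page": 100},  # 113 = Russia
    timeout=30,
)
resp.raise_for_status()
vacancies = resp.json()["items"]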

4. AI Ranking (Batch Mode)

import json
from ollama import AsyncClient

# Batch processing: 10 jobs per LLM call
prompt = f"""
CV: {cv_markdown}

Jobs: {json.dumps(batch_of_10_jobs)}

Score each job 0-100 for fit. Return JSON:
{{"results": [{{"job_id": "...", "score": 85, "evidence_confidence": 0.9}}]}}
"""

# Amortizes model-load cost across the batch
response = await AsyncClient().generate(model="qwen3.5:35b", prompt=prompt)

5. State Management

-- ~/.cache/jobs-finder/state.sqlite3
CREATE TABLE seen_jobs (
    job_id TEXT PRIMARY KEY,
    source TEXT,
    first_seen TIMESTAMP,
    score INTEGER,
    evidence_confidence REAL
);
  • Cross-run deduplication (see the check sketched below)
  • Score history tracking
  • Prevents re-ranking known jobs
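
A sketch of that check against the table above (path expansion is the only extra step):

import os, sqlite3

conn = sqlite3.connect(os.path.expanduser("~/.cache/jobs-finder/state.sqlite3"))

def is_new(job_id: str) -> bool:
    """True if this job has not been ranked in any previous run."""
    row = conn.execute("SELECT 1 FROM seen_jobs WHERE job_id = ?", (job_id,)).fetchone()
    return row is None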

Configuration

Environment Variables

# Ollama
JOBS_OLLAMA_HOST=http://192.168.2.2:11434
JOBS_OLLAMA_MODEL=qwen3.5:35b
JOBS_OLLAMA_NUM_CTX=16384

# Filters
JOBS_MIN_SALARY_USD=9000
JOBS_RANK_THRESHOLD=78
JOBS_REPORT_TOP_N=10
JOBS_MAX_POSTING_AGE_DAYS=21

# Sources
JOBS_TELEGRAM_MAX_MESSAGES=500
JOBS_LINKEDIN_HEADLESS=false
JOBS_HH_MAX_PAGES_PER_AREA=20

Usage

Run Full Pipeline

# Using profile
python -m jobs_finder pipeline --profile devsecops -v

# Skip specific sources
python -m jobs_finder pipeline --skip-linkedin --skip-hh -v

# Force re-rank everything
python -m jobs_finder pipeline --no-state

Output

outputs/2026-05-15/
├── telegram.jsonl      # Raw scraped posts
├── linkedin.jsonl      # Raw scraped jobs
├── hh.jsonl            # Raw vacancies
└── ranked/
    ├── ranked.jsonl   # All jobs with scores
    └── report.md      # Top-N markdown report

Results & Benefits

Efficiency Gains

Manual Search:
├── Telegram: ~30 min/day reading channels
├── LinkedIn: ~45 min/day browsing + filtering
├── HH.ru: ~20 min/day
└── Total: ~95 min/day

Automated Pipeline:
├── Scraping: ~5 min (background)
├── Ranking: ~10 min (batch AI)
├── Review: ~10 min (top 10 only)
└── Total: ~25 min/day

Savings: 70 min/day, 70%+ reduction

Privacy Benefits

  • No job data sent to cloud APIs
  • CV stays local
  • LinkedIn profile not exposed to third-party services
  • Full control over data retention

Architecture Decisions

  • Ollama batch mode: 10 jobs per call amortizes model-load latency
  • SQLite state DB: Simple, portable, no server needed
  • Chrome persistent profile: Maintains LinkedIn login across runs
  • Multi-profile YAML: Easy to switch between job searches
  • Confidence scoring: Filter out low-evidence matches (< 0.5)

Tags: Job Search · Web Scraping · AI Ranking · Local LLM · Automation · Privacy-First

DevSecOps-5090 — GPU Training Pipeline on Kubernetes

Challenge

Fine-tuning Large Language Models typically requires expensive cloud GPU instances or complex local setups. The goal is a production-ready, self-hosted training pipeline that:

  • Runs on local RTX 5090 (32GB VRAM)
  • Deploys via Kubernetes (k3s homelab)
  • Uses pre-built images (no runtime pip installs)
  • Supports QLoRA for memory efficiency (see the sketch after this list)
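
A hedged sketch of that QLoRA setup (4-bit NF4 base plus LoRA adapters) with the pinned transformers/peft/bitsandbytes stack; the model name and LoRA hyperparameters are examples, not the project's exact values:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights via bitsandbytes
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",             # example base model
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)         # only adapter weights train
model.print_trainable_parameters()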

Solution Architecture

Pipeline Overview

┌─────────────────────────────────────────────────────────────┐
│                    K3s Cluster (Homelab)                     │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Training Pod (GPU)                      │   │
│  │  ┌─────────────────┐  ┌─────────────────────────┐   │   │
│  │  │ Init Containers │  │    Main Container       │   │   │
│  │  │ - Verify GPU    │  │ - qwen_qlora_trainer.py │   │   │
│  │  │ - Check deps    │  │ - HuggingFace ecosystem │   │   │
│  │  │ - Mount PVCs    │  │ - bitsandbytes (4-bit)  │   │   │
│  │  └─────────────────┘  └─────────────────────────┘   │   │
│  │                                                      │   │
│  │  ┌─────────────────────────────────────────────┐    │   │
│  │  │              Sidecar                         │    │   │
│  │  │         metrics-exporter                     │    │   │
│  │  └─────────────────────────────────────────────┘    │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Volumes:                                                   │
│  ├── /mnt/models      (RO) - Model cache                   │
│  ├── /mnt/data        (RO) - Training data                 │
│  ├── /mnt/checkpoints (RW) - Output checkpoints            │
│  └── /mnt/training-logs (RW) - Logs + TensorBoard          │
└─────────────────────────────────────────────────────────────┘

Pre-Built Training Image

FROM pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime

# Pre-install all dependencies (no runtime downloads)
RUN pip install --no-cache-dir \
    transformers==4.47.0 \
    peft==0.14.0 \
    trl==0.13.0 \
    bitsandbytes==0.45.0 \
    datasets==3.2.0 \
    accelerate==1.2.1 \
    safetensors \
    sentencepiece \
    protobuf

Result: 11.4 GB image, ~2 min boot (vs. 15+ min with runtime pip)

Tags: LLM Fine-tuning · QLoRA · GPU · Kubernetes · Self-Hosted · ML Infrastructure

Universal Knowledge Extractor — LLM Training Data Pipeline

Challenge

Fine-tuning LLMs on domain-specific knowledge requires structured, high-quality instruction-response pairs. Manual curation doesn’t scale, and raw content from code repos, docs, and social media needs significant preprocessing before it’s usable for training.

Build a pipeline that:

  • Extracts knowledge from diverse sources (code, docs, Telegram, LinkedIn)
  • Automatically discovers content taxonomy
  • Produces ChatML-formatted JSONL ready for Axolotl/LLaMA-Factory (one record shown below)
  • Runs fully offline with local Ollama
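
One output record in the ChatML-style "messages" layout that Axolotl and LLaMA-Factory both accept (the content here is an invented example):

import json

record = {
    "messages": [
        {"role": "system", "content": "You are a DevSecOps assistant."},
        {"role": "user", "content": "How do we rotate k3s cluster certificates?"},
        {"role": "assistant", "content": "Back up /var/lib/rancher/k3s/server/tls, then ..."},
    ]
}
print(json.dumps(record, ensure_ascii=False))  # one JSONL line per record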

Solution Architecture

Pipeline Flow

┌─────────────────────────────────────────────────────────────────┐
│                      Content Sources                             │
├────────────┬────────────┬────────────┬──────────────────────────┤
│ Filesystem │   GitHub   │  Telegram  │        LinkedIn          │
│  (.md, .py)│  (repos)   │ (channels) │       (exports)          │
└─────┬──────┴─────┬──────┴─────┬──────┴────────────┬─────────────┘
      │            │            │                   │
      └────────────┴────────────┴───────────────────┘
                            │
                            ▼
              ┌──────────────────────────┐
              │    Taxonomy Discovery    │
              │  (Unsupervised ML)       │
              └────────────┬─────────────┘
                           │
                           ▼
              ┌──────────────────────────┐
              │   LLM Extraction         │
              │  (Ollama qwen3.5:35b)    │
              │  → Instruction-Response  │
              └────────────┬─────────────┘
                           │
                           ▼
              ┌──────────────────────────┐
              │   Quality Assurance      │
              │  - Schema validation     │
              │  - Token length check    │
              │  - Language detection    │
              │  - Credential scanning   │
              │  - Semantic dedup        │
              └────────────┬─────────────┘
                           │
                           ▼
              ┌──────────────────────────┐
              │   Data Augmentation      │
              │  - Paraphrase            │
              │  - Reformulation         │
              └────────────┬─────────────┘
                           │
                           ▼
         output/training-data.jsonl (ChatML)

Multi-Source Support

Source       Adapter    Content Types
Filesystem   fs         Markdown, Python, YAML, JSON
GitHub       github     Repos, READMEs, code, issues
Telegram     telegram   Channel messages, media captions
LinkedIn     linkedin   Profile exports, posts, articles

Key Features

1. Automatic Taxonomy Discovery

# No manual topic lists required
taxonomy:
  method: unsupervised
  algorithm: kmeans  # or hdbscan
  num_clusters: auto  # discovers optimal count

Uses embedding-based clustering to automatically categorize content into topics.
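
A sketch of that clustering step, assuming sentence-transformers embeddings and scikit-learn (the project's "num_clusters: auto" would pick k, e.g. by silhouette score; a fixed k is used here for brevity):

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

vectors = SentenceTransformer("all-MiniLM-L6-v2").encode(documents)  # documents: list[str]

labels = KMeans(n_clusters=8, n_init="auto").fit_predict(vectors)
for doc, topic in zip(documents, labels):
    print(topic, doc[:60])                  # topic id per content chunk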

Tags: LLM Fine-tuning · Training Data · ChatML · Ollama · Data Pipeline · NLP

Esla — AI-Powered Real Estate Platform

Challenge

Build a complete real estate platform for the Belarusian market that enables property search via natural language (Russian/Belarusian), provides automated property valuations, connects buyers with verified realtors, and runs entirely as a Telegram Mini-App — all without recurring cloud API costs.

Solution Architecture

Platform Overview

A modular monolith deployed via Docker Compose with 22 services covering the full real estate lifecycle: search, valuation, listings, realtor marketplace, and admin operations.

Tags: Real Estate · NLP · AVM · Telegram Mini-App · Local LLM · Zero API Costs

Ollama Decomposition Agent — Intelligent LLM Orchestration

Challenge

Large Language Models have context window limits, and their latency grows with prompt size. When analyzing large codebases, documents, or complex multi-part questions, single-shot prompts either exceed context limits or produce slow, unfocused responses.

Build an agent that:

  • Intelligently splits large prompts into semantic sub-tasks
  • Executes sub-tasks in parallel for speed
  • Synthesizes results into coherent responses
  • Runs entirely on local Ollama (zero API costs)

Solution Architecture

Workflow

User Prompt (large)
    ↓
[Analyze] - Count tokens, identify boundaries
    ↓
[Decide] - Decompose or single call?
    ├─→ Small (<6K tokens) → Single Ollama call → Return
    └─→ Large (>6K tokens) → Decomposition
            ↓
        [Split] - Semantic decomposition (headings, lists, paragraphs)
            ↓
        [Execute] - Parallel sub-task execution (3 concurrent)
            ↓
        [Aggregate] - Result synthesis (auto-strategy selection)
            ↓
        [Return] - Final coherent response
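
The size-based routing at the top of this workflow is a one-liner with tiktoken (the same library the TokenManager below relies on; the 6K threshold comes from the diagram):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def route(prompt: str, threshold: int = 6000) -> str:
    """Decide between a single Ollama call and decomposition."""
    return "single-call" if len(enc.encode(prompt)) < threshold else "decompose"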

Components

Component                  Responsibility
OllamaDecompositionAgent   Main orchestrator, workflow management
TokenManager               Tiktoken counting, chunking with overlap
PromptSplitter             Semantic decomposition at natural boundaries
OllamaClient               Async HTTP with retry/backoff
ResultAggregator           Multi-strategy synthesis

Key Features

1. Intelligent Prompt Decomposition

# Identifies natural breakpoints
sections = splitter.split(prompt)
# → [heading1_content, list_items, paragraph_block, ...]

# Preserves shared context across sub-tasks
subtasks = [f"{shared_context}\n\n{section}" for section in sections]

2. Parallel Execution with Controlled Concurrency

sem = asyncio.Semaphore(3)  # Max 3 concurrent; each sub-task acquires it
async def limited(call):
    async with sem: return await call
results = await asyncio.gather(*(limited(c) for c in subtask_calls))

3. Multi-Strategy Aggregation

Sub-task Count   Strategy       Method
1-3              Concatenate    Join all + final synthesis
4-10             Sequential     Summarize each, then synthesize
10+              Hierarchical   Tree-based summarization
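
The auto-strategy selection implied by this table reduces to a small dispatcher (a sketch; the names are illustrative):

def pick_strategy(n_subtasks: int) -> str:
    if n_subtasks <= 3:
        return "concatenate"    # join all + final synthesis
    if n_subtasks <= 10:
        return "sequential"     # summarize each, then synthesize
    return "hierarchical"       # tree-based summarization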

4. Performance Optimizations (v2.0)

Optimization                   Improvement
Expert identification toggle   25-30% latency reduction
Token count caching            40-60% cache hits, 60-180ms savings
Async file I/O                 ~160ms improvement
Parallel token counting        140ms+ savings on large prompts
Result caching (LRU)           8-12s+ savings on repeated prompts

Performance Characteristics

Execution Times (DeepSeek-R1:32b)

Prompt Size   Strategy       Sub-tasks   Duration   Tokens
< 6K          Single call    1           5-15s      ~6K
6K-12K        Semantic       2-3         15-25s     ~12K
12K-24K       Semantic       4-5         25-40s     ~24K
24K+          Hierarchical   6-10        40-90s     30K+

Cost Comparison

Approach           Cost per 100K tokens
OpenAI GPT-4       ~$3.00
Anthropic Claude   ~$2.40
Local Ollama       $0.00

Usage

Python API

from ml.agents import OllamaDecompositionAgent, AgentConfig

config = AgentConfig(
    ollama_host="192.168.2.2:11434",
    ollama_model="deepseek-r1:32b",
    max_parallel_tasks=3,
    aggregation_strategy="auto"
)

agent = OllamaDecompositionAgent(config)
result = await agent.process(large_prompt)

print(f"Response: {result.final_response}")
print(f"Tokens: {result.total_tokens_used}")
print(f"Duration: {result.total_duration_seconds:.2f}s")

CLI Tool

# Simple prompt
python ml/agents/examples/cli_tool.py "Your prompt here"

# Load from file
python ml/agents/examples/cli_tool.py @large-document.txt

# Custom configuration
python ml/agents/examples/cli_tool.py @prompt.txt \
    --model deepseek-r1:32b \
    --max-tokens 16384 \
    --parallel 4 \
    --aggregation-strategy hierarchical

Configuration

Core Settings

AgentConfig(
    # Ollama
    ollama_host="192.168.2.2:11434",
    ollama_model="deepseek-r1:32b",
    
    # Context Management
    max_context_tokens=8192,
    response_reserve_tokens=2048,
    chunk_overlap_tokens=200,
    
    # Execution
    max_parallel_tasks=3,
    timeout_seconds=300,
    
    # Performance (v2.0+)
    enable_token_count_cache=True,
    enable_async_file_io=True,
    enable_result_caching=False,
)

Results & Benefits

Technical Outcomes

Performance:
├── Latency reduction: 30-50% (with optimizations)
├── Cache hit rate: 40-60%
├── Backward compatibility: 100%
└── API costs: $0

Use Cases

  1. Security Audits: Analyze large codebases in parallel
  2. Document Analysis: Process long reports with coherent synthesis
  3. Code Review: Multi-file reviews with context preservation
  4. Research: Complex multi-part questions with structured responses

Architecture Decisions

  • Tiktoken over custom counting: OpenAI-standard accuracy, battle-tested
  • Semantic over fixed-size splitting: Preserves meaning, better coherence
  • Async over threading: Better I/O performance, cleaner code
  • LRU caching over persistent: Session-scoped, no stale data issues

Tags: LLM Agent · Prompt Engineering · Local AI · Zero API Costs · Parallel Processing