AI systems researcher | Builder

Vilhelm Toivonen

Distributed LLMs: cognitive core, edge deployment, and tool-using agents.

Doctoral Researcher (distributed LLM inference), University of Helsinki
Consulting AI Architect, Bondata
Founder, Teknet (2019) • Co-founder, Padlo.co (2025)

CURRENT FOCUS 2026

• BridgeLoRA: distributed fine-tuning across edge adapters and cloud backbones (ICDCS 2026; TKDE extension underway)
• On-policy distillation: removing the teacher early without quality loss (COLM 2026, in review)
• Exact-fallback MoE caching and addressable memory banks for frozen models (2× AAAI-27 targets)
• Next: edge-mesh distillation extending the COLM work, and miettijä (teaching small models to reason in Finnish)


01 RESEARCH

2024 – 2026

I focus on distributed LLM inference and small, tool-using models that can live on devices. The goal: a “cognitive core” that reasons well, uses tools, and keeps most knowledge offloaded to retrieval instead of parameters. I got into ML early (high-school research on data augmentation for speech recognition) and I still work empirically: publishing benchmarks, code, and measurements on real consumer hardware (iPhone, MacBook, edge servers).

Fig. 1 · Edge / cloud collaboration

Theses

• Determining User Preference Profiles from Email And User Engagement Data (M.Sc., 2024)
• Lossless Compression of Deep Neural Networks (B.Sc., 2024)

Current Agenda

BridgeLoRA: Journal Extension

Question: What does BridgeLoRA cost and leak in practice across the edge–cloud continuum?

Method: Extending the ICDCS 2026 result with systematic profiling, utility analysis, and honest privacy bounds (≥3311× perplexity blow-up to invert at ε=4)

Output: IEEE TKDE submission

Exact-Fallback Expert Caching for MoE

Question: Does a higher expert-cache hit rate actually mean better output on offloaded MoE models?

Method: It doesn't: quality tracks miss severity, not hit rate. An exact-fallback kernel serves cache misses from CPU memory, so decode stays byte-identical to stock vLLM while a predictor prefetches ahead

Output: +10–19% decode throughput; targeting AAAI-27
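A minimal sketch of the exact-fallback idea, under loud assumptions: the class and function names are hypothetical, experts are toy scalars, and a dictionary stands in for CPU memory. It is not the kernel described above and does not touch vLLM. It only illustrates the property the method relies on: a miss is served with the exact weights from the slow store, so outputs are identical whatever the hit rate is.

```python
from collections import OrderedDict


class ExactFallbackExpertCache:
    """Toy LRU cache of expert weights with an exact fallback path.

    Hypothetical illustration: `slow_store` stands in for CPU memory and
    `fast` for the accelerator-resident subset of experts.
    """

    def __init__(self, slow_store, capacity):
        self.slow = slow_store        # expert_id -> weights, holds every expert
        self.fast = OrderedDict()     # resident subset, kept in LRU order
        self.capacity = capacity
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.fast:
            self.hits += 1
            self.fast.move_to_end(expert_id)
            return self.fast[expert_id]
        # Miss: return the exact weights from the slow store, never an
        # approximation or a substitute expert.
        self.misses += 1
        weights = self.slow[expert_id]
        self._admit(expert_id, weights)
        return weights

    def prefetch(self, expert_ids):
        # A predictor (not modeled here) would supply `expert_ids` ahead of use.
        for expert_id in expert_ids:
            if expert_id not in self.fast:
                self._admit(expert_id, self.slow[expert_id])

    def _admit(self, expert_id, weights):
        self.fast[expert_id] = weights
        self.fast.move_to_end(expert_id)
        while len(self.fast) > self.capacity:
            self.fast.popitem(last=False)   # evict least recently used


def run_layer(cache, routed_experts, x):
    # Toy "expert": multiply the input by a scalar weight, then sum.
    return sum(cache.get(e) * x for e in routed_experts)


slow_store = {0: 0.5, 1: 2.0, 2: -1.0, 3: 3.0}   # synthetic toy weights
trace = [[0, 1], [2, 3], [0, 1], [1, 2]]          # synthetic routing trace

small = ExactFallbackExpertCache(slow_store, capacity=1)
large = ExactFallbackExpertCache(slow_store, capacity=4)
out_small = [run_layer(small, routed, 1.5) for routed in trace]
out_large = [run_layer(large, routed, 1.5) for routed in trace]
```

In this toy, `out_small == out_large` even though the two caches see very different hit counts; the cache size only changes how often the slow path is taken, which in a real system would show up as throughput rather than output quality.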

Addressable Memory Banks

Question: Can a frozen model serve thousands of users' private facts without rereading them, and without leaking across users?

Method: Per-user rows in an addressable attention bank; strict gradient isolation makes deletion and access control exact, and multi-layer injection scales past a single write

Output: Multi-tenant memory with provable isolation; targeting AAAI-27
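The isolation property can be pictured with a small toy, assuming a bank that stores key/value rows per user and attends only over the requesting user's rows. Every name below is hypothetical, and the sketch leaves out the parts that make the real problem hard (gradient isolation during writes and multi-layer injection); it shows only why per-user addressing makes deletion and access control exact by construction.

```python
import math


class MemoryBank:
    """Toy per-user key/value bank (hypothetical, for illustration only)."""

    def __init__(self):
        self.rows = {}   # user_id -> list of (key_vector, value_vector)

    def write(self, user_id, key, value):
        self.rows.setdefault(user_id, []).append((key, value))

    def delete_user(self, user_id):
        # Deletion is exact in this toy: the user's rows are simply removed.
        self.rows.pop(user_id, None)

    def read(self, user_id, query):
        """Softmax attention restricted to this user's rows."""
        rows = self.rows.get(user_id, [])
        if not rows:
            return None
        scores = [sum(q * k for q, k in zip(query, key)) for key, _ in rows]
        top = max(scores)                          # subtract max for stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(rows[0][1])
        return [
            sum(w * value[i] for w, (_, value) in zip(weights, rows)) / total
            for i in range(dim)
        ]


bank = MemoryBank()
bank.write("alice", [1.0, 0.0], [1.0, 0.0])
bank.write("bob", [1.0, 0.0], [0.0, 1.0])

query = [5.0, 0.0]
alice_before = bank.read("alice", query)
bank.delete_user("bob")
alice_after = bank.read("alice", query)
bob_after = bank.read("bob", query)
```

Because a read never touches another user's rows, deleting `bob` leaves `alice`'s result unchanged and `bob`'s read returns `None`.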


02 PUBLISHED

peer-reviewed

2026

ICDCS 2026 · accepted

BridgeLoRA: Privacy-preserving Collaborative Skip-Layer Connectors for Efficient Transformer Fine-tuning at the Edge

Vilhelm Toivonen, Xiang Su, Xiaoli Liu, Sasu Tarkoma, Pan Hui

Skip connectors bridge d > 1 frozen transformer layers, cutting edge-cloud synchronization from O(L) to O(L/d) while every task-specific parameter stays on-device. Outperforms standard LoRA (1.47 vs 1.66 validation loss on Llama-3.2-3B) while training 2.7% of parameters.
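The O(L) to O(L/d) reduction is simple arithmetic. The sketch below assumes one edge-cloud exchange per bridged group of layers, which is a simplification and not a description of the paper's protocol; the function name is hypothetical and the layer count of 28 is only an illustrative value.

```python
import math


def sync_points(num_layers: int, skip: int) -> int:
    """Exchanges per forward pass if one connector bridges `skip` frozen layers.

    Assumption: exactly one edge-cloud synchronization per bridged group.
    """
    if num_layers < 1 or skip < 1:
        raise ValueError("num_layers and skip must be positive")
    return math.ceil(num_layers / skip)


L = 28  # illustrative layer count, an assumption for this example
per_layer = sync_points(L, skip=1)   # one exchange per layer: O(L)
bridged = sync_points(L, skip=4)     # one exchange per 4 layers: O(L/d)
```

With these values, `per_layer` is 28 and `bridged` is 7.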

Interactive paper page · video · figures · code

03 PROJECTS

2019 – 2026

2025–2026

Vibemetrics → Bondata acquisition

CTO → Head of AI → Consulting AI Architect

Led the platform through acquisition (May 2025). Moved from CTO to Head of AI, shipping RAG-based survey agents and recommendations to production. Transitioned to Consulting AI Architect in 2026 to focus on PhD research while staying engaged with the AI roadmap.

Acquired May 2025

2026

BridgeLoRA: Skip-Layer Connectors at the Edge

Lead Researcher

Privacy-preserving collaborative fine-tuning: adapters target specific transformer layers and stay on-device while frozen backbones run in the cloud. The TKDE extension adds systematic profiling, utility analysis, and honest privacy bounds.

ICDCS 2026 · accepted · TKDE extension underway

2025

Stanford CS336 Pretraining Competition

1st Place

Designed and trained a language model that achieved the lowest perplexity on OpenWebText, topping the CS336 pretraining leaderboard.

1st / 125 teams · pretraining leaderboard

2025

Padlo

Co-founder

Co-founded padlo.co, a live scoreboard and coaching app for padel. Sole developer across mobile, backend, and the analytics behind player and coach insights.

Launched March 2025 to live tournaments

2025–2026

Where Should LLM Agents Run?

Lead Researcher

Characterizing costs of mobile, edge, and cloud LLM-agent deployments: 5 devices, 7 models, an 8,400-trial trace, and an online-learning (bandit) placement algorithm on top.

MobiHoc 2026 submission with public measurements
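For readers unfamiliar with bandit-style placement, here is a minimal sketch that uses UCB1 as a stand-in. The source only says "online-learning (bandit)", so the choice of UCB1, the function names, and the cost numbers are all assumptions; the costs are synthetic and are not measurements from the trace.

```python
import math
import random


def ucb1_placement(arms, pull, rounds):
    """Pick a placement each round with UCB1; rewards are expected in [0, 1]."""
    counts = {arm: 0 for arm in arms}
    means = {arm: 0.0 for arm in arms}
    for t in range(1, rounds + 1):
        untried = [arm for arm in arms if counts[arm] == 0]
        if untried:
            arm = untried[0]                      # try every arm once first
        else:
            arm = max(
                arms,
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]   # running average
    return counts, means


rng = random.Random(0)

# Synthetic normalized costs per placement (made up for this example).
SYNTHETIC_COST = {"mobile": 0.8, "edge": 0.1, "cloud": 0.5}


def simulated_pull(arm):
    # Reward is one minus a noisy normalized cost, so cheaper is better.
    noise = rng.uniform(-0.05, 0.05)
    return 1.0 - (SYNTHETIC_COST[arm] + noise)


counts, means = ucb1_placement(list(SYNTHETIC_COST), simulated_pull, rounds=2000)
```

In this synthetic setup the cheapest placement ("edge") ends up receiving the large majority of the 2,000 decisions, while the other two are still sampled occasionally so their estimates stay current.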

2025–2026

Foundation Model Inference at the Edge (Survey)

First Author

~240-reference survey covering serving stacks, hardware, and emerging methods for running language models on-device and at the edge.

Submitted to ACM Computing Surveys, July 2026

2019–

Teknet

Founder, sole operator → Co-owner with brother (2025–)

Carried on my grandfather’s business. Sole worker for the first ~5 years: sales, manufacturing, packaging, marketing, customer service. Expanded in early 2025 by bringing my brother in as co-owner.

Profitable services business across two generations

04 BACKGROUND

2018 – present

Education

Current

University of Helsinki

Doctoral Researcher

Department of Computer Science

Distributed LLM inference, cognitive core, edge/cloud RL

2023–2024

Aalto University

M.Sc. Computer Science

Machine Learning, Data Science and Artificial Intelligence

LLMs, systems, applied ML

2021–2023

Aalto University

B.Sc. Mathematics and Operations Research

Mathematics, statistical learning, optimization

Completed both B.Sc. and M.Sc. in roughly three years while working in industry roles.

Applied / Embodied Work

Built a go-kart from scratch (moped engine). Practical systems intuition for how parts interact under real constraints.

Ran a small construction company for three summers with a coworker / shareholder: renovations, painting, and small builds; learned end-to-end delivery and hands-on project management of a two-person business.

With the same coworker, sold two products on Amazon US, a deliberate exercise in learning sales, marketing, branding, and end-to-end product building from another angle.

Built an outdoor sauna from scratch: frame, walls, stove, the lot. Same lesson the go-kart taught about real-world constraints, at a different scale.

Competitive Sports

Cross-country Skiing

Level: Competitive (regional)

Club: Pirkkalan Hiihtäjät

Achievements: Multiple regional podium finishes

Orienteering

Level: Club

Club: Kangasala SK

Achievements: Active participant in national competitions

Not the highest national or international level, but the working habits competitive sports demand (tight schedules, knowing limits, pushing through under pressure) translate directly to research and to high-velocity teams.

05 FUTURE

2026 – 2027

Final research push

Three threads to close the PhD: a journal extension of BridgeLoRA, an edge-mesh version of on-policy distillation, and a systems paper unifying the work.

Goals

• BridgeLoRA → IEEE TKDE extension: systematic profiling, utility analysis, and honest privacy bounds across the edge–cloud continuum
• Edge-mesh on-policy distillation: students and teachers split across devices, clusters, and even model families
• Systems paper unifying BridgeLoRA, the measurement work, and the scaffold-and-release framing (COLM submission): the PhD thesis spine

Timeline

• 2026: BridgeLoRA accepted at ICDCS; agent-placement paper submitted to MobiHoc; on-policy distillation in review at COLM; survey submitted to ACM Computing Surveys
• Late 2026 – early 2027: edge-mesh distillation manuscript, BridgeLoRA TKDE extension, systems paper draft
• Early 2027: PhD defense and graduation

Long-term Vision

I’m not aiming to spend the next decade on fundamental research alone. The role I want combines the work I’ve already shipped (agentic systems, evaluations, production code) with the cognitive-core research I’m doing now, in a small high-agency team where the system actually reaches users. That research only matters if it’s built from the ground up: designed, distilled, trained from scratch, and deployed at scale, which takes real compute and a real team. Three years ago I set a ten-year goal of becoming one of the top hundred AI researchers in the world; seven years remain. I don’t know if I’ll get there, but I want to spend those years in a role where both my wins and my failures show up in something millions of people use every day.

2027 · PhD defended; the systems paper unifies the thesis
2028 · the edge-inference line published end to end (survey → ICDCS → TKDE), in a role where the research ships
2030 · leading a group of researchers on technology meant to create value for millions of people
2031 · research I led is in something millions of people use every day
2033 · the ten-year goal falls due

06 WRITING

008 Teaching a language model to think in Finnish

Jul 3, 2026

Your AI answers you in Finnish, but it thinks in English. I tried to change that, and along the way it turned out you cannot even reward a model into thinking in a language it never tries. Here is the diagnosis, the fix, and the honest failures.

contains: interactive plot, quiz · 18 min

007 Can a language model remember you without rereading you?

May 1, 2026

Every request arrives knowing nothing about who's asking, so we stuff facts into the prompt and reread them forever. I spent months of my mech-interp PhD trying to precompute a per-user memory instead: two clean failures, one bank architecture that works, and a layer-injection picture I'd defend hardest.

contains: interactive plot · 10 min

006 Sparse embeddings: a six-week negative result

Apr 24, 2026

I thought the embedding matrix was the easy part of parameter-golf: quantise it, compress it, ship it. Six weeks later I'd tried per-token bits, sparse training, and STE, and every one of them lost to plain dense INT7. Doubling the vocab buys ~0.01 BPB; buying the bytes back costs more.

contains: interactive plot · 5 min

005 Quantization is a hardware story (and an entropy puzzle)

Apr 17, 2026

I came in thinking quantization was about saving disk space. The bigger story is bandwidth, on both ends of the hardware spectrum, and underneath that there's a small information-theoretic puzzle I didn't expect.

contains: interactive plot · 7 min

004 Why do I keep reaching for state-space models on the edge?

Apr 10, 2026

Every time I push a model down to a phone, the KV cache is what kills me first. SSMs are the cleanest answer I've found, and around mid-2025 SSM hybrids started showing up in frontier models.

contains: interactive plot · 11 min

003 Attention variants beyond softmax: a 2026 map

Apr 3, 2026

Open models quietly stopped putting softmax attention on every layer, and I couldn't find a single chart showing what replaced it. So I drew the map myself: MLA, linear, sparse, and the hybrid stacks three labs landed on independently, all on the same axes.

contains: interactive plot, quiz · 7 min

07 CONTACT

Email: [email protected]

GitHub: vimeto

LinkedIn: Vilhelm Toivonen

Twitter/X: @ToivonenVilhelm

Google Scholar: Publications

University: Research Portal
