
Dec 8, 2025
SpineDAO & Pleias are partnering to develop safe AI for wellness and future clinical deployment, starting with back pain, the #1 cause of disability worldwide.
SpineDAO, the research collective of 200+ spine clinicians, scientists, and engineers, is joining forces with Pleias, the AI organisation, to build multi-agent reasoning systems for spine wellness. This collaboration is the first step towards solving AI's hardest deployment challenge: clinical intelligence that scales without compromising safety, for wellness and healthcare alike.
SpineDAO brings the clinical expertise; Pleias brings the language model architecture and reasoning-first AI infrastructure.
Together we are tackling the scaling crisis in wellness and health AI by concentrating our research on a single, deeply specialised domain: the spine.
In back pain, the #1 cause of disability worldwide, expert clinical judgment changes outcomes dramatically, but there aren't enough spine specialists, and there never will be.
Meanwhile, generic LLMs can scale, but they're fundamentally unsafe for clinical deployment: they do not embed the specific reasoning systems needed for safe and efficient clinical judgment.
The challenge is to understand one of today's biggest AI bottlenecks not as a compute problem or a data-quantity problem, but as a reasoning architecture problem.
Why This Matters Beyond the Spine
This is a demonstration project for reasoning-first AI in high-stakes domains. If we can extract, encode, and deploy expert clinical judgment safely at scale, we've solved a general problem in AI: moving from pattern matching to genuine reasoning under uncertainty with appropriate language.
Small models with structured reasoning > large models with basic statistical correlation. That's the hypothesis we're operationalizing.
The Architecture - Three-Layer Stack
Reasoning Extraction Layer: Work with spine specialists to decompose their actual expert reasoning. When an expert sees "lower back stiffness + 8hr sitting," they're executing a reasoning chain: symptom interpretation → lifestyle factor integration → contraindication screening → progression readiness. We're extracting those process traces, not just input-output pairs.
Knowledge Encoding Layer: Transform tacit clinical knowledge into navigable reasoning graphs. Not brittle rule trees, but probabilistic frameworks that capture clinical nuance and uncertainty.
Agent Training Layer: Train specialized small language models to traverse these clinical reasoning graphs the way practitioners do. Key point: we're not simply doing retrieval-augmented generation here. The models learn to navigate decision spaces using embedded clinical logic structures.
Real clinical cognition isn't monolithic. Experts pattern-match, sequence educational content, and provide empathetic support: different cognitive modes. So we are building specialized agents for diagnostic reasoning, pedagogical reasoning, and conversational support, all sharing a unified clinical knowledge substrate.
Challenge: AI Reasoning Agents for Spine Wellness
Here's where this gets interesting for the broader AI safety conversation. We're establishing a benchmark as a new evaluation paradigm: one that measures reasoning quality, not just predictive accuracy. The benchmark tests reasoning transparency, uncertainty quantification, safety boundary recognition, contextual adaptation, and cognitive soundness. Think of it as asking "Can this system demonstrate safe expert judgment?" rather than "Does it predict the right diagnosis?"
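As a rough sketch of what such a reasoning-quality benchmark could look like in code, the snippet below scores a reasoning trace along the five dimensions named above. The thresholds, the pass/fail gate, and the checker interface are illustrative assumptions, not the actual benchmark design.

```python
from dataclasses import dataclass

# The five dimensions named above; each checker returns a score in [0, 1].
DIMENSIONS = (
    "reasoning_transparency",
    "uncertainty_quantification",
    "safety_boundary_recognition",
    "contextual_adaptation",
    "cognitive_soundness",
)

@dataclass
class BenchmarkResult:
    scores: dict

    @property
    def passes(self) -> bool:
        # Illustrative gate: safety boundary recognition is non-negotiable,
        # every other dimension must clear a placeholder threshold.
        return (self.scores["safety_boundary_recognition"] >= 0.95
                and all(v >= 0.7 for v in self.scores.values()))

def evaluate(reasoning_trace, checkers) -> BenchmarkResult:
    """checkers maps each dimension name to a function: trace -> score in [0, 1]."""
    return BenchmarkResult({dim: checkers[dim](reasoning_trace) for dim in DIMENSIONS})
```

The point of the gate is that diagnostic accuracy alone never makes the result a pass; a system that predicts well but cannot show its reasoning or recognise its safety boundaries still fails.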
This matters because it's a concrete instantiation of the general problem: How do you validate that an AI system reasons correctly in high-stakes domains? Healthcare is the canary in the coal mine for reasoning-first AI.
The Innovation: Reasoning Extraction & Deployment
We don't build AI that simply learns from clinicians; we build AI that learns how clinicians operate.
Here's the mechanism:
Step 1: Decompose expert reasoning. When a spine specialist evaluates someone with lower back stiffness who sits 8 hours daily, they're running a cognitive process: symptom interpretation → lifestyle factor analysis → contraindication check → progression readiness assessment. We work with 150+ specialists to extract these reasoning chains: not just their recommendations, but the logic connecting observation to action.
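To make the shape of an extracted trace concrete, here is a minimal sketch, in Python, of how one such reasoning chain might be recorded. The field names and example values are illustrative assumptions, not the actual SpineDAO schema.

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningStep:
    """One link in an expert's chain from observation to action."""
    stage: str          # e.g. "symptom_interpretation", "contraindication_check"
    observation: str    # what the specialist attends to at this stage
    inference: str      # the conclusion they draw from it
    confidence: float   # how strongly this step is held (0.0-1.0)

@dataclass
class ReasoningTrace:
    """A full process trace for one case, not just an input-output pair."""
    case_summary: str
    steps: list[ReasoningStep] = field(default_factory=list)
    recommendation: str = ""

# Hypothetical trace for the "lower back stiffness + 8hr sitting" example.
trace = ReasoningTrace(
    case_summary="Lower back stiffness, sits 8 hours daily, no red flags reported",
    steps=[
        ReasoningStep("symptom_interpretation", "morning stiffness, eases with movement",
                      "likely mechanical, non-inflammatory pattern", 0.8),
        ReasoningStep("lifestyle_factor_analysis", "prolonged sitting, sedentary job",
                      "postural loading is a plausible driver", 0.7),
        ReasoningStep("contraindication_check", "no night pain, no neurological signs",
                      "no red flags, wellness guidance appropriate", 0.9),
        ReasoningStep("progression_readiness", "pain-free range of motion",
                      "ready for graded mobility work", 0.75),
    ],
    recommendation="Graded mobility programme with sitting-break education",
)
```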
Step 2: Encode reasoning as graphs. Clinical logic becomes structured pathways. These aren't rigid rules; they're probabilistic reasoning frameworks capturing clinical nuance.
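One plausible way to encode these chains as a probabilistic reasoning graph rather than a rigid rule tree is as weighted edges between clinical reasoning states. The sketch below uses networkx purely for illustration; the node names and edge weights are invented.

```python
import networkx as nx

# Hypothetical reasoning graph: nodes are clinical reasoning states, edge
# weights are the (illustrative) probability that an expert moves from one
# state to the next given the evidence so far.
G = nx.DiGraph()
G.add_edge("stiffness_reported", "mechanical_pattern", weight=0.8)
G.add_edge("stiffness_reported", "inflammatory_pattern", weight=0.2)
G.add_edge("mechanical_pattern", "postural_loading_suspected", weight=0.7)
G.add_edge("postural_loading_suspected", "red_flag_screen", weight=1.0)
G.add_edge("red_flag_screen", "wellness_pathway", weight=0.9)
G.add_edge("red_flag_screen", "medical_referral", weight=0.1)

def most_likely_path(graph: nx.DiGraph, start: str, goal: str) -> list[str]:
    """Greedy walk: follow the highest-weight outgoing edge until the goal
    (or a dead end) is reached."""
    path, node = [start], start
    while node != goal:
        successors = graph[node]
        if not successors:
            break
        node = max(successors, key=lambda n: successors[n]["weight"])
        path.append(node)
    return path

print(most_likely_path(G, "stiffness_reported", "wellness_pathway"))
```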
Step 3: Train models to navigate reasoning graphs. Specialized AI agents based on Pleias SLMs learn to traverse these clinical pathways the way experts do. When processing a new wellness profile, the model doesn't pattern-match to historical cases; it reasons through the decision space using embedded clinical logic.
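A rough sketch of what "navigating the decision space" could look like at inference time: at each step the model is constrained to choose among the graph's outgoing edges rather than generating freely. The scoring function is a stand-in for the model; the actual Pleias training setup is not described in this post.

```python
def traverse_with_model(graph, start, profile, score_fn, max_steps=10):
    """Walk the clinical reasoning graph, letting the model pick each transition.

    score_fn(profile, current_node, candidate_node) -> float is assumed to be a
    small language model scoring how well a candidate next step fits the
    wellness profile; here it is left abstract.
    """
    node, path = start, [start]
    for _ in range(max_steps):
        candidates = list(graph[node])
        if not candidates:
            break
        # The model only ever chooses among clinically valid next steps,
        # so its output stays inside the encoded reasoning structure.
        node = max(candidates, key=lambda n: score_fn(profile, node, n))
        path.append(node)
    return path

# Toy usage, reusing the graph G from the Step 2 sketch and a dummy scorer.
steps = traverse_with_model(G, "stiffness_reported",
                            profile={"sitting_hours": 8},
                            score_fn=lambda p, cur, nxt: G[cur][nxt]["weight"])
```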
Step 4: A multi-agent architecture modelled on clinical cognition. Real specialists don't use one thinking mode: they pattern-recognize profiles, sequence educational content, and provide emotional support. Our system follows this: distinct agents specialized for matching (diagnostic reasoning), content curation (pedagogical reasoning), and conversational support (empathetic reasoning), all sharing the unified clinical knowledge layer.
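As a sketch of that split, the router below dispatches a user turn to one of three specialized agents while all of them read from the same knowledge substrate. The agent names, the keyword routing heuristic, and the substrate interface are assumptions for illustration only.

```python
from typing import Protocol

class KnowledgeSubstrate(Protocol):
    def lookup(self, query: str) -> str: ...

class Agent:
    def __init__(self, role: str, substrate: KnowledgeSubstrate):
        self.role = role
        self.substrate = substrate   # shared clinical knowledge layer

    def respond(self, message: str) -> str:
        context = self.substrate.lookup(message)
        return f"[{self.role}] response grounded in: {context}"

def route(message: str, agents: dict[str, Agent]) -> Agent:
    """Crude keyword routing; in practice this would itself be a trained model."""
    text = message.lower()
    if any(w in text for w in ("pain", "stiff", "numb")):
        return agents["diagnostic"]
    if any(w in text for w in ("why", "learn", "explain")):
        return agents["pedagogical"]
    return agents["empathetic"]

class InMemorySubstrate:
    def lookup(self, query: str) -> str:
        return "curated spine corpus snippet"

substrate = InMemorySubstrate()
agents = {r: Agent(r, substrate) for r in ("diagnostic", "pedagogical", "empathetic")}
print(route("My lower back is stiff after sitting", agents).respond("lower back stiffness"))
```

The design point is that the agents differ in reasoning mode, not in knowledge: swapping the substrate updates all three at once.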
The Expert-AI Collaboration Model
How clinicians shape the system:
During development: SpineDAO's experts have become reasoning architects. By collecting representative cases ("When I see this, I'm thinking about X possibility, but checking for Y contraindication, which leads me to Z approach") and establishing a highly curated medical and scientific corpus, they have built a trustworthy foundation for the AI system's initial training data.
These reasoning traces are being processed and enriched through a synthetic environment (similar to what we did for SYNTH at Pleias); we're teaching the AI the cognitive process, not having it memorize answers.
Through edge cases: Clinicians provide boundary scenarios: "This sounds like a wellness concern but actually signals X medical issue." The AI learns the judgment framework, the subtle clinical discrimination that separates wellness education from medical overreach.
In continuous refinement: Post-deployment, specialists audit AI reasoning traces. They see: "System recommended Y for this profile because of reasoning pathway ABC." When clinical judgment diverges from AI logic, that becomes a refinement signal: we're debugging the reasoning, not just fixing outputs.
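A minimal sketch of how such a divergence could be turned into a refinement signal, assuming both the specialist's judgment and the system's reasoning pathway are logged (all field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class AuditRecord:
    profile_id: str
    ai_pathway: list[str]          # e.g. ["stiffness_reported", ..., "wellness_pathway"]
    ai_recommendation: str
    clinician_pathway: list[str]
    clinician_recommendation: str

def refinement_signals(records: list[AuditRecord]) -> list[dict]:
    """Collect cases where clinical judgment diverged from the AI, pointing at the
    first step where the two pathways split (debugging the reasoning, not just
    patching the output)."""
    signals = []
    for r in records:
        if r.ai_recommendation != r.clinician_recommendation:
            split = next((i for i, (a, c) in enumerate(zip(r.ai_pathway, r.clinician_pathway))
                          if a != c),
                         min(len(r.ai_pathway), len(r.clinician_pathway)))
            signals.append({
                "profile": r.profile_id,
                "diverges_at_step": split,
                "ai_step": r.ai_pathway[split] if split < len(r.ai_pathway) else None,
                "clinician_step": r.clinician_pathway[split] if split < len(r.clinician_pathway) else None,
            })
    return signals
```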
Get access to our R&D backroom
CONTACT US


