SCALING DOCUMENT PROCESSING WITH OPEN SCIENCE LLMS

WE BUILD ENERGY-EFFICIENT LLMS FOR INFORMATION-INTENSIVE AND HIGHLY REGULATED INDUSTRIES

AI ACT-COMPLIANT OPEN SCIENCE MODELS

TRAINING DATA ASSETS DESIGNED TO ANTICIPATE INDUSTRIES' REAL-LIFE USE CASES

ENHANCED MULTILINGUAL TOKENISER BUILT FOR LANGUAGE PARITY

HYBRID SMALL LANGUAGE MODELS FOR HANDLING DOCUMENT PROCESSING AT SCALE

WE CARE ABOUT DATA

(01)

We develop unique multilingual synthetic data capacity

Using novel LLM-driven approaches to rephrasing, refining and redocumentarising original content, we will establish and routinely expand massive, high-quality synthetic datasets for customer use cases.

(02)

We build and open corpus mining pipelines

Numerous untapped training data sources exist beyond the typical web archives and copyrighted material. We develop innovative pipelines for corpus preparation, along with models capable of recognizing various layouts, allowing for the integration of overlooked open data, open science and cultural heritage resources, particularly those in PDF format.

(03)

We integrate and support semantic data

We build an extensive collection of semantic web data for pretraining and alignment, covering a wide range of standards that match customer use cases: XML, XBRL, RDF.

A KNOWLEDGE-GROUNDED, FORMAT-SENSITIVE LLM OPENS THE PATH FOR DEVELOPING ROBUST AND TRUSTWORTHY APPLICATIONS, OPENING THE ERA OF AI-DRIVEN DOCUMENT PROCESSING IN SENSITIVE USE CASES.

FOUNDING TEAM

PIERRE-CARL LANGLAIS

/01

Associate Researcher at Sorbonne Center for Artificial Intelligence and Sciences Po Médialab

/02

Previously at opsci.ai: developed a pioneering LLM assistant for French Public Services (Albert) and for the Ministry of Education (Cassandre)

/03

Co-author of OA Diamond Study

ANASTASIA STASENKO

/01

Associate Researcher at Sorbonne Center for Artificial Intelligence and Sciences Po Médialab

/02

Previously at opsci.ai: developed a pioneering LLM assistant for French Public Services (Albert) and for the Ministry of Education (Cassandre)

/03

Co-author of OA Diamond Study

/04

Formerly at Hachette Livre: digital learning product manager, transforming editorial heritage into digital products

IVAN YAMSCHIKOV

/01

Associate Researcher at Sorbonne Center for Artificial Intelligence and Sciences Po Médialab

/02

Previously at opsci.ai: developed a pioneering LLM assistant for French Public Services (Albert) and for the Ministry of Education (Cassandre)

/03

Co-author of OA Diamond Study

OUR CUSTOMERS

OUR RESEARCH PARTNERS

Member of Scaleway Startup Growth Program

CONTACT@PLEIAS.FR

CONTACT US