Manuel Herranz will take part in VSCF 2026 with a presentation on sovereign AI, ontologies, small models and the data supply chain behind the generative economy.
Sovereign AI begins long before the data center. It begins with the information an organization controls, the ontologies through which it structures institutional knowledge and the specialized models it can evaluate, adapt and operate under its own technical, legal and economic conditions.
Manuel Herranz, founder and CEO of Pangeanic, will take part in the ValgrAI Scientific Council Forum 2026 on Thursday, July 2, at the main auditorium of the Universitat Politècnica de València.
His presentation, entitled “Sovereign AI: Ontologies, Small Models and Tokens Are the New Coal”, will form part of an international program examining the evolution of artificial intelligence and its application across scientific, industrial and social environments.
The program includes contributions from Virginia Dignum, Hiroaki Kitano, Tom Dietterich, José María Azorín and Ramón López de Mántaras, among other international specialists. Pangeanic is also supporting the forum, contributing a perspective built through language technology, data for AI, model alignment and the deployment of AI systems in enterprise and institutional environments.
A trajectory connecting models, data and knowledge
The 2026 presentation continues a line of work that Pangeanic began in language technology and has progressively extended toward specialized, governable AI systems adapted to the knowledge of individual organizations.
Small models for specific applications
Manuel Herranz examined the rise of smaller specialized models as a suitable architecture for clearly defined enterprise tasks, with lower computational requirements and greater operational control.
Research and technology transfer
Collaboration with ValgrAI and Universitat Jaume I transferred this vision into applied research and helped connect academic knowledge with enterprise use.
Data, ontologies and operational sovereignty
The new presentation expands the discussion toward data control, structured knowledge and the ability to operate AI systems under an organization’s own criteria.
Small models for specific enterprise tasks
During the 2025 edition of the forum, Manuel Herranz discussed the growing role of small models as an architecture particularly well suited to specific enterprise applications.
Small language models and other specialized models can offer significant advantages when the problem is clearly defined: lower computational consumption, faster response times, easier adaptation and more controllable deployment within private infrastructure.
Their usefulness increases when they are trained, fine-tuned or connected to representative domain data and evaluated against precise operational criteria. A model designed to interpret legal documentation, classify files, translate technical content or assist an industrial process needs its own terminology, documents, instructions, examples, constraints and quality standards.
Pangeanic develops this layer through AI Data Operations, covering data preparation, annotation, human review, preference data, evaluation, alignment, governance and the continuous improvement of AI systems.
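As an illustration of what "precise operational criteria" can look like, the sketch below scores model outputs on whether they use the terminology required for each example. It is a minimal example under stated assumptions: the function name `terminology_adherence`, the sample outputs and the required terms are all hypothetical, and a real evaluation would combine many such checks with human review.

```python
def terminology_adherence(outputs, required_terms):
    """Return the fraction of outputs that contain every required term.

    Hypothetical metric for illustration: `outputs` is a list of model
    responses and `required_terms` is a parallel list of term lists.
    Matching is a case-insensitive substring check.
    """
    if len(outputs) != len(required_terms):
        raise ValueError("outputs and required_terms must have the same length")
    if not outputs:
        return 0.0
    passed = 0
    for output, terms in zip(outputs, required_terms):
        lowered = output.lower()
        if all(term.lower() in lowered for term in terms):
            passed += 1
    return passed / len(outputs)


# Illustrative data only: two responses to the same technical instruction.
outputs = [
    "The torque wrench must be calibrated before each shift.",
    "Use the spanner and check it regularly.",
]
required = [["torque wrench"], ["torque wrench"]]
score = terminology_adherence(outputs, required)
```

A substring check is deliberately crude. It is enough to show how a domain glossary can become a measurable acceptance criterion, which is the point the paragraph above makes.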
Ontologies as the architecture of institutional knowledge
Organizations accumulate large volumes of documents, databases, manuals, case files, terminology and tacit knowledge. Without a structure connecting these elements, information remains fragmented and difficult for an intelligent system to use coherently.
Ontologies provide that conceptual map. They define the entities relevant to an organization, the relationships between them, their categories, properties, hierarchies, permissions and interpretation rules.
Concepts and terminology
Precise representation of legal, technical, administrative and sector-specific terminology.
Relationships and hierarchies
Connections between entities, documents, procedures, responsibilities and levels of authority.
Permissions and context
Rules governing which information can be used, who can access it and under what circumstances.
When this structure is combined with knowledge retrieval, specialized models and properly prepared data, AI can operate on a much more faithful representation of institutional reality. Sovereignty then acquires an intellectual dimension because the organization retains control over how its own knowledge is represented, classified and used.
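The three elements described above (concepts, relationships and permissions) can be sketched as a small data structure. This is an illustrative sketch, not a description of any Pangeanic product: the class names `Entity` and `Ontology`, the role names and the sample entities are assumptions made for the example, and production ontologies are usually expressed in dedicated standards and tooling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A concept relevant to the organization (hypothetical schema)."""
    name: str
    category: str
    # Roles allowed to use this entity; an empty set means unrestricted.
    allowed_roles: frozenset = frozenset()


@dataclass
class Ontology:
    entities: dict = field(default_factory=dict)
    # Relationships stored as (subject, predicate, object) triples.
    relations: set = field(default_factory=set)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Only allow relations between entities that are already defined.
        if subject not in self.entities or obj not in self.entities:
            raise KeyError("both entities must be defined before relating them")
        self.relations.add((subject, predicate, obj))

    def related(self, subject: str, predicate: str) -> list:
        """Return the objects linked to `subject` by `predicate`, sorted."""
        return sorted(
            o for s, p, o in self.relations if s == subject and p == predicate
        )

    def can_access(self, role: str, entity_name: str) -> bool:
        """Check whether a role may use information about an entity."""
        entity = self.entities[entity_name]
        return not entity.allowed_roles or role in entity.allowed_roles


# Illustrative content only.
ontology = Ontology()
ontology.add_entity(Entity("Contract", "document", frozenset({"legal"})))
ontology.add_entity(Entity("Supplier", "organization"))
ontology.add_entity(Entity("Procurement procedure", "procedure"))
ontology.relate("Contract", "signed_with", "Supplier")
ontology.relate("Contract", "governed_by", "Procurement procedure")
```

A retrieval layer could call `can_access` before passing a document to a model, so the permission rules live in the organization's own knowledge structure and not inside any particular model.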
“Tokens Are the New Coal”: The Fuel of the Generative Economy
The expression “tokens are the new coal” presents tokens as the consumable unit of the generative economy. The comparison is deliberately provocative: both fuel productive infrastructure, depend on a supply chain and concentrate value around those who control extraction, transformation and distribution.
Behind every useful token lies prior material made up of documents, conversations, images, recordings, annotations, human decisions and expert knowledge.
Data must be collected or licensed, cleaned, normalized, classified, annotated, anonymized, evaluated and connected to a specific task. Volume still matters, but provenance, representativeness and quality largely determine how the model behaves afterwards.
This supply chain places datasets for AI and data operations at the center of technological autonomy.
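A few of the steps in this supply chain (license filtering, normalization, anonymization and deduplication) can be sketched in a short pipeline. The sketch is a simplified illustration under stated assumptions: the `Record` fields, the license labels and the email-only anonymization rule are hypothetical, and real pipelines cover far more identifier types and quality checks.

```python
import re
import unicodedata
from dataclasses import dataclass

# Simplified pattern for email addresses; real anonymization needs much more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass(frozen=True)
class Record:
    text: str
    source: str   # provenance: where the text came from
    license: str  # rights of use attached to the text


def normalize(text: str) -> str:
    # Apply Unicode NFKC normalization, then collapse runs of whitespace.
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def anonymize(text: str) -> str:
    # Replace email addresses with a placeholder token.
    return EMAIL_RE.sub("[EMAIL]", text)


def prepare(records, allowed_licenses):
    """Keep licensed records, clean them and drop exact duplicates."""
    seen = set()
    prepared = []
    for record in records:
        if record.license not in allowed_licenses:
            continue  # no clear right of use, so the record is excluded
        text = anonymize(normalize(record.text))
        if not text or text in seen:
            continue  # empty or duplicate after cleaning
        seen.add(text)
        prepared.append(Record(text, record.source, record.license))
    return prepared


# Illustrative data only.
raw = [
    Record("Contact  ana@example.com for the manual.", "intranet", "internal"),
    Record("Contact ana@example.com for the manual.", "wiki", "internal"),
    Record("Scraped text of unknown origin", "web", "unknown"),
]
clean = prepare(raw, allowed_licenses={"internal"})
```

Each surviving record keeps its `source` and `license` fields, so provenance and rights of use travel with the text through every later stage.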
Preparation and transformation
Collection, licensing, cleaning, normalization, structuring and preparation of data for models and AI systems.
Training and evaluation assets
Multilingual data across text, speech, audio, image, video, OCR, evaluation and domain-specific knowledge.
Evaluation, alignment and improvement
Annotation, human oversight, preference data, evaluation, alignment and improvement cycles for production AI systems.
From multilingual data to the alignment of European AI models
Pangeanic began its journey by collecting, aligning and processing data for machine translation systems. That experience produced large linguistic repositories and an industrial capacity to work with multilingual data at scale.
Over time, this work expanded into datasets covering text, speech, audio, image, video, documents, instructions, evaluations and human preference signals. The technical continuity remains clear: turning dispersed information into reliable material for training, adapting, evaluating and governing intelligent systems.
Collaboration with Barcelona Supercomputing Center
Pangeanic has collaborated with Barcelona Supercomputing Center on AI data, annotation, evaluation, RLHF and the alignment of European language models including Salamandra and ALIA.
The collaboration illustrates the shift from corpus collection toward a more demanding task: defining how a model should behave across different languages, domains and situations.
Multilingual model alignment
Alignment requires instructions, examples, human preferences, evaluations, error taxonomies and expert review.
It also requires proper representation of languages and cultural contexts that remain underrepresented in large general-purpose datasets.
From SAVIA to enterprise deployment
The SAVIA project connects research, linguistic knowledge and technology transfer, showing how collaboration between universities and companies can move scientific advances toward practical tools and processes.
VSCF 2025: the future of small models
Manuel Herranz’s previous presentation anticipated the growing importance of smaller specialized models for enterprise applications.
SAVIA: research and technology transfer
Collaboration with ValgrAI and Universitat Jaume I transferred the discussion of models and knowledge into applied research.
Sovereignty as operational capacity
The physical location of servers is one part of sovereignty, but the concept extends across a much broader operational chain.
An organization gains greater autonomy when it can decide which data it uses, which knowledge it incorporates, which models it selects, how results are evaluated and under what conditions each component is deployed.
Capabilities supporting sovereign AI
- Control data provenance and rights of use.
- Decide which knowledge enters the system.
- Select the right model for each task.
- Apply internal evaluation and alignment criteria.
- Protect personal, sensitive and confidential information.
- Deploy components within private infrastructure.
- Supervise behavior through human oversight.
- Replace models or providers without losing institutional knowledge.
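Two of the capabilities listed above, selecting a model per task and replacing a model without losing institutional knowledge, can be sketched with a small registry. This is a minimal illustration, not a description of any specific platform: `TextModel`, `EchoModel`, `ModelRegistry` and the model names are hypothetical, and `EchoModel` is only a stand-in that returns its prompt.

```python
from typing import Protocol


class TextModel(Protocol):
    """Interface any model or provider must satisfy (assumed for this sketch)."""

    def generate(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for a small specialized model; it only echoes its prompt."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRegistry:
    """Maps each task to the model currently chosen for it."""

    def __init__(self) -> None:
        self._models: dict[str, TextModel] = {}

    def register(self, task: str, model: TextModel) -> None:
        # Registering a task again replaces the previous model for that task.
        self._models[task] = model

    def run(self, task: str, prompt: str, context: list[str]) -> str:
        model = self._models[task]
        # Institutional knowledge is supplied as context on every call,
        # so it is stored outside the model and survives a model swap.
        full_prompt = "\n".join([*context, prompt])
        return model.generate(full_prompt)


# Illustrative usage: the knowledge stays the same while the model changes.
knowledge = ["Term: 'expediente' means case file."]
registry = ModelRegistry()
registry.register("classify", EchoModel("small-model-a"))
first = registry.run("classify", "Classify this file.", knowledge)
registry.register("classify", EchoModel("small-model-b"))
second = registry.run("classify", "Classify this file.", knowledge)
```

The design choice is that terminology, documents and rules are held by the organization and passed to whichever model is registered, so changing provider is a one-line change.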
Pangeanic brings these capabilities together within its sovereign AI systems offering, combining domain-specific data, specialized models, language technology, evaluation and controlled deployment options.
The ECO Intelligence Platform adds an orchestration layer for translation, knowledge retrieval, anonymization, quality evaluation, APIs and human review within governed enterprise workflows.
A European conversation about control and specialization
Manuel Herranz will bring a clear proposition to the forum: the next stage of enterprise AI will depend less on accumulating generative capacity and more on organizing knowledge, preparing data and deploying models suited to specific tasks.
Small models provide efficiency and specialization. Ontologies provide structure. Data determines what the system can learn. Evaluation and human oversight make its behavior governable.
The combination of these layers provides a more mature basis for technological sovereignty, particularly for European enterprises, public administrations and regulated sectors.
Build an AI strategy around your own data, models and institutional knowledge
Pangeanic helps enterprises, institutions and public administrations prepare data, adapt specialized models, structure knowledge, evaluate systems and deploy multilingual AI workflows under conditions of greater control.

