
Agents & GenAI: How we boosted a data engineering project at the scale of 150 microservices

For 2.5 years, our team has been delivering a large-scale data engineering project: more than 150 microservices, complex pipelines, and high quality requirements. For the past year, we have integrated GenAI and agents to transform how we work.

In this talk, we share how AI has accelerated certain critical tasks:

  • Code refactoring: industrializing prototype code 4x faster to meet all our standards.
  • Data contracts: writing this documentation 10x faster, with a precision rivaling human expertise.
  • Architecture tests: automating checks on architecture standards and cloud provider requirements to reduce bugs.

But AI is not a magic solution. We will also address:

  • What still resists it: implementing things we have never done before in the project, and complex refactorings.
  • Our best practices: documentation as code and AI-optimized documentation, a mono-repository to centralize context, and MCP servers to give the LLM secure and controlled access.

Built around many live demonstrations, this candid experience report covers successes, failures, and concrete lessons for integrating AI into your projects without losing control.

📍 Talk given at:

  • Data Days Lille, March 2026, Lille: slides

Tags:

  • Data Engineering
  • GenAI
  • Agents

MLOps at scale: Platforming the registry and inference to accelerate deployments

AI is now at the heart of all organizations. Data platforms make it easier to build high-performing models, but deployment often remains ad hoc: each project has to rebuild its own registries, APIs and runners. Governing models at scale, as required by the AI Act, is tedious.

This forward-looking presentation proposes a concept to standardize and automate these steps in a few clicks or command lines: the model platform.

This talk explores what I believe to be the future of MLOps: model platforms that seamlessly integrate model registries, deployment, A/B testing and shadow deployment. I expect all cloud providers to offer this capability within 2-3 years.

A live demonstration of a platform based on open source technologies (MLflow, Kubernetes) will show how a model can be put into production in less than 5 minutes.

After this presentation, you will understand the value of a model platform, identify its main features, and discover a proposed implementation.

📍 Talk given at:

  • Data Days Lille, March 2025, Lille: slides

Tags:

  • MLOps
  • Architecture
  • Prospective

Performance optimization: benefit or sacrifice?

⚡ The faster the code, the better the code.

Rather than adding computing resources and technologies, think about architecture, code and data storage to save hardware resources.

🚀 The first optimizations are best practices that everyone should know; the later ones are sacrificial: they degrade code readability and maintainability. Conceived as a concrete application of Eroom’s law as proposed by Tristan Nitot, this talk starts from an example of poorly written code, of the kind we have all written. Through successive optimizations, we then see the benefits gained and the sacrifices required to go ever faster.

📍 Talk given at:

  • Touraine Tech, February 2025, Tours
  • Snow Camp, January 2025, Grenoble: slides
  • Breizh Camp, June 2024, Rennes: slides, video

Tags:

  • Data
  • Architecture
  • Sustainable IT

CI/CD in the age of Machine Learning

CI/CD is a well-established software practice for building and deploying artifacts. Machine Learning adds a few specifics:

🔢 In addition to building and deploying code, you need to manage the model artifact.

🗓️ Building the model means training it; this happens not only when the code changes, but also when the data changes.

🏋️‍♀️ Code typically weighs a few MB, while the model can weigh up to several GB.

These three specifics mean that the build and deployment process must be rethought.
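As an illustration of the second point, here is a minimal sketch of a pipeline trigger that decides whether to rebuild the code, retrain the model, or do nothing. It is not taken from the talk: the names `fingerprint`, `BuildState` and `plan_pipeline` are hypothetical, and it assumes that code and data can each be summarized by a content hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


def fingerprint(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string (hypothetical helper)."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class BuildState:
    """Hashes of the code and data used by a pipeline run (assumed representation)."""
    code_hash: str
    data_hash: str


def plan_pipeline(previous: Optional[BuildState], current: BuildState) -> Dict[str, bool]:
    """Decide which stages to run, given the last built state and the current one."""
    # With no previous state, everything is treated as changed.
    code_changed = previous is None or previous.code_hash != current.code_hash
    data_changed = previous is None or previous.data_hash != current.data_hash
    return {
        # The code artifact is rebuilt only when the code changes.
        "build_code": code_changed,
        # The model is retrained when either the code or the data changes.
        "train_model": code_changed or data_changed,
    }


last = BuildState(fingerprint(b"train.py v1"), fingerprint(b"rows: 1000"))
now = BuildState(fingerprint(b"train.py v1"), fingerprint(b"rows: 1200"))
print(plan_pipeline(last, now))  # {'build_code': False, 'train_model': True}
```

In this example only the data changed, so the model is retrained while the code artifact is left alone. The sketch does not address the third point, the size of the model artifact.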

📍 Talk given at:

Tags:

  • MLOps
  • Architecture

Draw me a Data Science architecture

An iterative talk during which Sofia and Emmanuel-Lin draw a Data Science architecture following the evolution of business needs.

📍 Talk given at:

Tags:

  • Architecture
  • Data Science
  • MLOps

The story of an emergent architecture

A Data Science model in production on day 1, an emergent architecture, satisfied clients, a serene team.

This is the story told in this talk: that of an emergent architecture project that generated hundreds of thousands of euros from the very first day of development. It is the story of a truly minimalist MVP.

📍 Talk given at:

Tags:

  • Architecture
  • Data Science

MLOps: Production deployment, and then?

Once your model system is in production, you need to monitor it. Beyond tracking data drift in every direction, how do you choose the right metrics for a system that contains many uncertainties?

📍 Talk given at:

Tags:

  • MLOps
  • Data Science
  • Monitoring

Interpretability of Data Science Systems

The need for interpretability in Data Science systems is clearly identified but not always clearly defined.

This talk aims to reframe interpretability for these systems: why it is needed, for whom, for what purpose, and how to achieve it.

📍 Talk given at:

Tags:

  • Data Science
  • Interpretability
  • Sustainable IT

Tutorial on the dataPreparation library

A presentation of the open source R library that I have developed and maintained for many years for efficient tabular data preparation.

📍 Talk given at:

Tags:

  • Data Engineering
  • R