Conferences

Agents & GenAI: How we boosted a data engineering project at the scale of 150 microservices
Pragmatic Data Platform: Building a Data Platform Without Skyrocketing Costs
MLOps at scale: Platforming the registry and inference to accelerate deployments
Performance optimization: benefit or sacrifice?
CI/CD in the age of Machine Learning
Draw me a Data Science architecture
The story of an emergent architecture
MLOps: Production deployment, and then?
Interpretability of Data Science Systems
Tutorial on the dataPreparation library

Agents & GenAI: How we boosted a data engineering project at the scale of 150 microservices

For 2.5 years, our team has been delivering a large-scale data engineering project: more than 150 microservices, complex pipelines, and high quality requirements. For the past year, we have integrated GenAI and agents to transform how we work.

In this talk, we share how AI has accelerated certain critical tasks:

Code refactoring: industrializing prototype code 4x faster to meet all our standards.
Data contracts: writing this documentation 10x faster, with a precision rivaling human expertise.
Architecture tests: automating checks on architecture standards and cloud provider requirements to reduce bugs.

But AI is not a magic solution. We will also address:

What it still struggles with: implementing things we have never done before in the project, complex refactorings.
Our best practices: documentation as code & AI-optimized documentation, a mono-repository to centralize context, and MCP servers to give the LLM secure and controlled access.

With many live demonstrations, this candid experience report covers successes, failures, and concrete lessons for integrating AI into your projects without losing control.
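To make the idea of an architecture test concrete, here is a minimal sketch of one possible check. It is not the project's actual tooling: the layering rule, the package names `infrastructure` and `cloud_sdk`, and the helper functions are all hypothetical. It only uses Python's standard `ast` module to detect imports that a given layer is not supposed to use.

```python
import ast

# Hypothetical rule: modules of the "domain" layer must not import these packages.
FORBIDDEN_IN_DOMAIN = {"infrastructure", "cloud_sdk"}


def imported_packages(source: str) -> set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" -> "a"
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> "a"; relative imports (level > 0) are ignored
            packages.add(node.module.split(".")[0])
    return packages


def forbidden_imports(source: str) -> list[str]:
    """Return the sorted list of forbidden packages imported by the source."""
    return sorted(imported_packages(source) & FORBIDDEN_IN_DOMAIN)


# Hypothetical content of a domain module that breaks the rule.
domain_module = """
import json
from infrastructure.storage import upload
"""

print(forbidden_imports(domain_module))  # ['infrastructure']
```

Run in CI over every file of a layer, a check of this kind fails the build as soon as a forbidden dependency appears, whether the code was written by a human or by an agent.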

📍 Talk given at:

Data Days Lille, March 2026, Lille: slides
Cloud Toulouse, May 2026, Toulouse: slides

Tags:

Data Engineering
GenAI
Agents

Pragmatic Data Platform: Building a Data Platform Without Skyrocketing Costs

💰 “Why does my data platform cost 10 times more than another customer’s?”
💸 “Our licensing costs have skyrocketed!”

These customer quotes sum up a problem that has become all too common: every organization is building its own data platform, but at what cost? Technology stack complexity and the pursuit of “state-of-the-art” solutions often end up driving up infrastructure and licensing costs, sometimes without real justification.

For the past three years, with a team of six developers, we have built and maintained a data platform that processes several terabytes of data each month, handles over one million events per day, and supports around a hundred use cases—all for less than €5,000 per month in cloud costs.

🚀 How is this possible? This talk is a look back at our architectural choices, our organization, and our pragmatic approach: a data platform that gets the job done, without frills or unnecessary expenses.

📍 Talk given at:

Cloud Toulouse, May 2025, Toulouse: slides

Tags:

Data Engineering
Architecture
Data Platform
Pragmatism

MLOps at scale: Platforming the registry and inference to accelerate deployments

AI is now at the heart of all organizations. Data platforms facilitate the creation of high-performing models, but deployment often remains ad hoc, requiring registries, APIs and runners to be rebuilt for each project. Model governance at scale, as required by the AI Act, is tedious.

This forward-looking presentation proposes a concept to standardize and automate these steps in a few clicks or command lines: the model platform.

This talk explores what I believe to be the future of MLOps: model platforms integrating model registries, deployment, A/B testing and shadow production seamlessly. Within 2-3 years, all cloud providers will offer this capability.

A live demonstration of a platform based on open source technologies (MLflow, Kubernetes) will show how a model can be put into production in less than 5 minutes.

After this presentation, you will understand the value of a model platform, identify its main features, and discover a proposed implementation.

📍 Talk given at:

Data Days Lille, March 2025, Lille: slides

Tags:

MLOps
Architecture
Prospective

Performance optimization: benefit or sacrifice?

⚡ The faster the code, the better the code.

Rather than adding computing resources and technologies, think about architecture, code and data storage to save hardware resources.

🚀 The first optimizations are best practices that everyone should know; the later ones are sacrificial: they degrade code readability and maintainability. Conceived as a concrete application of Eroom’s law as proposed by Tristan Nitot, this talk starts with an example of poorly written code, the kind we have all written, then walks through successive optimizations to show the benefits gained and the sacrifices required to go ever faster.

📍 Talk given at:

DevQuest, June 2026, Niort: slides
Touraine Tech, February 2025, Tours
Snow Camp, January 2025, Grenoble: slides
Breizh Camp, June 2024, Rennes: slides, video

Tags:

Data
Architecture
Sustainable IT

CI/CD in the age of Machine Learning

CI/CD is a well-known software practice for building and deploying artifacts. In Machine Learning, it has a few particularities:

🔢 In addition to building and deploying code, you need to manage the model artifact.

🗓️ Building the model corresponds to its training; it doesn’t only happen when the code changes, it can also be triggered by a change in the data.

🏋️‍♀️ Code typically weighs a few MB, while the model can weigh up to several GB.

These three particularities mean the build and deployment process must be rethought.
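As an illustration of the second particularity, here is a minimal sketch of a pipeline step that decides whether the model must be rebuilt. It is one possible approach, not the one presented in the talk: `BuildState`, `fingerprint` and `retrain_reasons` are hypothetical names, and the data is fingerprinted with a plain SHA-256 hash from the standard library.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(chunks: list[bytes]) -> str:
    """Return a stable SHA-256 digest for an ordered list of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        # Length-prefix each chunk so that [b"ab", b"c"] differs from [b"a", b"bc"].
        digest.update(len(chunk).to_bytes(8, "big"))
        digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class BuildState:
    """Fingerprints recorded when the model was last trained (hypothetical)."""
    code_hash: str
    data_hash: str


def retrain_reasons(last: BuildState, code_hash: str, data_hash: str) -> list[str]:
    """List why a new training run is needed; an empty list means skip it."""
    reasons = []
    if code_hash != last.code_hash:
        reasons.append("code changed")
    if data_hash != last.data_hash:
        reasons.append("data changed")
    return reasons


last_build = BuildState(
    code_hash=fingerprint([b"def train(): ..."]),
    data_hash=fingerprint([b"row-1", b"row-2"]),
)

# Same code, one new data row: the model still has to be rebuilt.
new_code_hash = fingerprint([b"def train(): ..."])
new_data_hash = fingerprint([b"row-1", b"row-2", b"row-3"])
print(retrain_reasons(last_build, new_code_hash, new_data_hash))  # ['data changed']
```

A classic CI/CD pipeline only reacts to the first condition; the second is what makes an ML pipeline need a scheduler or a data-arrival trigger in addition to commit hooks.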

📍 Talk given at:

PyCon Lithuania, April 2024, Vilnius: slides (in English), video
Meetup Crafting Data Science #11, November 2023, Paris, with Sofia Calcagno

Tags:

MLOps
Architecture

Draw me a Data Science architecture

An iterative talk during which Sofia and Emmanuel-Lin draw a Data Science architecture following the evolution of business needs.

📍 Talk given at:

La Duck Conf, March 2022, Paris, with Sofia Calcagno: video
Meetup Crafting Data Science #9, November 2022, Paris, with Sofia Calcagno: video

Tags:

Architecture
Data Science
MLOps

The story of an emergent architecture

A Data Science model in production on day 1, an emergent architecture, satisfied clients, a serene team.

This talk tells the story of an emergent architecture project that generated hundreds of thousands of euros from the very first day of development. It is the story of a truly minimalist MVP.

📍 Talk given at:

La Duck Conf, February 2021, remote: slides, video
Comptoir Octo, the same story from a business perspective, with Marc Frignet: video

Tags:

Architecture
Data Science

MLOps: Production deployment, and then?

Once in production, you need to monitor your model system. Beyond tracking data drift in every direction, how do you choose the right metrics to follow in a system that contains many uncertainties?

📍 Talk given at:

La Duck Conf, January 2020, Paris, with Mehdi Houacine: video
Meetup Crafting Data Science, February 2022, Paris, with Mehdi Houacine: slides

Tags:

MLOps
Data Science
Monitoring

Interpretability of Data Science Systems

The need for interpretability in Data Science systems is clearly identified but not always clearly defined.

This talk aims to reframe the why, for whom, for what, and the how of interpretability of these systems.

📍 Talk given at:

Espace éthique d’Île de France, February 2020, Paris: video
Octo’s Ethical by Design morning event, November 2019, Paris: video, press coverage by Christophe Auffray

Tags:

Data Science
Interpretability
Sustainable IT

Tutorial on the dataPreparation library

Presentation of the open source R library that I have developed and maintained for many years to perform efficient tabular data preparation.

📍 Talk given at:

Data Science Conference Europe, November 2021, remote
Meetup R addicts Paris, August 2018, Paris: slides

Tags:

Data Engineering
R
