Conferences
Table of contents
- Agents & GenAI: How we boosted a data engineering project at the scale of 150 microservices
- MLOps at scale: Platforming the registry and inference to accelerate deployments
- Performance optimization: benefit or sacrifice?
- CI/CD in the age of Machine Learning
- Draw me a Data Science architecture
- The story of an emergent architecture
- MLOps: Production deployment, and then?
- Interpretability of Data Science Systems
- Tutorial on the dataPreparation library
Agents & GenAI: How we boosted a data engineering project at the scale of 150 microservices
For 2.5 years, our team has been delivering a large-scale data engineering project: more than 150 microservices, complex pipelines, and high quality requirements. For the past year, we have integrated GenAI and agents to transform how we work.
In this talk, we share how AI has accelerated certain critical tasks:
- Code refactoring: industrializing prototype code 4x faster to meet all our standards.
- Data contracts: writing this documentation 10x faster, with a precision rivaling human expertise.
- Architecture tests: automating checks on architecture standards and cloud provider requirements to reduce bugs.
But AI is not a magic solution. We will also address:
- What still resists it: implementing things we have never done before in the project, complex refactorings.
- Our best practices: documentation as code & AI-optimized documentation, mono-repository to centralize context, and MCP servers to give the LLM secure and controlled access.
With many live demonstrations, this candid experience report covers successes, failures, and concrete lessons for integrating AI into your projects without losing control.
📍 Talk given at:
- Data Days Lille, March 2026, Lille: slides
Tags:
- Data Engineering
- GenAI
- Agents
MLOps at scale: Platforming the registry and inference to accelerate deployments
AI is now at the heart of all organizations. Data platforms facilitate the creation of high-performing models, but deployment often remains artisanal, requiring registries, APIs and runners to be rebuilt for each project. Governing models at scale, as the AI Act requires, is tedious.
This forward-looking presentation proposes a concept to standardize and automate these steps in a few clicks or command lines: the model platform.
This talk explores what I believe to be the future of MLOps: model platforms integrating model registries, deployment, A/B testing and shadow production seamlessly. Within 2-3 years, all cloud providers will offer this capability.
A live demonstration of a platform based on open source technologies (MLflow, Kubernetes) will show how a model can be put into production in less than 5 minutes.
After this presentation, you will understand the value of a model platform, identify its main features, and discover a proposed implementation.
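To make the idea of shadow production concrete, here is a minimal sketch, not the platform shown in the talk. It assumes a model is just a Python callable; `ShadowRouter` and its attributes are hypothetical names chosen for illustration. The candidate model sees the same request as the production model, but only the production answer is ever returned.

```python
from typing import Callable, List, Optional, Tuple

# Assumption for this sketch: a "model" is any callable from features to a score.
Model = Callable[[dict], float]


class ShadowRouter:
    """Serve the primary model while silently evaluating a shadow model."""

    def __init__(self, primary: Model, shadow: Model) -> None:
        self.primary = primary
        self.shadow = shadow
        # (served prediction, shadow prediction or None if the shadow failed)
        self.comparisons: List[Tuple[float, Optional[float]]] = []

    def predict(self, features: dict) -> float:
        served = self.primary(features)
        try:
            candidate: Optional[float] = self.shadow(features)
        except Exception:
            # A failing shadow model must never affect the served response.
            candidate = None
        self.comparisons.append((served, candidate))
        return served
```

The recorded pairs can then be compared offline before deciding whether to promote the candidate. A real platform would typically call the shadow model asynchronously so that it adds no latency to the served response.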
📍 Talk given at:
- Data Days Lille, March 2025, Lille: slides
Tags:
- MLOps
- Architecture
- Foresight
Performance optimization: benefit or sacrifice?
⚡ The faster the code, the better the code.
Rather than adding computing resources and technologies, think about architecture, code and data storage to save hardware resources.
🚀 The first optimizations are best practices that everyone should know; the later ones are sacrificial: they degrade code readability and maintainability. Conceived as a concrete application of Eroom’s law as proposed by Tristan Nitot, this talk starts with an example of poorly written code of the kind we have all written, then works through successive optimizations to show the benefits gained and the sacrifices required to go ever faster.
📍 Talk given at:
- Touraine Tech, February 2025, Tours
- Snow Camp, January 2025, Grenoble: slides
- Breizh Camp, June 2024, Rennes: slides, video
Tags:
- Data
- Architecture
- Sustainable IT
CI/CD in the age of Machine Learning
CI/CD is a well-known software practice for building and deploying artifacts. Machine Learning adds three particularities:
🔢 In addition to building and deploying code, you need to manage the model artifact.
🗓️ Building the model corresponds to its training; it doesn’t only happen when the code changes, it can also be triggered by a change in the data.
🏋️‍♀️ Code typically weighs a few MB, while the model can weigh up to several GB.
These three particularities mean the build and deployment process must be rethought.
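As an illustration of the second point, the sketch below decides which pipeline stages to run by comparing fingerprints of the code and of the training data with those of the last build. It is a simplified sketch under stated assumptions, not the pipeline presented in the talk: the stage names, `BuildState` and `plan_pipeline` are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def fingerprint(chunks: list[bytes]) -> str:
    """Return a SHA-256 digest over an ordered list of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        # Hash each chunk separately so that chunk boundaries matter.
        digest.update(hashlib.sha256(chunk).digest())
    return digest.hexdigest()


@dataclass(frozen=True)
class BuildState:
    code_hash: str
    data_hash: str


def plan_pipeline(previous: BuildState | None, current: BuildState) -> list[str]:
    """List the stages to run, given the state of the last successful build."""
    code_changed = previous is None or previous.code_hash != current.code_hash
    data_changed = previous is None or previous.data_hash != current.data_hash

    stages: list[str] = []
    if code_changed:
        stages.append("test_code")
    if code_changed or data_changed:
        # Training is the model's "build": new data alone is enough to trigger it.
        stages.append("train_model")
    if code_changed:
        stages.append("deploy_code")
    if code_changed or data_changed:
        stages.append("deploy_model")
    return stages
```

With this split, a data-only change retrains and redeploys the model without rebuilding or redeploying the code, which matters when the model artifact is several GB.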
📍 Talk given at:
- PyCon Lithuania, April 2024, Vilnius: slides (in English), video
- Meetup Crafting Data Science #11, November 2023, Paris, with Sofia Calcagno
Tags:
- MLOps
- Architecture
Draw me a Data Science architecture
An iterative talk during which Sofia and Emmanuel-Lin draw a Data Science architecture following the evolution of business needs.
📍 Talk given at:
- La Duck Conf, March 2022, Paris, with Sofia Calcagno: video
- Meetup Crafting Data Science #9, November 2022, Paris, with Sofia Calcagno: video
Tags:
- Architecture
- Data Science
- MLOps
The story of an emergent architecture
A Data Science model in production on day 1, an emergent architecture, satisfied clients, a serene team.
This is the story told in this talk — that of an emergent architecture project that generated hundreds of thousands of euros from the very first day of development. It is the story of a truly minimalist MVP.
📍 Talk given at:
- La Duck Conf, February 2021, remote: slides, video
- Comptoir Octo, the same story from a business perspective, with Marc Frignet: video
Tags:
- Architecture
- Data Science
MLOps: Production deployment, and then?
Once in production, you need to monitor your model system. Beyond watching for data drift of every kind, how do you choose the right metrics to track in a system that contains many uncertainties?
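For readers unfamiliar with data drift, the sketch below computes the Population Stability Index (PSI), one common drift score, between a reference sample and a live sample of a numeric feature. PSI is an illustrative choice here, not a metric the talk recommends; the function name, the equal-width binning and the `eps` floor are assumptions of this sketch, and both samples are assumed to be non-empty.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare two samples of one feature using equal-width bins."""
    low, high = min(reference), max(reference)
    # Fall back to a width of 1.0 when the reference sample is constant.
    width = (high - low) / bins or 1.0

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            # Values outside the reference range go to the first or last bin.
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor each share at eps so the logarithm below stays defined.
        return [max(count / len(values), eps) for count in counts]

    ref_shares, live_shares = shares(reference), shares(live)
    return sum(
        (cur - ref) * math.log(cur / ref)
        for ref, cur in zip(ref_shares, live_shares)
    )
```

Identical samples score zero and the score grows as the live distribution moves away from the reference. A score like this only describes the inputs; it says nothing about whether the model still serves its business goal, which is the question the talk raises.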
📍 Talk given at:
- La Duck Conf, January 2020, Paris, with Mehdi Houacine: video
- Meetup Crafting Data Science, February 2022, Paris, with Mehdi Houacine: slides
Tags:
- MLOps
- Data Science
- Monitoring
Interpretability of Data Science Systems
The need for interpretability in Data Science systems is clearly identified but not always clearly defined.
This talk aims to reframe the why, the for whom, the what for, and the how of interpretability in these systems.
📍 Talk given at:
- Espace éthique d’Île de France, February 2020, Paris: video
- Octo’s Ethical by Design morning event, November 2019, Paris: video, press coverage by Christophe Auffray
Tags:
- Data Science
- Interpretability
- Sustainable IT
Tutorial on the dataPreparation library
Presentation of the open source R library that I have developed and maintained for many years to perform efficient tabular data preparation.
📍 Talk given at:
- Data Science Conference Europe, November 2021, remote
- Meetup R addicts Paris, August 2018, Paris: slides
Tags:
- Data Engineering
- R