2024 Talks
Changing Large Tables Keynote
Everything changes and nothing stays the same. Yet when it comes to managing a dataset, change is often an afterthought. The world evolves rapidly, and a dataset must keep pace to remain useful: rows must be inserted, deleted, or updated. In data management, handling change is therefore not optional; doing it well, however, is difficult. It is all too common to see sparse collections of CSV and Parquet files that are somehow derived from one another. We can do better.
Recent advances, such as Lakehouse-type formats and various schema management initiatives, aim to improve this state of affairs, but the exact direction of this evolution remains uncertain. In my presentation, I will discuss the advantages and challenges of integrating traditional transactional semantics into large-scale data analysis workflows. We will see data and schema changes in action, and even real time travel.
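The keynote's themes are easy to demo in miniature. Below is a minimal sketch of lakehouse-style versioning and time travel, using Delta Lake (one example of the lakehouse-type formats mentioned above) through the Python deltalake package; the table path and data are illustrative, not from the talk.

    import pandas as pd
    from deltalake import DeltaTable, write_deltalake

    path = "/tmp/demo_table"  # hypothetical table location

    # Version 0: the initial rows.
    write_deltalake(path, pd.DataFrame({"id": [1, 2], "price": [9.99, 19.99]}))

    # Version 1: a change is recorded as a new table version,
    # not an in-place mutation of files.
    write_deltalake(path, pd.DataFrame({"id": [3], "price": [4.99]}), mode="append")

    # Time travel: read the table as it existed at version 0.
    v0 = DeltaTable(path, version=0).to_pandas()
    latest = DeltaTable(path).to_pandas()
    print(len(v0), len(latest))  # 2 rows at version 0, 3 at the latest version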
Mixed Model Arts - The Convergence of Data Modeling Across Apps, Analytics, and AI
For decades, data modeling has been fragmented by use case: applications, analytics, and ML/AI. With the emergence of AI, continuous data flows, and the need to think about data modeling ahead of time, these siloed, compartmentalized approaches are no longer sufficient in today's diverse world of data use cases.
Today's practitioners must master the many data modeling techniques end to end, across the entire data lifecycle.
This presentation focuses on 'Mixed Model Arts', which advocates for converging different data modeling methods and innovating with new approaches.
(Gen)AI at the Heart of Mirakl's Product: From Inception to Deployment of the Mirakl Catalog Transformer
Mirakl was founded in 2012 with the idea of enabling any retailer to offer the best deals to their customers by developing a marketplace business. Since 2020, the Data Science team has introduced AI features that improve the user experience by enriching product functionality (automatic recategorization, price anomaly detection, etc.) and, in 2023, after the advent of ChatGPT, product description enrichment.
With the arrival of GenAI, we saw the opportunity to rethink our products by rebuilding them with AI at the core. This led us to shift from a Data Science project mode to an AI product approach with dedicated AI teams.
In this presentation, we will break down the steps that allowed us to go from three features in the product to a 'One Click Transformer' in just six months:
1) The product discovery/research phase,
2) How to achieve a beta version with a generative AI feature in the hands of our customers,
3) Iteration and continuous improvement based on user problems,
4) The scaling phase, with layering and tuning to improve quality while controlling costs at scale, as well as extending the AI product approach to the entire team.
Pause
Recharge your batteries, grab a coffee, exchange some ideas and, if curiosity strikes, let yourself be tempted by a visit to the conference booths.
How does the Gard department combine Modern Data Stack and Geomatics?
The Gard Department is a local authority with nearly 3,000 public employees. Every day, they serve the public interest in a wide range of fields: health and social action, departmental roads, middle schools, very high-speed broadband, ...
Through this work, Gard consumes and produces large quantities of data every day, which the Department, through its Innovation and Information Systems Directorate, wants to put to good use. With this in mind, a study was conducted among territorial peers and more broadly across the data ecosystem to identify the state of the art and define a roadmap.
This study showed that the Modern Data Stack is spreading widely, in large groups and smaller companies alike. Dedicated sites, podcasts, communities, and conferences sing its praises. Yet while the approach seems to enjoy unanimous support, few implementations are found in local authorities: is the Modern Data Stack perhaps ill-suited to the challenges of territorial civil service?
A local authority is first and foremost a territory. As such, the data we handle is largely geolocated:
Are there accident-prone areas that would require changes to the road layout? Where are the populations that need help the most, and how should we distribute our staff across the territory? What is the most relevant location for the future middle school, given where students live? Therefore, the stack must be geographical! With this constraint established, which tools should be selected to build our Modern Data Stack?
This talk will present the stack implemented at the Gard Department and the reasoning behind these choices. We will see in particular that while some reference building blocks, including dbt, perform well with geographical data, others struggle to keep up, giving way to tools less known in data science but widely used by geomaticians. We will also walk through use cases illustrating what geography brings to data analysis and decision support. Finally, we will conclude with the evolution perspectives for our stack and our organization.
How to build an impactful Data & AI vision and strategy?
Reservation-only workshop to share advice and concrete examples of Data & AI vision and strategy from different company sizes and contexts. The goal is to provide examples of documented strategies and visions, but also (and above all) the process for creating a compelling and inspiring Data vision and strategy.
Pause
Recharge your batteries, grab a coffee, exchange some ideas and, if curiosity strikes, let yourself be tempted by a visit to the conference booths.
Building a Robust Data Platform on the Road to Self-Service Analytics
At Malt, our central Data team is at the heart of stakeholder data requests, often becoming a bottleneck. Enabling self-service is crucial to eliminate this bottleneck and to scale by providing the right tools to the right people for the right use cases.
To achieve this, we have built solid and clean foundations in our data warehouse, organized in layers with a clear exposure layer. This requires collaboration between data engineers and analytics engineers.
We have approached self-service by taking into account different profiles and needs:
- A self-service layer for business users in Looker via a generative AI application.
- Tools for the Data team to unlock ad hoc analyses and sharing.
- A self-service layer for data users via a dedicated AI assistant to help navigate the Data warehouse.
The goal is to share our journey and our challenges. We will detail how we approached the challenges from an organizational perspective and the tools we implemented. This should provide keys and ideas for anyone heading towards self-service.
How to create a GDPR-compliant Iceberg Lakehouse
Before open table formats became popular, European tech companies had two options for implementing GDPR compliance on their data: move all data into the warehouse so that some of it could be deleted when necessary, or implement a data retention policy in the data lake using lifecycle rules to delete all data after a certain period.
In this presentation, I will introduce the Lakehouse architecture and explain how to design and implement an Iceberg Lakehouse for your data to reduce costs and increase throughput while maintaining GDPR compliance.
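As a taste of the approach, here is a minimal PySpark sketch of the two operations that make GDPR erasure practical on Iceberg: a row-level DELETE, followed by snapshot expiration so erased rows do not live on in older snapshots. It assumes a Spark session already configured with an Iceberg catalog named lake; the catalog, table, and column names are illustrative assumptions, not from the talk.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gdpr-erasure").getOrCreate()

    # Right to erasure: delete one data subject's rows in place,
    # instead of rewriting files by hand or expiring whole partitions.
    spark.sql("DELETE FROM lake.crm.events WHERE user_id = 'u-123'")

    # Deleted rows remain reachable through older snapshots, so expire
    # them to make the erasure physical, not just logical.
    spark.sql(
        "CALL lake.system.expire_snapshots("
        "table => 'crm.events', "
        "older_than => TIMESTAMP '2024-01-01 00:00:00')"
    )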
The Intrinsic Limitations of Large Language Models: Understanding Hallucinations and Their Impact on Data Workflows
Large Language Models (LLMs) have revolutionized natural language processing and opened new perspectives in data applications. However, they are not without limitations.
This presentation will explore the main constraints of LLMs, focusing on the phenomenon of hallucinations—cases where models generate incorrect or absurd information. Contrary to common perception, these hallucinations are not simple bugs, but an inherent characteristic of how LLMs are designed and trained: in other words, hallucinations will never disappear from LLMs, even in 10 years. Moreover, hallucinations are, by design of LLMs, very convincing and sometimes difficult to detect! We will explore the underlying reasons for these limitations, rooted in the probabilistic and auto-regressive nature of LLMs.
Understanding why hallucinations occur is crucial to recognizing that they cannot be completely eliminated. They must rather be managed effectively, particularly when integrating LLMs into data pipelines. The presentation will address the concrete implications of LLM limitations for Data engineers, Data analysts, and business users.
We will examine scenarios where hallucinations can lead to data misinterpretation, flawed analysis, and erroneous business decisions.
Furthermore, practical strategies to mitigate the impact of these limitations will be discussed, including model fine-tuning, integration of human-in-the-loop approaches, and the use of complementary technologies to improve reliability.
Team Meetings #1
Mom, I messed up the production deployment of my AI algorithm!
It's a well-known statistic in the field: the vast majority of AI projects fail. But the most discouraging failures, with the strongest business impact, are those that occur once the model is in production: a bad recommendation model breaks user trust in the algorithm, a bad facial recognition model prevents unlocking one's phone, a bad pedestrian detection model can cause a fatal accident...
Over the past 4 years, I have deployed several AI algorithms in production across different contexts: each presented difficulties and learnings that contributed to shaping my convictions on how to properly deploy AI algorithms (MLOps). I share the most interesting ones with you in this talk:
1) On the importance of load testing
2) Watch out, my data is drifting!
3) How to know if my model works in real life?
4) Let's (re)discover together through these experiences the fundamentals of ML monitoring!
Why can't LLMs do analytics?
In this talk, we will see why language models (LLMs) are not really designed for data analysis. Even though they promise quick answers through AI, they often lack precision and reliability for making good decisions. We will discuss the limitations of LLMs for conducting comprehensive analyses, and we will show you a better option: using them to search and exploit existing analyses. You will see why relying on reliable data is the best way to obtain useful insights.
Domain-Driven Design: A Revolutionary Approach for Data Engineering
In a world where data is at the heart of strategic decisions, it is crucial to ensure that data models faithfully reflect the realities of the business domain. Domain-Driven Design (DDD) offers an innovative approach to solve common problems such as data silos, misunderstandings between technical and business teams, and the growing complexity of systems.
In this 5-minute talk, we will explore how DDD can transform data engineering by improving data quality, system maintainability, and process efficiency. Through concrete examples, we will demonstrate the tangible benefits of this approach and offer keys for its adoption. Join us to discover how DDD can revolutionize your data projects and maximize their impact.
Unlock new SQL capabilities with BigFunctions!
Explore a framework that lets you create and use over 100 powerful BigQuery functions to supercharge your data analysis. Learn to collect data effortlessly, perform advanced transformations, and activate your data, all without leaving SQL.
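By way of illustration, BigFunctions are shared routines that live in public BigQuery datasets and are called like any native SQL function. A minimal sketch using the BigQuery Python client; the function name and region-specific dataset (bigfunctions.eu) follow the public BigFunctions catalog, but treat them as assumptions and adapt them to your region.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Call a shared BigFunction exactly like a built-in SQL function.
    # `levenshtein` and the `eu` dataset are taken from the public
    # catalog (github.com/unytics/bigfunctions); adjust for your region.
    query = """
    SELECT bigfunctions.eu.levenshtein('conference', 'conférence') AS distance
    """
    for row in client.query(query).result():
        print(row.distance)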
Lunch
Enjoy a friendly and delicious lunch break to recharge your batteries. The meal will be prepared by Meet My Mama, a committed caterer highlighting world cuisines through the talent of "Mamas", entrepreneurial chefs who share their culinary passion while promoting a more inclusive and sustainable society. Vegetarian and vegan options will be offered, delighting both your taste buds and the environment.
This break is also a perfect opportunity to connect with other participants, expand your network, or visit the organizers' and sponsors' booths.
Product Analytics: making sense of unreliable data
In today's constantly evolving digital environment, product analytics is essential for optimizing user experience and driving revenue.
However, data is often accompanied by reliability issues, such as inconsistencies, gaps, and noise.
This presentation will detail the challenges faced by product analytics and propose several strategies for extracting valuable insights, despite data quality issues.
Team Meetings #2
Analytics for All: Creating a Self-Service Culture in a Scaling Organization
Brevo's Data team was overwhelmed with insight requests, which often slowed down decision-making processes. By leveraging a Modern Data Stack, the company transitioned to a self-service model that enables every employee, regardless of their technical expertise, to explore and analyze data autonomously. Taha will address the key strategies implemented to facilitate this transformation, including AI integration, which simplifies data interpretation and improves the user experience.
The different synchronization modes of Data platforms at Carrefour
The largest companies, composed of multiple countries and/or subsidiaries, have several local Data Platforms. At Carrefour, we have a federated model in which each country has its own platform.
As long as Data Platforms are used locally, there is no problem. However, when headquarters wants to consolidate and aggregate data from different Data Platforms to create global applications, analytics dashboards or centralized operations, many problems need to be solved. Beyond data governance and documentation, data synchronization is a real challenge.
When should data be retrieved from each Data Platform to cross-reference and aggregate it?
Many solutions are possible: using a scheduler, using an orchestrator, using an event-driven architecture... We will review these solutions, their advantages and disadvantages depending on use cases and requirements.
A demo and interactions with participants are planned for this session.
A lightweight stack to track Analytics standardization at scale: dbt + duckdb + Observable Framework
This year, we transitioned at Decathlon from a centralized Analytics organization to a decentralized, business-domain organization. At the same time, we had set ourselves the goal last year of reducing the number of dbt repositories on our GitHub from 250 to 15 (one per domain), to enable simpler governance and to let dbt architectures converge.
To continue our standardization work at scale and monitor the evolution of these 15 repositories, we chose to apply DORA metrics to each repository. We developed an analytics stack that lives entirely in GitHub.
We use:
- duckdb as the execution engine, which recently gained support for reading Delta tables
- dbt-duckdb to orchestrate the modeling
- Observable Framework to build a dataviz application, deployed statically on GitHub Pages
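To make the read path of this stack concrete, here is a minimal sketch of DuckDB scanning a Delta table through its delta extension and computing a DORA-style deployment-frequency metric; the table path and schema are hypothetical, not from the talk.

    import duckdb

    con = duckdb.connect()
    con.sql("INSTALL delta; LOAD delta;")  # DuckDB's Delta reader

    # Hypothetical Delta table of deployment events, one row per deploy.
    result = con.sql("""
        SELECT date_trunc('week', deployed_at) AS week,
               count(*)                        AS deployments
        FROM delta_scan('./deployments_delta')
        GROUP BY week
        ORDER BY week
    """)
    print(result)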
When Product culture meets Generative AI
AI has been serving product development and user needs for several years now. With the rise of Generative AI, the arrival of these new models could have been just another transparent, seamless step in the Product approach. Many thought GenAI was simply a "new tool" within the broader technological revolution that is AI. Well, no: it is not that simple or that immediate! We must acknowledge that GenAI is a revolution in itself, one that disrupts our achievements, but also our convictions, our habits, and our processes.
How does the Product Management world adapt to this disruption to continue providing products that are always useful, usable and used? The SNCF Connect & Tech teams will tell you in detail about the launch of their first use case exploiting the capabilities of Generative AI.
Beyond the technical implementation of our RAG for Customer Relations teams, we will tell you about the path we chose to take to build a tailor-made tool. Our ambition: to combine the best of available technologies with our human expertise, while designing a relevant, fast and integrated user experience.
In this talk, we will see together what doesn't change, what had to change, and what will need to change to reach production serenely.
The future of the CDO and Data & AI teams
Join us for an exclusive roundtable on the evolution of Chief Data Officer (CDO) roles and Data & AI teams, hosted by Robin Conquet, creator of DataGen. Virginie Cornu (MyFenix, ex-Jellysmack) and Claire Lebarz (Malt, ex-AirBnB) will share their perspectives on the future of strategic roles in product, technology and data domains. We will explore the evolution of the CDO role, its impact on the CPO role, key skills for future Data & AI leaders, as well as the new organizational dynamics of Data teams within mature companies.
And after SQL?
What comes after SQL? The question may seem bold. But is a language created over 30 years ago still suited to our analytical needs?
SQL was designed for OLTP (Online Transaction Processing), for CRUD operations (Create, Read, Update, Delete).
In the era of data analysis, we now use SQL to transform data, create ad hoc analyses, and develop business intelligence dashboards.
We have created tools (like dbt) to streamline this process and introduce 'software best practices'. We have made SQL our lingua franca for everything related to analysis.
SQL doesn't need to change. It has been working very well for decades. It is the cornerstone of most of our modern databases. It's the data and what we do with it that has changed. Yet, we still rely on fairly basic frameworks (Spark with Hadoop/MapReduce) and have built our analytical semantics on top of SQL to handle data that is no longer rectangular.
Are we missing something? What comes next after SQL?
Drawing from my experiences as a Data Ops engineer, supporting data teams in companies such as Deezer, Olympique de Marseille, Maisons du Monde, etc., this presentation will examine the overlooked flaw introduced by SQL in the world of analysis, how it can be managed, and how new frameworks are paving the way in this field.
How to scale Machine Learning Operations with Feature Stores?
A Feature Store is a central component of large-scale Machine Learning for mature organizations, providing increased operational efficiency, consistency, and scalability.
More and more organizations are reaching a higher level of maturity regarding ML in production. We have discussed this topic with many organizations and observed a trend: many are wondering how a Feature Store could help them overcome critical challenges.
This presentation aims to improve understanding of Feature Stores by offering an overview of their anatomy, main benefits and pitfalls, internal workings, and different possible architectures. In addition to theoretical content, practical examples of real-world applications will be given throughout the presentation.
Pause
Recharge your batteries, grab a coffee, exchange some ideas and, if curiosity strikes, let yourself be tempted by a visit to the conference booths.
One Thousand and One dbt Models: How BlaBlaCar Moved to dbt in 12 months
In the span of 12 months, we migrated 1,000 dbt models. We introduced a new paradigm and a new tool into our stack. This required training, new frameworks, testing, cross-functional collaboration, setting up squads, etc. We want to share our migration journey, addressing companies considering such an approach and looking to define their strategy and supporting tools.
Pause
Recharge your batteries, grab a coffee, exchange some ideas and, if curiosity strikes, let yourself be tempted by a visit to the conference booths.
Back to the future, forward to the past: how the lessons of yesterday shape the data PM role to move forward
In this presentation, we will unveil the essential role of Data Product Managers (DPM) in bringing your data strategy to life.
We will start by addressing the "data as a product" approach, emphasizing its user-centered orientation. Then, we will explore how this method can bring significant benefits even without fully adopting the data mesh.
We will then dive into the specific responsibilities of DPMs, from managing data products to aligning data objectives with business objectives. You will also discover practical tips for seamlessly integrating DPMs into your organization, as well as training and recruitment strategies.
Closing the Loop: Alerting your Stakeholders on Data Quality
Data quality is a top priority, whether it's ensuring the accuracy of our dashboards or the reliability of AI/GenAI models. It is also a responsibility of the Data team, even though the data producers are typically business and technical squads.
How can we automatically alert those producers about quality issues, across more than 5 teams in 3 different countries? How do we close this loop?
We will present our solutions, show how to implement them in your own context, and share the possibilities they have unlocked.
Managing careers in data: How to never be afraid of the annual review again!
Does all this sound familiar to you? "I've been working here for six months, and I feel stuck – I'm not learning anything anymore, and I feel like I'm stagnating." Or perhaps: "When will I get my next raise? X just got one, and what about me? Besides, Competitor 1 would pay me double." If this resonates with you, join the workshop "Managing careers in data: How to never stress about annual reviews again!", a collaborative workshop where we will build together a competency framework using a proven methodology in companies of 200 as well as 20 people, specially designed for data teams. This framework will offer you a clear guide to help data professionals evolve throughout their career, ensure fair promotions and streamline salary decisions – without turning you into a robot applying a rigid and unsuitable grid.
Help! My GPU calculations seem non-deterministic
The widespread adoption of deep learning-based approaches and more recently the explosion of generative AI usage have established Nvidia GPUs as the accelerators of choice for these applications. However, the programming model offered by the CUDA API involves trade-offs regarding the reproducibility of certain calculations. This talk will immerse you in the world of Nvidia GPUs by covering what they are, how they are programmed, and why certain calculations don't always yield the same results through a concrete example.
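The root cause this talk alludes to can be shown without a GPU at all: floating-point addition is not associative, so any parallel reduction whose accumulation order varies between runs (as with CUDA atomics) can round to different results. A minimal CPU-only sketch of that underlying effect:

    import random

    # Sum the same values in two different orders. Because float
    # addition is not associative, the rounded totals can differ --
    # which is exactly what happens when a GPU's atomic adds or
    # parallel reductions accumulate in a run-dependent order.
    values = [random.uniform(-1e6, 1e6) for _ in range(100_000)]

    forward = sum(values)
    shuffled = sum(random.sample(values, len(values)))

    print(forward == shuffled)      # frequently False
    print(abs(forward - shuffled))  # small but often nonzero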
Modelling your Business in a Spreadsheet in Just 30 Minutes
Since 2023, with software businesses aiming for profitability and growth at all costs no longer viable, there is a pressing need to ruthlessly prioritise and identify the most promising revenue-generating opportunities.
Data Analysts play a crucial role in helping business leaders identify and evaluate the best opportunities. They are now expected to develop tools that facilitate informed business and product tradeoff decisions.
Given these macro-economic changes, Data Analysts must adapt their skills to align closely with business value. They must now support business leaders by identifying and sizing the best opportunities, and one key competency they need to develop is linking any initiative to business outcomes. This includes performing "what if" scenarios and sensitivity analyses to enable effective business and product tradeoff decisions.
In this presentation, we will walk through the step-by-step process of building a "Growth Model," a powerful tool for understanding business mechanics and determining where to allocate resources for growth. We will demonstrate an example model for a B2B SaaS business, sharing lessons from our experience at Dashlane.
We will emphasise that the process of building the tool is as important as the final output. It involves figuring out how all metrics interconnect to produce sensible results, tracking down baseline rates for each assumption, and applying excellent business judgment. To develop this tool, one must have an intuitive sense of the company's strategy, be an unbiased observer, understand the business at a molecular level, and be capable of obtaining accurate data for each input.
For analysts, developing this tool will deepen their understanding of the business at a granular level, positioning them as a top strategic resource within their company.
By the end of the presentation, Data Analysts will have a clear vision of the type of data products they should learn to build to advance their analytics career, transitioning from a tactical role to a strategic advisor with data expertise.
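For a flavor of what such a model looks like outside a spreadsheet, here is a minimal, hypothetical growth-model sketch: a toy B2B SaaS funnel with made-up baseline rates, plus the kind of one-line sensitivity analysis described above. Everything here is illustrative, not Dashlane's actual model.

    # A toy "Growth Model": chain a few SaaS metrics together so
    # "what if" questions become one-line changes. All rates invented.

    def steady_state_mrr(leads, trial_rate, close_rate, arpa, churn):
        """MRR once monthly customer additions balance monthly churn."""
        new_customers = leads * trial_rate * close_rate
        steady_customers = new_customers / churn
        return steady_customers * arpa

    BASE = dict(leads=10_000, trial_rate=0.05, close_rate=0.20, arpa=90, churn=0.03)
    baseline = steady_state_mrr(**BASE)

    # Sensitivity analysis: which lever moves revenue most at +10%?
    for lever in BASE:
        params = dict(BASE)
        params[lever] *= 1.10
        uplift = steady_state_mrr(**params) / baseline - 1
        print(f"+10% {lever}: {uplift:+.1%} MRR")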
How elite sport has embraced data for the Paris Games and beyond
How, once Paris had won the Olympic and Paralympic Games, the French sports ecosystem organized itself so that data could give French athletes a competitive advantage. This conference will cover the stages of building the Sport Data Hub, through to concrete examples of using data to estimate potential, predict performance levels, analyze competition...
AI on Data - snake oil or actually useful?
Many recent Text-to-SQL solutions claim to replace Data Engineers, but they often lead to unreliable results, with inaccurate queries on complex datasets. In reality, Data Engineering remains essential: pipelines, transformation, and data modeling must be in place for AI to function effectively. LLMs, trained primarily on language, can help interpret queries, but deterministic systems are needed to convert them to SQL using well-defined data models built by Data Engineers. This talk will explore why Data Engineering remains essential to the success of AI implementations.
Q&A about (data) freelancing
Humanizing Data Strategy
People are emotional, irrational and unpredictable – and yet, they constitute the most important aspect of any Data Strategy. Tiankai presents his 5C framework: competence, collaboration, communication, creativity and consciousness, with concrete examples to help you truly place humans at the heart of your efforts, and transform your collaborators and employees into active advocates of your Data strategy.