Content provided by Larry Swanson. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Larry Swanson or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ro.player.fm/legal.
Alexandre Bertails: The Netflix Unified Data Architecture – Episode 40

31:37
 
Manage episode 517605264 series 3644573
At Netflix, Alexandre Bertails and his team have adopted the RDF standard to capture the meaning in their content in a consistent way and to generate consistent representations of it for a variety of internal customers. The keys to their system are a Unified Data Architecture (UDA) and a domain modeling language, Upper, that let them quickly and efficiently share complex data projections in the formats that their internal engineering customers need.

We talked about:
- his work at Netflix on the content engineering team, the internal operation that keeps the rest of the business running
- how their search for "one schema to rule them all" and the need for semantic interoperability led to the creation of the Unified Data Architecture (UDA)
- the components of Netflix's knowledge graph
- Upper, their domain modeling language
- their focus on conceptual RDF, resulting in a system that works more like a virtual knowledge graph
- his team's decision to "buy RDF" and its standards
- the challenges of aligning multiple internal teams on ontology-writing standards and how they led to the creation of UDA
- their two main goals in creating their Upper domain modeling language: to keep it as compact as possible and to support federation
- the unique nature of Upper and its three essential characteristics: it has to be self-describing, self-referencing, and self-governing
- their use of SHACL and its role in Upper
- how his background in computer science and formal logic and his discovery of information science brought him to the RDF world and ultimately to his current role
- the importance of marketing your work internally and using accessible language to describe it to your stakeholders, for example describing your work as a "domain model" rather than an ontology
- UDA's ability to permit the automatic distribution of semantically precise data across the business with one click
- how reading the introduction to the original 1999 RDF specification can help prepare you for the LLM/gen AI era

Alexandre's bio
Alexandre Bertails is an engineer in Content Engineering at Netflix, where he leads the design of the Upper metamodel and the semantic foundations for UDA (Unified Data Architecture).

Connect with Alex online
- LinkedIn
- bertails.org

Resources mentioned in this interview
- Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix
- Resource Description Framework (RDF) Schema Specification (1999)

Video
Here’s the video version of our conversation: https://youtu.be/DCoEo3rt91M

Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 40. When you're orchestrating data operations for an enormous enterprise like Netflix, you need all of the automation help you can get. Alex Bertails and his content engineering team have adopted the RDF standard to build a domain modeling and data distribution platform that lets them automatically share semantically precise data across their business, in the variety of formats that their internal engineering customers need, often with just one click.

Interview transcript
Larry: Hi, everyone. Welcome to episode number 40 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Alex Bertails. Alex is a software engineer at Netflix, where he's done some really interesting work. We'll talk more about that later today. But welcome, Alex. Tell the folks a little bit more about what you're up to these days.

Alex: Hi, everyone. I'm Alex. I'm part of the content engineering side of Netflix. Just to make it more concrete: most people will think about the streaming product, but that's not us. We are more on the enterprise side, essentially the people helping run the business, so more internal operations. I'm a software engineer. I've been part of the initiative called UDA for a few years now; we published that blog post a few months ago, and that's what most people want to talk about.
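The episode topics above mention SHACL's role in Upper. As a rough, hypothetical illustration of the kind of constraint SHACL expresses, here is a minimal shape check in plain Python. The `Movie` shape, its property names, and the data are all invented for this sketch; this is not the SHACL standard itself, and not Netflix's Upper.

```python
# A toy "shape" in the spirit of SHACL: cardinality constraints on the
# properties of nodes of a given class. A from-scratch sketch, not SHACL.

def validate(node: dict, shape: dict) -> list[str]:
    """Return a list of constraint violations for one node."""
    violations = []
    for prop, rules in shape["properties"].items():
        values = node.get(prop, [])
        if len(values) < rules.get("minCount", 0):
            violations.append(f"{prop}: expected at least {rules['minCount']} value(s)")
        if "maxCount" in rules and len(values) > rules["maxCount"]:
            violations.append(f"{prop}: expected at most {rules['maxCount']} value(s)")
    return violations

# Hypothetical shape: a Movie must have exactly one title and at least one actor.
movie_shape = {
    "targetClass": "Movie",
    "properties": {
        "title": {"minCount": 1, "maxCount": 1},
        "actor": {"minCount": 1},
    },
}

good = {"title": ["Stranger Things"], "actor": ["Winona Ryder"]}
bad = {}  # no title, no actors

print(validate(good, movie_shape))  # []
print(validate(bad, movie_shape))
```

Real SHACL shapes are themselves RDF, which is part of why they suit a self-describing modeling language: the constraints live in the same graph as the models they govern.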
Larry: Yeah, it's amazing, the excitement about that post and how many people are talking about it. But one thing: I think I inferred it from the article, but I don't recall a really explicit statement of the problem you were trying to solve. Can you talk a little bit about the business prerogatives that drove you to create UDA?

Alex: Yeah, totally. Before there was UDA, there was no one clear problem that we had to solve, and people won't always realize that, but we've been thinking about this for a very long time. Essentially, on the enterprise side, you have to think about lots of teams having to represent the same business concepts, think about movie, actor, region, but really hundreds of them, across different systems. It's not necessarily people not agreeing on what a movie is, although that happens, but it's really: what is the movie across a GraphQL service, a data mesh source, an Iceberg table? That results in duplicated effort and definitions that don't align in the end. A few years ago, we were in search of this "one schema" kind of concept that would actually rule them all, and that's how we got into domain modeling, and how we could do that kind of domain modeling across all representations.

Alex: So that was one part of it. The other part is we needed to enable what's called semantic interoperability. Once we have the ability to talk about concepts and domain models across all of the representations, then the next question is: how can we actually help our users move between all of those data representations? There is one thing to remember from the article that's actually in the title, and that's the concept of model once, represent everywhere. The core idea with all of that is to say: once we've been able to capture a domain model in one place, then we have the ability to project and generate consistent representations. In our case, we are focused on GraphQL, Avro, Java, and SQL.
That's what we have today, but we are looking into adding support for more representations.

Larry: Interesting. And I think every enterprise will have its own mix of data structures like that that they're mapping things to. I love the way you use the word "project." I think different people talk differently about what they do with the end results of such systems. You have two concepts you talk about here: the notion of mappings, which we were just talking about with the data stuff, but also that notion of projection. That's sort of like, once you've instantiated something out of this system, you project it out to the end user. Is that kind of how it works?

Alex: Yes, we do use the term "projection" in the more mathematical sense; some people would call them denotations. So essentially, once you have a domain model, you can reason about it, and we actually have a formal representation of the domain models, maybe we'll talk about that a little bit later. But then you can define what it's supposed to look like, the exact same thing with the same data semantics, but as an API, for example, in GraphQL, or as a data product in Iceberg, in the data warehouse, or as a log-compacted Kafka topic in our data mesh infrastructure as Avro. So for us, we have to make sure that it's quote, unquote "the same thing," regardless of the data representation that the user is actually interested in.

Alex: To put everything together, you talked about the mappings. What's really interesting for us is that the mappings are just one of the three main components that we have in our knowledge graph, because at the end of the day, UDA at its core is really a knowledge graph, which is made out of the domain models. We've talked about that.
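The "model once, represent everywhere" idea Alex describes can be pictured mechanically: hold one domain model and derive each representation from it. The toy model, field names, and type-mapping tables below are invented for illustration; the real UDA projections are far richer and formally grounded.

```python
# One toy domain model, projected into two representations.
# Names and type mappings here are hypothetical, not Netflix's.

model = {
    "name": "Movie",
    "fields": [("title", "String"), ("releaseYear", "Int")],
}

GRAPHQL_TYPES = {"String": "String", "Int": "Int"}
SQL_TYPES = {"String": "VARCHAR", "Int": "INTEGER"}

def to_graphql(m: dict) -> str:
    """Project the model as a GraphQL object type definition."""
    body = "\n".join(f"  {name}: {GRAPHQL_TYPES[t]}" for name, t in m["fields"])
    return f"type {m['name']} {{\n{body}\n}}"

def to_sql(m: dict) -> str:
    """Project the same model, with the same semantics, as a SQL table."""
    cols = ", ".join(f"{name} {SQL_TYPES[t]}" for name, t in m["fields"])
    return f"CREATE TABLE {m['name']} ({cols});"

print(to_graphql(model))
print(to_sql(model))  # CREATE TABLE Movie (title VARCHAR, releaseYear INTEGER);
```

The point of the sketch is that both outputs are generated from a single source of truth, so the GraphQL API and the warehouse table can never drift apart on what a "Movie" is.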
Then the mappings: the mappings are themselves objects in that knowledge graph, and they are there to connect the world of concepts from the domain models to the world of data containers, which in our case could represent things like an Iceberg table, so we would want to know the coordinates of the Iceberg table and we would want to know the schema. But that applies as well to the data mesh source abstraction and the Avro schema that goes with it.

Alex: That would apply as well, and this is a tricky part that very few people actually try to solve, to the GraphQL APIs. We want to be able to say and know: oh, there is a type resolver for that GraphQL type, it exists in that domain graph service, and it's located exactly over there. So that's the kind of granularity that we actually capture in the knowledge graph.

Larry: Very cool. And this is the Knowledge Graph Insights podcast, which is how we ended up talking about this. But with that notion of the models, and then the mappings, and then the data containers that actually have everything, I'm just trying to get my head around the scale of this knowledge graph. As you teased out earlier, it doesn't have to do with the streaming services or the customer-facing part of the business; it's about the kind of content and data assets that you need to manage on the back end. Are you sort of an internal service? Is that how it's conceived?

Alex: That's a good question. So we are not so much into the binary data. That's not at all what UDA is about. Again, it's a knowledge graph podcast, for sure, but even more precisely, when we say knowledge graph, we really mean conceptual RDF, and we are very, very clear about that. That means quite a few things for us. The knowledge graph, in our case, needs to be able to capture the data wherever it lives. We do not necessarily want to be RDF all the way through, but at the very core of it, there is a lot of RDF.
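The mappings Alex describes, graph objects that tie a concept to the concrete data containers holding it, down to an Iceberg table's coordinates or a GraphQL type's resolver, can be pictured as plain triples. Every identifier and coordinate below is made up for illustration; none of it reflects real Netflix infrastructure.

```python
# Mappings as (subject, predicate, object) triples in a tiny in-memory graph.
# All identifiers here are invented examples.

triples = [
    ("concept:Movie", "mappedTo", "iceberg:warehouse.media.movies"),
    ("iceberg:warehouse.media.movies", "hasSchema", "schema:movies_v3"),
    ("concept:Movie", "mappedTo", "graphql:MovieType"),
    ("graphql:MovieType", "resolvedBy", "service:movie-domain-graph"),
]

def containers_for(concept: str) -> list[str]:
    """Every data container a concept is mapped to."""
    return [o for s, p, o in triples if s == concept and p == "mappedTo"]

def follow(node: str, predicate: str) -> list[str]:
    """Follow one edge out of a node."""
    return [o for s, p, o in triples if s == node and p == predicate]

# "Where does Movie live?" and "which service resolves MovieType?"
print(containers_for("concept:Movie"))
print(follow("graphql:MovieType", "resolvedBy"))
```

Because the mappings are ordinary nodes and edges, the same graph query machinery that answers questions about the domain models can answer operational questions like "which service resolves this GraphQL type?"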
I'm trying to remember how we talk about it. But yeah, so think about a graph representation of connected data. And again, it has to work across all of the data representations,