Azure Data Platform End2End
Fabio Braga
Cloud Solution Architect – Data Platform at Microsoft
May I ask you a favour? Please put your hand up if you have ever had a chance to work on a Hadoop or Spark project in your organisation. Or if you have ever worked on a NoSQL project. Or if you have had to scratch your head trying to figure out how to handle temporal analysis of data streams. Or if you have used Machine Learning algorithms in your data pipeline. I am inclined to say that not a lot of you raised your hand four times.
But now let me ask you another one. Please put your hand up if you have been working with business intelligence for many years using relational data and data visualisation tools. Ah, now I see a few more hands up.
The point I am trying to make here is that many of us – data people – come from a traditional, on-premises, relational data background. Depending on the size and complexity of the organisations we have worked for, some of us never had a chance to work on Big Data projects. We hear a lot about how cloud, data lakes, Spark, machine learning and AI are revolutionising the world of analytics while we keep working on our SQL queries to generate a management dashboard.
Organisations looking to stay relevant in the market recognise the value of data and are now asking their Data Architects to modernise their data platform. These architects face the mammoth task of handling an ever-growing amount of data that arrives in different shapes and at great speed. To fulfil this request, they are looking to leverage the power and flexibility of Azure cloud data services, but the problem is… there are so many of them! “What do they do?”, “How do they talk to each other?”, “Which one should I choose?”, “What does an architecture look like?” are the most common questions I hear.
To help you understand the Azure Data Platform and how to apply these data services to your data projects, we’ve designed and implemented the Azure Data Platform End2End workshop. It brings together many of the Azure data services in a common architecture that can handle most data scenarios in your organisation:
- relational data pipelines;
- big data file ingestion and processing with Spark clusters;
- unstructured data ingestion and the use of AI to generate insights;
- real-time ingestion and visualisation.
In a series of five labs, you will progressively implement this architecture and have a chance to understand the concepts behind each service and its place in the overall architecture. You will implement a data lake, use a Spark cluster to process big data files, incorporate AI to generate metadata for unstructured data and design a dashboard to visualise real-time data streams.
The workshop content is publicly available on GitHub (https://aka.ms/ADPE2E). You can follow the instructions in the Deployment section to provision all the Azure data services in your subscription and, once that is complete, start the labs.
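As a rough illustration of what a deployment like this typically involves, here is a minimal Azure CLI sketch. It assumes the repository ships an ARM template – the file name `azuredeploy.json`, the resource group name and the region below are all hypothetical placeholders; the Deployment section in the repo has the authoritative steps.

```shell
# Sign in and select the subscription you want to deploy into
az login
az account set --subscription "<your-subscription-id>"

# Create a resource group to hold the workshop's services
# (name and region are placeholders — choose your own)
az group create --name adpe2e-rg --location australiaeast

# Deploy the services from an ARM template
# ("azuredeploy.json" is a hypothetical file name — use the
#  template the repo's Deployment section points you to)
az deployment group create \
  --resource-group adpe2e-rg \
  --template-file azuredeploy.json
```

Once the deployment finishes, the provisioned services appear in the resource group and you can start Lab 1.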
By the end of the workshop you will have in front of you a complete data platform architecture you can apply to your organisation to start ingesting and processing all types of data, in large volumes and at any speed, backed by the power and flexibility of Azure.
Have fun!