Wednesday, 26 July 2017

Azure Data Lake | Get Started with Azure Data Lake with Simple Definitions

If you are new to the word Azure, then let me tell you that Azure is a cloud platform. It offers many applications and services, and Azure Data Lake is one of them. Azure is provided by Microsoft, and we have to create an Azure subscription to start using it. Microsoft provides a free trial subscription of 30 days; after that, we pay for the cloud as we use it.
Coming to our topic of discussion, Azure Data Lake is a service for Big Data where we can securely store data of any size and any kind and analyze it. Azure Data Lake (referred to as ADL hereafter) has the following important parts:


1. Azure Data Lake Store (ADLS) 
2. Azure Data Lake Analytics (ADLA)
3. U-SQL

Azure Data Lake Store (ADLS): As the name explains, it is a store for data. ADLS is the service and the place where data of any kind and any size is imported and stored. ADLS can also hold RDBMS-style objects such as tables, views, functions, and stored procedures. It is a massive store that can hold trillions of files, and a single file can be larger than a petabyte.

Azure Data Lake Analytics (ADLA): It is a service that lets us analyze the data stored in ADLS. To create an ADLA service and run analytics, it is mandatory to connect the ADLA service to ADLS. ADLA uses a special kind of structured query language called U-SQL, which is different from the SQL used for an RDBMS. U-SQL scripts are written in the ADLA service, and every script we write is executed as a job. By running U-SQL jobs in ADLA, we can extract and transform the data present in ADLS and produce output for analysis.

U-SQL: U-SQL is a combination of SQL and C#. Microsoft came up with this scripting language by integrating the features of C# into SQL. This lets the user call C# functions directly in SQL. A developer can also write custom C# code and use it in U-SQL.
U-SQL is used in ADLA to extract data from ADLS, transform the extracted data, and generate the transformed data as output. All U-SQL scripts are run from ADLA as jobs.
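
To make the extract, transform, and output flow described above more concrete, here is a minimal U-SQL sketch. It is only an illustration under a few assumptions: the input path and column list are assumed to match the SearchLog.tsv sample file that can be copied into an ADLS account, the output path /output/region-totals.csv is a hypothetical name chosen for this example, and the rowset names are made up. Adjust all of them to your own data.

```
// Extract: read a tab-separated file from ADLS into a rowset.
// The path and schema are assumptions based on the SearchLog.tsv sample data.
@searchlog =
    EXTRACT UserId      int,
            Start       DateTime,
            Region      string,
            Query       string,
            Duration    int,
            Urls        string,
            ClickedUrls string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Transform (C# inside SQL): ToUpper() is an ordinary C# string method
// called directly in the SELECT expression.
// This assumes Region is never null in the input.
@normalized =
    SELECT Region.ToUpper() AS Region,
           Duration
    FROM @searchlog;

// Transform (SQL): total the duration for each region.
@totals =
    SELECT Region,
           SUM(Duration) AS TotalDuration
    FROM @normalized
    GROUP BY Region;

// Output: write the result back to ADLS as a CSV file.
// The output path is a hypothetical example.
OUTPUT @totals
TO "/output/region-totals.csv"
USING Outputters.Csv();
```

When a script like this is submitted from ADLA, it runs as a single job: nothing is read or written until the job executes, and the output file appears in ADLS once the job succeeds. Note that U-SQL keywords such as SELECT and FROM must be written in upper case, while the C# parts follow normal C# casing.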