What is Azure SQL Data Warehouse?

This post originally appeared on the Pragmatic Works Blog Site, but wanted to also provide it here.

Introduction

This post provides a review of what an Azure SQL Data Warehouse is, as well as its benefits to performance and efficiency.

What is Azure SQL Data Warehouse?

Azure SQL Data Warehouse is a service that is Microsoft’s offering of a public data center in Azure.  It’s a Platform as a Service (PaaS) offering, so it allows development, running and managing data applications without the complexities of building and maintaining the infrastructure typically associated with developing and launching a traditional data warehouse. The primary focus is loading and querying the database.

Architecture

Azure Data Warehouse leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. An MPP system executes queries across a distributed system, where different computers, called nodes, cooperate to answer queries. Azure SQL Data Warehouse relies upon distributed storage into multiple machines and these multiple machines compute answers to queries.

Azure Data Warehouse Architecture

A traditional data warehousing scenario is typically comprised of one large machine (physical or virtual) and typically on premise utilizes symmetric multi-processing (SMP). An example would be a classic SQL Server setup. In contrast, an MPP system is comprised of multiple machines and the MPP machines will all have a slice of the data from the database.

A query comes in, the query is distributed to each node and the answers to queries are returned quickly and efficiently. As a result, this scales very well as the amount of data increases.

DWUs

Data Warehouse Units

With Azure SQL Data Warehouse, it isn’t necessary or required to select a specific configuration of CPUs, RAM or storage. Compute power of the database is measured in Data Warehousing Units (DWU). When setting up Azure Data Warehouse the amount of DWUs that are provisioned are set.  This can be adjusted as necessary.

There is the notion of Gen 1 and Gen 2 data warehouses. Gen 1 data warehouses are measured in DWUs (Data Warehouse Units) and Gen 2 data warehouses are measured in cDWUs (Compute Data Warehouse Units).  See Data Warehouse Units (DWUs) and compute Data Warehouse Units (cDWUs).

The difference of these performance tiers is reflected on the invoice as unit of scale, which directly translates to billing. Both support scaling compute up or down and pausing the compute when you don’t need to use the data warehouse.

For example, let’s say a data warehouse is provisioned with 100 DWUs. A test is ran and the test results reveal that 3 tables were loaded in 15 mins and rendered a report in 20 minutes. Then an adjustment is made to increase the compute power to 500 DWUs and the test is executed again. This time the test results reveal 3 tables were loaded in 3 minutes and rendered a report in 4 minutes, thus a 5X improvement.

By changing the service level, and altering the number of DWUs that are allocated to the system this in turn adjusts the performance and cost of the system.

This brief video provides an overview of Azure Data Warehouse.

 

Be the first to comment

Leave a Reply

Your email address will not be published.


*


This site uses Akismet to reduce spam. Learn how your comment data is processed.