In Azure, create a storage account. This example is called mjeblobstore.
In the storage account, create a BLOB container. We created one called address.
Upload your data files to the container. We uploaded 2 CSV files, one with 1000001 records and one with 4 records.
Obtain an access key for the storage account. Either key is acceptable.
In SQL Server Management Studio, check that Polybase is installed.
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
Polybase installation instructions are here - https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-installation?view=sql-server-ver15
Configure Polybase Connectivity (a restart is required).
sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
Create a master key
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'replace_with_a_secure_value';
Create a scoped credential, the value for Secret is one of the Azure storage account access keys
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'any_value', Secret = 'uXjQwmkX13ARb==';
Create an external data source, the value for LOCATION uses the Azure storage account and container name (mjeblobstore and address)
CREATE EXTERNAL DATA SOURCE AzureStorage with (
TYPE = HADOOP,
CREDENTIAL = AzureStorageCredential
Create an external file format that matches the files in your Azure storage container.
CREATE EXTERNAL FILE FORMAT stringdelim_CSV
WITH (FORMAT_TYPE = DELIMITEDTEXT,
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
USE_TYPE_DEFAULT = True)
Create an external table. Our example files are in the root of the BLOB container, so our LOCATION is ‘/’.
CREATE EXTERNAL TABLE [dbo].[US_Address] (
DATA_SOURCE = AzureStorage,
FILE_FORMAT = stringdelim_CSV
The Azure data can now be queried as a table.
SQL Server using Polybase does not support skipping rows from data files stored in Azure Blob storage. This can cause issues when data files have a header row, for example column names. However, as header rows are normally only a convenience for developers, a recommendation would be to write data without any header rows.
Please sign in to leave a comment.