In Azure, create a storage account. This example is called mjeblobstore.
In the storage account, create a BLOB container. We created one called address.
Upload your data files to the container. We uploaded 2 CSV files, one with 1000001 records and one with 4 records.
Obtain an access key for the storage account. Either key is acceptable.
In SQL Server Management Studio, check that Polybase is installed.
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
Polybase installation instructions are here - https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-installation?view=sql-server-ver15
Configure Polybase Connectivity (a restart is required).
sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
GO
RECONFIGURE
GO
Create a master key
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'replace_with_a_secure_value';
GO
Create a scoped credential, the value for Secret is one of the Azure storage account access keys
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'any_value', Secret = 'uXjQwmkX13ARb==';
Create an external data source, the value for LOCATION uses the Azure storage account and container name (mjeblobstore and address)
CREATE EXTERNAL DATA SOURCE AzureStorage with (
TYPE = HADOOP,
LOCATION ='wasbs://address@mjeblobstore.blob.core.windows.net',
CREDENTIAL = AzureStorageCredential
);
Create an external file format that matches the files in your Azure storage container.
CREATE EXTERNAL FILE FORMAT stringdelim_CSV
WITH (FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
USE_TYPE_DEFAULT = True)
);
Create an external table. Our example files are in the root of the BLOB container, so our LOCATION is ‘/’.
CREATE EXTERNAL TABLE [dbo].[US_Address] (
first_name varchar(100),
last_name varchar(100),
company_name varchar(100),
address varchar(100),
city varchar(100),
county varchar(100),
state varchar(100),
zip varchar(100),
phone1 varchar(100),
phone2 varchar(100),
email varchar(100),
web varchar(100)
)
WITH (LOCATION='/',
DATA_SOURCE = AzureStorage,
FILE_FORMAT = stringdelim_CSV
);
The Azure data can now be queried as a table.
Note
SQL Server using Polybase does not support skipping rows from data files stored in Azure Blob storage. This can cause issues when data files have a header row, for example column names. However, as header rows are normally only a convenience for developers, a recommendation would be to write data without any header rows.
Comments
0 comments