Great options for hosting a SQL database in Azure.

If you need to create a SQL Server database within your organisation, using your own hardware, there is generally one well-established way of doing that. You need to provision a server (either physical or virtual), install the SQL Server engine and create an instance on that server, and then create your database within that instance. There are a huge variety of ways that you can tailor your installation of SQL Server to meet specific needs, but the general principle of server / instance / database remains the same. Your company will need to look after all of these itself, along with any related tasks needed to support the database.

However, if you are open to hosting your database within a cloud-based environment, there are several different set-ups that you can choose from. All of these are quite different, varying both in their capabilities and in the effort they take to create and maintain.

I couldn’t find a graphic that gave a concise summary of the main features of each option for hosting a SQL database in Azure, so I made the one below. I’ve tried to align each option with its nearest equivalent in an on-premise SQL Server environment.

Click on the “full screen” icon at the top of the picture to see a larger copy, or download a PDF version using the button below the diagram. The highlighted words link to the relevant page on the Microsoft website, which provides more detailed information.

A very brief explanation of SQL databases in Azure

Unlike an on-premise SQL Server environment, in Azure you don’t need to create a server and SQL instance first before you can create a database. With Azure SQL Database you can simply enter a few basic details of the database that you’d like to create into a form, and Azure will go away and create it using its own resources. Azure will also automatically take care of backing up, performance tuning, licensing, and replicating the database for high availability. This is the lowest-maintenance solution, but it does have several limitations.

For example, you can’t run cross-database queries, or really interact with other databases outside of the Azure SQL Database you’re using. Neither can you send mail from the database, or take your own ad hoc manual backups. There is also little isolation from other Azure SQL Databases – you’re sharing resources with other customers. The second option – Azure SQL Managed Instance – removes many of these limitations. It allows you to group together several databases within an instance with its own dedicated compute and storage resources, and a private IP address. This is closer to the concept of the SQL instance in an on-premise environment.
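To make the cross-database restriction concrete, a query like the one below (which references a second database using a three-part name, and works fine on an on-premise instance or a Managed Instance) is not supported within a single Azure SQL Database. The database, table and column names here are just placeholders:

-- Works on an on-premise SQL Server instance or a Managed Instance, but fails
-- in a single Azure SQL Database, where three-part cross-database references
-- are not supported. All names below are placeholders.
SELECT o.OrderID, c.CustomerName
FROM Sales.dbo.Orders AS o
JOIN Customers.dbo.Customer AS c
    ON c.CustomerID = o.CustomerID;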

The third option – an Azure Virtual Machine with SQL Server installed on it – most closely resembles the traditional on-premise server / instance / database arrangement. The main difference is that rather than the Infrastructure team at your organisation providing you with a server that they’ve created with company resources, Azure provides you with one using its storage and compute power instead. After you have the server, the rest of the process is very similar whether you’re using on-premise hardware or an Azure Virtual Machine. Either way, you’ll need to arrange for SQL Server to be installed, and create an instance, before you can start creating databases. You’ll need to take care of SQL security patches, backups, optimisation, etc. yourself too – although there is some automated help that you can get from Azure. This solution involves the most maintenance, but also offers the most flexibility and compatibility with SQL Server.

Of course, there is a lot more to each variety of database in Azure than what I’ve just mentioned, and this is only meant to be a very general introduction. There is a massive amount of extra documentation on the Microsoft website.

If you’d like any additional advice or help on creating databases within Azure, feel free to contact us at hello@dataminister.com

Azure Data Factory using self-hosted integration runtime

In my first attempt to create an Azure Data Factory job, I used compute resources in Azure that are managed by Data Factory itself. Alternatively, you can use a self-hosted integration runtime to make use of compute resources elsewhere – such as your own on-premise environment.

In this example, I install a self-hosted integration runtime within my own on-premise environment (which is just my laptop!), and create a Data Factory job to extract content from a table hosted on a SQL Server instance running on that same laptop.

Creating the self-hosted integration runtime

To do this, open Azure Data Factory Studio, select the “Manage” toolbox icon from the left-hand menu, “Integration runtimes” from the “Connections” list, and then “+ New”.

The “AutoResolveIntegrationRuntime” that you can see present already is the Data Factory-managed Azure resource that I used for the previous copy data pipeline.

Choose “Azure, Self-Hosted”

Of the options that then appear, I am going to select “Self-Hosted” to try and install the integration runtime on my laptop.

Copy and paste one of the Authentication Keys that appear (store them somewhere that you can access later). Then click “Download and install integration runtime”

You’ll be taken to a page of the Microsoft Download Centre, where you can choose from a few recent versions of the integration runtime installer.

Once the .msi file has downloaded to your local machine, open it and run through the installation wizard.

Paste the authentication key that you saved earlier into the registration page shown below:

The “Integration Runtime (Self-hosted) node name” that appears below should be the name of the local machine that you installed the integration runtime on (my company laptop is named “DATAMINISTER”)

Click “Finish” to complete the registration.

You now have a new integration runtime (IR) to choose within Data Factory Studio:

Test out the self-hosted IR with a new pipeline

Check that the new IR can be used successfully in a Data Factory pipeline by opening up Azure Data Factory Studio, and selecting “New pipeline” from “Factory Resources”.

Create a “Copy data” activity…

Within that copy data activity, create a new source dataset connection, with a type of “SQL Server”. This will be configured to connect to a SQL Server instance on the same server that the self-hosted IR was created on.

Fill in the details of your on-premise SQL Server instance, and the database within that instance that you want to extract data from.

In the “Connect via integration runtime” dropdown, you can select the self-hosted IR that you’ve just created.

You’ll need to provide an authentication method too…

In the above, I’m using a SQL account to connect to the source database. The connection will only succeed if the SQL account has adequate access to the database.

Checking the account’s credentials within SQL Server Management Studio (SSMS) shows that the “DataFactory_Reader” user has read access within the “AdventureWorksDW2017” database.
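For reference, a read-only account like this could be set up on the source instance with something along the following lines. This is only a sketch: the password is a placeholder, and your own security policy may well differ.

-- On the on-premise SQL Server instance hosting the source database.
-- The password below is a placeholder.
CREATE LOGIN DataFactory_Reader WITH PASSWORD = 'UseAStrongPasswordHere1!';
GO

USE AdventureWorksDW2017;
GO

-- Map the login to a database user and grant read-only access.
CREATE USER DataFactory_Reader FOR LOGIN DataFactory_Reader;
ALTER ROLE db_datareader ADD MEMBER DataFactory_Reader;
GO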

Back in Azure Data Factory Studio, if the linked service you’ve just set up is working OK, you should be able to see individual tables under the “Table name” dropdown, within the “Connection” section of the dataset.

Back on the “Source” tab of the “Copy data” task for the pipeline, you can enter your own SQL query to just extract a few named columns from the table you’ve specified in the connection above.

You can even “Preview data” to check this is working:

With the connection to the source database created, you now need a destination database to transfer data to. I am going to use a new table within the same “DataMinister_DB” that I created in this previous blog.

To create the new table, I opened a Query Editor session against the database within the Azure portal, and created a simple table whose schema matches the data selected by my query (which extracts three columns from the source database).
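I haven’t reproduced my exact query and table here, but the general pattern looks something like the following. The table and column names are purely illustrative; the only real requirement is that the sink table’s columns line up with the columns returned by the source query.

-- Illustrative source query: pull three columns from a table in the
-- on-premise AdventureWorksDW2017 database (names are examples only).
SELECT FirstName, LastName, Title
FROM dbo.DimEmployee;

-- Illustrative destination table, created in the Azure SQL Database via
-- the Query Editor, with a schema matching the three columns above.
CREATE TABLE dbo.Employee_Import
(
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    Title NVARCHAR(100)
);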

Within the “Sink” tab, I specify a connection to this new table:

Now that I’ve configured the source and destination databases/tables for the “Copy data” activity, I can run “Debug” within Azure Data Factory Studio to check it works OK…

…and it does! Note that the integration runtime used was the self-hosted IR that I created at the start of this article.

120 rows of data have been copied from the on-premise database hosted on my laptop to the Azure SQL Database that I created in the cloud.

It’s worth noting that this process succeeded the first time for me because I’d already added several IP addresses that Data Factory uses to the “approved” IPs list for the destination database (while I was running through the steps in this post). If you haven’t followed the steps within that post first, you may get a few connection errors when trying to connect to the Azure database. The post shows how you can overcome these errors by “approving” the IP addresses.

A basic Azure Data Factory pipeline: Copying data from a csv to an Azure SQL database

This is my first attempt at creating an Azure Data Factory pipeline. It’s not a difficult process, but there are several steps to it. So that I don’t forget them all for next time, and in case this is helpful for anyone else, here are my notes.

Setting up the initial environment

If you don’t have one already, you’ll need a source and destination to copy the data between. In my case, these will be a csv file held within Azure Blob Storage, and an Azure SQL Database. Both of these (along with the Data Factory pipeline itself) will need to live within a Resource Group, so I’ll create that first.

Click on “Resource groups” within the Azure portal, and then “+ Create”.

Give the Resource Group a name and region (preferably the same region that your source and destination will be in).

Create a Storage Account, to hold the source csv file.

Within the new Storage Account, create a container (under the Blob service) to hold your files.

Upload the csv file, which will act as the source data in the Data Factory pipeline. For the sake of this demo, I’m using a very simple one-column dataset of salaries. The file is called “LFData”

With the file that you want to upload selected, press the “Upload” button.

The freshly-uploaded file should appear in your container.

Creating an Azure SQL Database

In the Azure portal, search for “Azure SQL”, and then select the “Azure SQL” option that appears in the Marketplace.

In the “SQL databases” window, select “Single database” from the dropdown.

Specify the Subscription to be used for the database deployment.
Select the Resource Group created earlier.
Give the database a name (we use “DataMinister_DB”).
Choose a server to put the database on. If you haven’t got one already, you’ll need to create a new one, as shown further below…

If you don’t have a server set up already, click “Create new” under the “Server” dropdown, and complete the form below. You’ll need to specify a SQL or AD account to use for authentication.

Back in the “Create SQL Database” wizard, go to the “Networking” tab and allow connectivity through a public endpoint. Under “Firewall rules”, add the current client IP address to the approved list.

Click “Review + Create”, and then you’ll be taken to a “Deployment in progress” screen, which hopefully results in a successfully-created new database.

Within that database, select “Query editor” from the menu on the left:

Log in using the SQL account / password you’ve just created:

Write a simple query to create a basic table (which will need to match the schema of the data you’re importing), and click “Run”. The table should now be visible in the object explorer on the left:
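Since the source file is just a single column of salary values, the table only needs one matching column. Something like the following would do (the table and column names are my own illustrative choices, not anything Azure requires):

-- A minimal one-column table to receive the salary values from the csv.
-- The table and column names are illustrative.
CREATE TABLE dbo.LFData_Salaries
(
    Salary DECIMAL(10, 2)
);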

Create the Azure Data Factory Pipeline

Dive into the new Resource Group and click “Create a resource”. Then, from the “Integration” menu, choose “Data Factory”.

Create a Data Factory instance inside of the Resource Group.

Once the new resource has been created, click on it, and select “Open Azure Data Factory Studio” from the “Get started” section.

Create a new pipeline.

From the “Move & transform” menu, drag “Copy data” over to the pipeline canvas.

Go to “Source” for the “Copy data” activity, and “+ New”.

On the “New dataset” menu that appears, select “Azure Blob Storage”

Hit “continue”, and then “DelimitedText” for the file format:

Give the dataset a name (“LFData_Import”), and create a new linked service.

Point the linked service to the Storage Account that you created previously.

Test the connection.

Once the linked service has been created, select “Open” next to the source dataset.

Specify the exact file within the store that you want to import data from.

You have the option to “preview data”, to check things look OK.

Following a very similar method to create a destination, select the “Sink” tab of the “Copy data” activity, then “+ New”, and choose “Azure SQL Database” from the options that appear.

Specify the connection details to the Azure SQL Database that you created earlier. You’ll need to provide authentication details too. I’m using the same “admin” account for simplicity here, although in a real environment it may be preferable to use a separate account.
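If you do want a dedicated account rather than the server admin, Azure SQL Database supports contained database users, so a write-only account could be sketched out roughly as follows. The user name and password are placeholders:

-- Run inside the Azure SQL Database itself (not the master database).
-- A contained user needs no server-level login; credentials are placeholders.
CREATE USER DataFactory_Writer WITH PASSWORD = 'UseAStrongPasswordHere1!';

-- Allow the account to insert rows into the destination table.
ALTER ROLE db_datawriter ADD MEMBER DataFactory_Writer;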

Clicking the “Test connection” button will probably result in an error message at first.

The IP address given in the error message will need to be added to the approved list of the server that contains the database you’re connecting to.

If the IP address is dynamic, you may want to approve the whole IP range that it could come from…
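As an alternative to the portal, server-level firewall rules can also be managed with T-SQL from the logical server’s master database. The rule name and address range below are examples only:

-- Run against the master database of the logical server.
-- The rule name and IP range are examples only.
EXECUTE sp_set_firewall_rule
    @name = N'DataFactoryRange',
    @start_ip_address = '20.42.0.0',
    @end_ip_address = '20.42.255.255';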

Try the connection again, and hopefully it succeeds now.

Select the table within the database.

Back in the Data Factory pipeline, hit “Debug”. You’ll probably get another error at first, and have to add another IP address to the approved list for the database server.

After pressing “Debug” again, you should hopefully get a successful run of the copy activity.

Back in the Query Editor for the database, a SELECT statement should show that data from the csv held on Azure Blob Storage has now been imported into the Azure SQL Database table, using Data Factory.
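The check itself is just an ordinary query, for example against the illustrative table name used earlier:

-- Confirm that the rows copied from the csv have landed in the table.
-- The table name matches the illustrative example above.
SELECT COUNT(*) AS RowsImported
FROM dbo.LFData_Salaries;

SELECT TOP (10) Salary
FROM dbo.LFData_Salaries;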