This is my first attempt at creating an Azure Data Factory pipeline. It’s not a difficult process, but there are several steps to it. So that I don’t forget them all for next time, and in case this is helpful for anyone else, here are my notes.
Setting up the initial environment
If you don’t have one already, you’ll need a source and destination to copy the data between. In my case, these will be a csv file held within Azure Blob Storage, and an Azure SQL Database. Both of these (along with the Data Factory pipeline itself) will need to live within a Resource Group, so I’ll create that first.
Click on “Resource groups” within the Azure portal, and then “+ Create”.
Give the Resource Group a name and region (preferably the same one as where your source and destinations will be).
Create a Storage Account, to hold the source csv file.
Within the new Storage Account, create a Blob Service, and container to hold your files.
Upload the csv file, which will act as the source data in the Data Factory pipeline. For the sake of this demo, I’m using a very simple one-column dataset of salaries. The file is called “LFData”
With the file that you want to upload selected, press the “Upload” button.
The freshly-uploaded file should appear in your container.
Creating an Azure SQL Database
In the Azure portal, search for “Azure SQL”, and then select the “Azure SQL” option that appears in the Marketplace.
In the “SQL databases” window, select “Single database” from the dropdown.
Specify your Subscription, to be used for the database deployment.
Select the Resource Group created earlier.
Give the database a name (we use “DataMinister_DB”).
Choose a server to put the database on. If you haven’t got one already,
you’ll need to create a new one, as shown further below…
If you don’t have a server set up already, click “Create new” under the “Server” dropdown, and complete the form below. You’ll need to specify a SQL or AD account to use for authentication.
Back in the “Create SQL Database” wizard, go to the “networking” tab and allow connectivity through a public endpoint. Under “Firewall rules”, add the current client IP address to the approved list.
Click “Review + Create”, and then you’ll be taken to a “Deployment in progress” screen, which hopefully results in a successfully-created new database.
Within that database, select “Query editor” from the menu on the left:
Login using the SQL account / password you’ve just created:
Write a simple query to create a basic table (which will need to match the schema of the data you’re importing), and click “Run”. The table should now be visible in the object explorer on the left:
Create the Azure Data Factory Pipeline
Dive into the new Resource Group and click “create a resource”. Then from the integration menu, choose “Data Factory”.
Create a Data Factory instance inside of the Resource Group.
Once the new resource has been created, click on it, and select “Open Azure Data Factory Studio” from the “Get started” section.
Create a new pipeline.
From the “Move & transform” menu, drag “copy data” over to the pipeline canvas.
Go to “Source” for the “Copy data” activity, and “+ New”.
On the “New dataset” menu that appears, select “Azure Blob Storage”
Hit “continue”, and then “DelimitedText” for the file format:
Give the dataset a name (“LFData_Import”), and create a new linked service.
Point the linked service to the Storage Account that you created previously.
Test the connection.
Once the linked service has been created, select “Open” next to the source dataset.
Specify the exact file within the store that you want to import data from.
You have the option to “preview data”, to check things look OK.
Following a very similar method to create a destination, select the “Sink” tab of the “Copy data” activity, then “+ New”, and choose “Azure SQL Database” from the options that appear.
Specify the connection details to the Azure SQL Database that you created earlier. You’ll need to provide authentication details too. I’m using the same “admin” account for simplicity here, although in a real environment it may be preferable to use a separate account.
Clicking the “Test connection” button should result in an error message at first.
The IP address given in the error message will need to be added to the approved list of the server that contains the database you’re connecting to.
If the IP address is dynamic, you may want to use the whole IP range…
Try the connection again, and hopefully it succeeds now.
Select the table within the database.
Back in the Data Factory pipeline, hit “Debug”. You’ll probably get another error at first, and have to add another IP address to the approved list for the database server.
After pressing “Debug” again, now you should hopefully get a successful run of the copy activity.
Back in the Query Editor for the database, a SELECT statement should show that data from the csv held on Azure Blob Storage has now been imported into the Azure SQL Database table, using Data Factory.