ADF does not directly support copying a folder or multiple files from SharePoint Online, but there are workarounds. Compared with a single-file copy, two additional steps are needed:
- Get the list of files:
  - Maintain the file names in a text file manually, OR
  - Use a Web Activity to call the SharePoint REST API to get the list of files.
- Use a ForEach Activity to loop over the list of relative file names and pass each file name to the Copy Activity (the Base URL differs slightly from the single-file copy)
The pipeline flow looks like this:
Web1 – Get the access token from SPO
Web2 – Get the list of files from SPO folder
ForEach1 – Loop the list of file names
Copy1 – Copy data with HTTP connector as source
Step 1:
Get the access token from SPO
Copying files from SharePoint Online relies on AAD service principal authentication and the SharePoint API to retrieve the files.
- Register SharePoint Application and Grant permission - https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad-app?tabs=dotnet#register-your-application-with-an-azure-ad-tenant
a) Register AAD Application
- On Azure Portal, go to AAD app registration page: https://portal.azure.com/#blade/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/RegisteredApps
- New Registration -> Enter your App name
- Go to "Certificates & secrets" and create a new client secret; you can set the expiry to 1Y/2Y/Never
b) Grant SharePoint site permission to your registered App (requires site owner permission on SharePoint)
Full details on registering the app and granting permissions are in the prerequisites here - https://docs.microsoft.com/en-us/azure/data-factory/connector-sharepoint-online-list#prerequisites
c) Create an ADF Pipeline. Start by creating a Web Activity to get the access token
- URL: https://accounts.accesscontrol.windows.net/[Tenant-ID]/tokens/OAuth/2
- Method: POST
Headers:
- Content-Type: application/x-www-form-urlencoded
- Body: grant_type=client_credentials&client_id=[Client-ID]@[Tenant-ID]&client_secret=[Client-Secret]&resource=00000003-0000-0ff1-ce00-000000000000/[Tenant-Name].sharepoint.com@[Tenant-ID]
Run a debug to confirm that the activity succeeds, and check that the activity output returns the access token in the payload. You can also use Postman to verify that the token is valid.
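If you would rather check the token request outside ADF without Postman, the same call can be sketched in Python. This is only an illustration using the standard library: the function names `build_token_request` and `request_token` are made up for this example, and all IDs and secrets are placeholders that you supply. Note that `urlencode` percent-encodes `@` and `/` in the form body, which is the standard encoding for this content type.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# SharePoint Online principal ID, taken from the resource string in the Body above.
SHAREPOINT_PRINCIPAL = "00000003-0000-0ff1-ce00-000000000000"


def build_token_request(tenant_id, tenant_name, client_id, client_secret):
    """Return (url, headers, body) matching the Web Activity settings above."""
    url = f"https://accounts.accesscontrol.windows.net/{tenant_id}/tokens/OAuth/2"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": f"{client_id}@{tenant_id}",
        "client_secret": client_secret,
        "resource": f"{SHAREPOINT_PRINCIPAL}/{tenant_name}.sharepoint.com@{tenant_id}",
    })
    return url, headers, body


def request_token(tenant_id, tenant_name, client_id, client_secret):
    """POST the request and return the access_token field of the JSON response."""
    url, headers, body = build_token_request(
        tenant_id, tenant_name, client_id, client_secret
    )
    req = Request(url, data=body.encode("utf-8"), headers=headers, method="POST")
    with urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

Calling `request_token(...)` with your own tenant ID, tenant name, client ID, and client secret should return the same token string that the Web Activity exposes as `output.access_token`.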
Step 2:
Get the list of Files
- Create another Web Activity to get the list of files
- URL: https://{site_url}/_api/web/GetFolderByServerRelativeUrl('/Folder Name')/Files
- Method: GET
Headers:
- Authorization: @{concat('Bearer ', activity('WebActivity1Name').output.access_token)}
- Accept: application/json
Run a debug to confirm that the activity succeeds and that the output lists the files under the folder.
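To make the shape of this request concrete, here is a small Python sketch that assembles the same URL and headers. The function name `build_list_files_request` is hypothetical, the site URL and folder path are placeholders, and the sketch assumes the folder path contains no single quotes.

```python
from urllib.parse import quote


def build_list_files_request(site_url, folder_path, access_token):
    """Return (url, headers) for listing the files in a SharePoint folder."""
    # Percent-encode spaces and other unsafe characters, keeping "/" as-is.
    encoded_folder = quote(folder_path, safe="/")
    url = (
        f"https://{site_url}/_api/web/"
        f"GetFolderByServerRelativeUrl('{encoded_folder}')/Files"
    )
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }
    return url, headers
```

For example, `build_list_files_request("contoso.sharepoint.com/sites/demo", "/Folder Name", token)` produces a URL ending in `GetFolderByServerRelativeUrl('/Folder%20Name')/Files`, where `contoso.sharepoint.com/sites/demo` is an invented site used only for illustration.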
Step 3:
Loop the list of relative file names
- Create a ForEach Activity with inner Copy activity
- Items: @activity('WebActivity2Name').output.value
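The ForEach Activity iterates over the `value` array in the second Web Activity's output, and each item exposes a `ServerRelativeUrl` field. The Python sketch below mimics that loop on a made-up sample payload; the file names and paths are invented for illustration, and a real response contains many more fields per file.

```python
def server_relative_urls(web_activity_output):
    """Collect ServerRelativeUrl from each entry of the 'value' array."""
    return [
        item["ServerRelativeUrl"]
        for item in web_activity_output.get("value", [])
    ]


# Invented sample that mirrors only the fields this walkthrough relies on.
sample_output = {
    "value": [
        {"Name": "a.csv", "ServerRelativeUrl": "/sites/demo/Folder Name/a.csv"},
        {"Name": "b.csv", "ServerRelativeUrl": "/sites/demo/Folder Name/b.csv"},
    ]
}

for relative_url in server_relative_urls(sample_output):
    # In the pipeline, this value reaches the Copy activity as item().ServerRelativeUrl.
    print(relative_url)
```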
Step 4:
Create Copy activity
- New dataset -> HTTP -> Binary type:
a) HTTP linked service
- Base URL: https://<SiteUrl>/_api/web/GetFileByServerRelativeUrl('@{linkedService().FileName}')/$value
- Authentication Type: Anonymous (use token configured on copy activity source)
b) Configure copy activity HTTP source
Dataset properties:
- Name: RelativeURL (Any name)
- Value: @{item().ServerRelativeUrl}
- Request method: GET
- Additional header: "Authorization: Bearer <accessToken>" (accessToken is the token generated in Step 1)
Tip: You can first test with a static access token copied from the output of the previous Web activity. You can also use an expression (add dynamic content): @{concat('Authorization: Bearer ',activity('WebActivity1Name').output.access_token)}
c) Configure Linked Service properties
- Name: FileName (Any Name)
- Value: @dataset().RelativeURL
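The parameter chain above (ForEach item -> dataset `RelativeURL` -> linked service `FileName`) ends in one download URL per file plus the additional header. The Python sketch below shows what that resolves to. The function name `build_file_request` is hypothetical, the site and file path in the example are invented, and the sketch assumes the path contains no single quotes.

```python
from urllib.parse import quote


def build_file_request(site_url, server_relative_url, access_token):
    """Return (url, additional_header) for downloading one file's content."""
    # Percent-encode spaces and other unsafe characters, keeping "/" as-is.
    encoded_path = quote(server_relative_url, safe="/")
    url = (
        f"https://{site_url}/_api/web/"
        f"GetFileByServerRelativeUrl('{encoded_path}')/$value"
    )
    additional_header = f"Authorization: Bearer {access_token}"
    return url, additional_header
```

For instance, `build_file_request("contoso.sharepoint.com/sites/demo", "/sites/demo/Folder Name/a.csv", token)` yields a URL ending in `GetFileByServerRelativeUrl('/sites/demo/Folder%20Name/a.csv')/$value`, which is the request the HTTP source issues with the GET method for each item in the loop.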
2. Create the Copy sink for your destination.
Run the pipeline and confirm that it completes successfully.
Thanks to @Jijo Puthooran for helping me in authoring this blog.
Posted at https://sl.advdat.com/3h5eLjU