Tech Insights
Akalanka Silva
June 30, 2020

Query Excel file data using SQL Server PolyBase

Query Excel file data using SQL Server PolyBase
What is PolyBase?

PloyBase is the technologyused to connect external data sources to the SQL server. It can be used to query tables or files in an external datasource. Initially, this was introducedwith SQL Server 2016. At that time, it was limited to a few external datasources such as Big data clusters and Azure Blog Storage data. However, in SQLServer 2019 version it was expanded to add more data sources.

The following data sources canconnect using SQL Server 2019 instance.

·        Big data clusters

·        Azure Blob storage

·        Relational Database (ex: SQLServer, Oracle, Teradata)

·        NoSQL (ex: MongoDB)

·        ODBC (ex: Excel)

                                                                      Polybase basichierarchy

How to configure PolyBase in SQL Server?

You can install PolyBase witha new SQL Server instance or you can add PolyBase feature to an existing SQLServer instance. In both scenarios, you need to select PolyBase Query Servicefor External Data and Java connector for the HDFS data source (Not mandatory).

                                                                    Add Polybase to SQL Server

After installation complete,you need to go to the Services and check the following new services youstarted. If not running, you need to start these two services manually.

                                                                             PolyBase services

To confirm that the PolyBaseinstallation is successful, you can run the following command in SQL ServerManagement Studio new query window.

SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsSuccessfullyInstalled;

If the installation issuccessful, it will display ‘1’ in the results window.

Query excel file usingPolyBase

SQL Server 2019 introduced afacility to connect Excel file to SQL Server Using PolyBase via ODBC driver.Before starting, you need to make sure that the correct ODBC driver isinstalled on your computer. To check that you need to open the ODBC Data SourceAdministrator application. In the ODBC Data source driver list, the followingdriver should be installed. If not, you can download it from the here.

                                                            ODBC driver installation

To read excel file data usingSQL Server PolyBase, you need to follow the below steps.

·        First, create a new SQL ServerDatabase.

·        Then, generate the Master Keyfor above created database using the following command.

CREATEMASTER KEY ENCRYPTION BY PASSWORD = '<Enter new Password>;

GO

·        Next step isto create a new excel file or download a sample excel file from the web (Forthis example, download the excel file from here)

·        After downloading the excelfile, create a new External data source for it using the following command. Youshould give the correct excel file path for CONNECTION_OPTIONS. For thepassword, you can use the previously created password for the master key.

OPEN MASTER KEY DECRYPTION BY PASSWORD = '<Enter created master key Password >';
GO
CREATE EXTERNAL DATA SOURCE FinancialSample WITH
(
   LOCATION = 'odbc://noplace',

 CONNECTION_OPTIONS = 'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm,*.xlsb)}; DBQ=D:\PolyBase\FinancialSample.xlsx'
);

If this command issuccessfully executed, it should create a new Data Source in External DataSource section in the created database.

   Created external data source

Next, you need to generate anexternal table for the excel file. For that, you can use the following command.Excel sheet column names and Table column names should match each other, datatypes should match each other as well. For the Location, you have to use thename of the excel sheet.

CREATEEXTERNAL TABLE dbo.FinancialSample

  (

       Segment NVARCHAR(255),
       Country NVARCHAR(255),
       Product NVARCHAR(255),
       DiscountBand NVARCHAR(255),
       UnitsSold FLOAT(53),
       ManufacturingPrice FLOAT(53),
       SalePrice FLOAT(53),
       GrossSales FLOAT(53),    
       COGS FLOAT(53),
       Profit FLOAT(53),
       MonthNumber FLOAT(53),
       MonthName NVARCHAR(255),
       Year NVARCHAR(255)
   )
   WITH
   (
       LOCATION = '[Sheet1$]',
       DATA_SOURCE = FinancialSample
  );

·        When this is successfully executed, it should display the new External Table in your created database.

            Createdexternal table

·        Now you can Query Excel filedata using the SQL Server. To do that you can run the following select commandfor the created external table. If this is executed successfully, the resultsection should display excel file data.

SELECT  [Segment]

     ,[Country]
     ,[Product]
    ,[DiscountBand]
     ,[UnitsSold]
    ,[ManufacturingPrice]
     ,[SalePrice]
    ,[GrossSales]
     ,[COGS]
     ,[Profit]
    ,[MonthNumber]
     ,[MonthName]
     ,[Year]

 FROM[PolyBaseDemo].[dbo].[FinancialSample]

                                                                                        Query Results window

·        If you change data in yourexcel file and rerun the above select query, those changes will appear in theresults window. There’s no need to run any other Import query.

Importance

You are reading these datafrom the originally located excel file. You’re not importing data to SQL ServerDatabase using any other external method. Therefore, you can save the dataimporting time. Also, you can query the real time data in the file. So, this isvery important when working with big data.