The RtoSQLServer
package relies on DBI
and odbc
packages for
its database functionality, however you should consider using it instead
of DBI
or odbc
for the following reasons:
Importing large dataframes
A key benefit of the package is to ensure the data loading process is
more robust than using DBI
and odbc
packages
alone.
When loading large dataframes from R into MS SQL Server using
DBI
it was found that the loading process can hang and
never complete.
People using R to load data into SQL Server may be working at home and have varying quality of connection to the database server.
RtoSQLServer
does two things to help with data
loading:
Loading the dataframe into a staging table in the database first of all. When the load into the staging table is complete, the target table is created as a copy of the staging table. This helps to prevent misleading problems of incomplete loads. The staging table is dropped once the target table has been created successfully.
Loading in batches. The
RtoSQLServer
package’swrite_dataframe_to_db
function includes abatch_size
argument. Batch size is the number of rows of the R dataframe that are loaded into the database staging table at once. Using the batch loading process inRtoSQLServer
it has been possible to load large tables it was not possible to load in one usingDBI
. The batch size is set for the user at a default of 100K so the user does not need to worry about a suitable batch size or splitting up the source dataframe in R.
Self-contained database transactions
Unlike DBI
and odbc
where user must open a
database connection object in R and then pass this connection to
functions (before hopefully remembering to close it at the end of the R
session), in RtoSQLServer
a database connection is
established when calling each function and closed at the end of that
function. This means there are not hung connections to the database,
waiting until the user closes R.
The RtoSQLServer
functions can be used within user
custom functions as self-contained tasks and do not rely on an open
connection input argument. This also means a user does not need to know
how to setup a connection, they simply pass the server and database name
to the function. This can make it easier to run code others have
written.
Repeatedly opening and closing connections might make things slightly slower, however it is thought this will not have much impact on the workflow of the average R user where there will be only a few database transactions in an R session.
Checking of table, column names and datatypes from R to SQL Server
The DBI
and odbc
packages are designed to
work with many different database engines. The RtoSQLServer
package has focused on Microsoft SQL Server. This allows checks to be
written to ensure the table name is compatible with SQL Server when
being read by an ODBC driver.
The package maps R dataframe datatypes to specific SQL Server
datatypes ensuring consistent, predictable data storage. The current max
length of string in an R character column vector is used to select the
size of the SQL Server nvarchar()
database column.
Extra functionality to help non-database specialists
RtoSQLServer
has functions to modify existing tables.
The drop_column()
, rename_column()
,
add_column()
column functions and the
rename_table()
function do not have equivalents in DBI.
This saves users looking up SQL.
The db_table_metadata()
and
show_schema_tables()
functions provide an easy way to get
information about existing database objects. The
execute_sql()
function is an easy way to run custom SQL as
a closed transaction.
System Versioning
RtoSQLServer
allows a user to create a table with MS SQL
Server System
Versioning enabled. To do this using RtoSQLServer
the
write_dataframe_to_db
function is used with the
versioned_table
argument set to TRUE.
System Versioning adds date / timestamp (SQL Server datatype
DATETIME2
) start and end columns to the table. A
<table name>History
table is created in the database
schema, storing the history of previous records following updates and
deletes. This allows users to maintain an archive of their table setup
in R, without needing to know the SQL used to establish System
Versioning as part of a create table
SQL expression.
To drop a system versioned table using
drop_table_from_db
requires extra privileges in the SQL
Server database. You may need to contact a system administrator to drop
these tables for you if you receive an error.