The RtoSQLServer package relies on DBI and odbc packages for
its database functionality, however you should consider using it instead
of DBI or odbc for the following reasons:
Importing large dataframes
A key benefit of the package is to ensure the data loading process is
more robust than using DBI and odbc packages
alone.
When loading large dataframes from R into MS SQL Server using
DBI it was found that the loading process can hang and
never complete.
People using R to load data into SQL Server may be working at home and have varying quality of connection to the database server.
RtoSQLServer does two things to help with data
loading:
Loading the dataframe into a staging table in the database first of all. When the load into the staging table is complete, the target table is created as a copy of the staging table. This helps to prevent misleading problems of incomplete loads. The staging table is dropped once the target table has been created successfully.
Loading in batches. The
RtoSQLServerpackage’swrite_dataframe_to_dbfunction includes abatch_sizeargument. Batch size is the number of rows of the R dataframe that are loaded into the database staging table at once. Using the batch loading process inRtoSQLServerit has been possible to load large tables it was not possible to load in one usingDBI. The batch size is set for the user at a default of 100K so the user does not need to worry about a suitable batch size or splitting up the source dataframe in R.
Self-contained database transactions
Unlike DBI and odbc where user must open a
database connection object in R and then pass this connection to
functions (before hopefully remembering to close it at the end of the R
session), in RtoSQLServer a database connection is
established when calling each function and closed at the end of that
function. This means there are not hung connections to the database,
waiting until the user closes R.
The RtoSQLServer functions can be used within user
custom functions as self-contained tasks and do not rely on an open
connection input argument. This also means a user does not need to know
how to setup a connection, they simply pass the server and database name
to the function. This can make it easier to run code others have
written.
Repeatedly opening and closing connections might make things slightly slower, however it is thought this will not have much impact on the workflow of the average R user where there will be only a few database transactions in an R session.
Checking of table, column names and datatypes from R to SQL Server
The DBI and odbc packages are designed to
work with many different database engines. The RtoSQLServer
package has focused on Microsoft SQL Server. This allows checks to be
written to ensure the table name is compatible with SQL Server when
being read by an ODBC driver.
The package maps R dataframe datatypes to specific SQL Server
datatypes ensuring consistent, predictable data storage. The current max
length of string in an R character column vector is used to select the
size of the SQL Server nvarchar() database column.
Extra functionality to help non-database specialists
RtoSQLServer has functions to modify existing tables.
The drop_column(), rename_column(),
add_column() column functions and the
rename_table() function do not have equivalents in DBI.
This saves users looking up SQL.
The db_table_metadata() and
show_schema_tables() functions provide an easy way to get
information about existing database objects. The
execute_sql() function is an easy way to run custom SQL as
a closed transaction.
System Versioning
RtoSQLServer allows a user to create a table with MS SQL
Server System
Versioning enabled. To do this using RtoSQLServer the
write_dataframe_to_db function is used with the
versioned_table argument set to TRUE.
System Versioning adds date / timestamp (SQL Server datatype
DATETIME2) start and end columns to the table. A
<table name>History table is created in the database
schema, storing the history of previous records following updates and
deletes. This allows users to maintain an archive of their table setup
in R, without needing to know the SQL used to establish System
Versioning as part of a create table SQL expression.
To drop a system versioned table using
drop_table_from_db requires extra privileges in the SQL
Server database. You may need to contact a system administrator to drop
these tables for you if you receive an error.