Skip to contents

RtoSQLServer is used to load R data frames into MS SQL Server and to modify existing MS SQL Server tables using R. This page outlines some of the main functions.

Preliminary setup

A test data frame is created for these examples.

library(RtoSQLServer)

# Make a test dataframe with n rows
test_n_rows <- 123456
test_df <- data.frame(
  colour_cat = sample(c("red", "amber", "green"), test_n_rows, replace = TRUE),
  val = runif(test_n_rows)
)

RtoSQLServer functions create a connection to the database and disconnect it at the end of that function’s process. Each function requires server, database arguments and often a schema argument too.

# Set database connection details for use in functions:
server <- "server\\instance"
database <- "my_database_name"
schema <- "my_schema_name"

Write a data frame to database

We can load the test_df data frame into the MS SQL Server database using the write_dataframe_to_db() function. Here we have set a 10K batch size so the source R data frame is loaded into the staging table in batches of this size. In this example, we do not include arguments append_to_existing or versioned_table as we want to use the default of FALSE for each.

See Table loading method for a detailed description of how a data frame is loaded into a table using RtoSQLServer.

# load data frame into MS SQL Server database
write_dataframe_to_db(
  server = server,
  database = database,
  schema = schema,
  table_name = "test_r_tbl",
  dataframe = test_df,
  batch_size = 1e4,
)

Get table metadata

Once the table is loaded we can check its columns, its column data types and get some summary info using db_table_metadata().

# get table metadata including summary stats (getting the extra stats is
# slower for large tables)
db_table_metadata(
  server = server,
  database = database,
  schema = schema,
  table_name = "test_r_tbl",
  summary_stats = TRUE
)

Read database table into R

We can read a database table back into an R data frame using read_table_from_db().

It is possible to read a subset of table columns by using a character vector of column names in the columns argument. A filter can also be specified for filter_stmt to read a subset of rows. The filter should be written in R, not SQL syntax. See the column and row filtering article for more information.

# read all rows and columns from test_r_tbl into data frame
db_df <- read_table_from_db(
  server = server,
  database = database,
  schema = schema,
  table_name = "test_r_tbl",
)

Append rows to existing table

Eventually we might have more rows to load into the database table. We can do this using write_dataframe_to_db() specifying append_to_existing = TRUE.

# Make another test dataframe to load with append option
# An extra column is added to this data frame to demo functionality
test_n_rows <- 1234
test_df2 <- data.frame(
  colour_cat = sample(c("blue", "purple", "pink"), test_n_rows, replace = TRUE),
  val = runif(test_n_rows),
  extra_col = "a"
)

In this case the test_df2 data frame has an extra column. When this is loaded with the append_to_existing = TRUE option the extra column will be added to the existing table using add_column() and populated with the source data frame values. Of course, the existing records in the database table will be NULL for this new column.

write_dataframe_to_db(
  server = server,
  database = database,
  schema = schema,
  table_name = "test_r_tbl",
  dataframe = test_df2,
  batch_size = 1e3,
  append_to_existing = TRUE
)

Drop table column

If we decide we do not want the newly added column we can use drop_column() to delete it from the database table.

drop_column(
  server = server,
  database = database,
  schema = schema,
  table_name = "test_r_tbl",
  column_name = "extra_col"
)

Delete table rows

We can also delete a subset of rows from our database table using delete_table_rows() and specifying a filter for the filter_stmt argument. The syntax for the filter is the same R format as for read_table_from_db().

Be careful when deleting rows that your filter is correct and not going to delete more rows than you wish. Testing it as a filter_stmt argument to read_table_from_db() first can be a good idea.

delete_table_rows(
  server = server,
  database = database,
  schema = schema,
  table_name = "test_r_tbl",
  filter_stmt = "colour_cat == 'blue' & val > 0.5"
)

Drop a table

To delete the table completely from the database schema, use drop_table_from_db().

drop_table_from_db(
  server = server,
  database = database,
  schema = schema,
  table_name = "test_r_tbl"
)

Show tables (and views) in a schema

If want to see what tables (and views) remain in the schema use show_schema_tables().

show_schema_tables(
  server = server,
  database = database,
  schema = schema,
  include_views = TRUE
)

Views

Views are stored queries in the database that appear as tables and can be read with read_table_from_db(). They can be useful if you regularly need to join the same tables together or apply the same column selections or row filters on the data you wish to read into R. You can write a CREATE VIEW SQL statement and use execute_sql() in R to run the SQL to create a view. A create view function might be added to RtoSQLServer in future.