Last time I wrote on using Python/Pandas as an adjunct to loading PostgreSQL tables. In this sequel, I demo how R can be used to collaborate with the database in a similar way.
The strategy adopted remains the same as in Part I: 1) use R-data.table to load a small subset of data into dataframes to determine datatypes; 2) leverage that data.table to create a relational database table in PostgreSQL; 3) generate bulk load sql copy commands along with shell scripts based on meta-data and csv files; 4) execute the shell scripts using a system command to load data with the efficient copy statements.
The R version is presented below. A proof of concept only, there’s no exception/error handling in the code. Hopefully, the ideas presented resonate.
The technology used is Windows 10 with PostgreSQL 10.9, JupyterLab 0.35.4, and R 3.6.0, along with R packages data.table 1.12.2, tidyverse 1.2.1, and RPostgreSQL 0.6-2.
Read the entire blog here