This is an quick script for cleaning up the database. For further documentation see the README.rmd file in the main qaqc folder.

Step 1: set up an outdir and a connection to bety The outdir is where temporary editing files will be written, and where a backup of bety will be stored.

con <- RPostgreSQL::dbConnect(RPostgreSQL::PostgreSQL(),
dbname = "bety",
password = 'bety',
host = 'psql-pecan.bu.edu',
user = 'bety')
outdir<-as.character(getwd()) #set as prefered out directory
bety_backup_directory<-as.character(getwd()) #set as prefered directory for bety backup.

options(scipen=999) #To view full id values

Step 2: Back up bety Before any major deletion processes, it makes sense to make a back-up version of the database. Don’t skip this step.

system('TODAY=$( date +"%d" )')
backup_dir<-paste('pg_dump -U bety -d bety | gzip -9 > ', bety_backup_directory,'/bety-pre-culling${TODAY}.sql.gz', sep="")

system(backup_dir)

Step 3: find all of the entries that should be deleted *

formats<-find_formats_without_inputs(con=con, created_before = "2016-01-01", user_id = NULL, updated_after = "2016-01-01") #Ideally, date should be set to the date of the last release 

inputs<-find_inputs_without_formats(con=con, created_before = "2014-01-01",updated_after = NULL, user_id = NULL)

Since just a dump of every column can be hard to read, just choose the columns that are important.

column_names<-get_table_column_names(table = formats, con = con)
column_names$formats

column_names<-column_names$formats[column_names$formats %in% c("id","created_at", "name", "notes", "table_name")]
column_names

** Option 1: Edit an R object ***

This is the most important step! Navigate to the written out table and delete entries that should remain in the database.

formats<-formats[colnames(formats) %in% column_names] #subset for easy viewing
View(formats)

formats<-formats[grep("test",formats$name),]
View(formats)

** Option 2: Edit a file **

This is also the most important step! Navigate to the written out table and delete entries that should remain in the database. If the tables are difficult to read, change what columns are retained by editing the “relevant_table_columns” parameter.

write_out_table(table = formats, outdir = outdir, relevant_table_columns = column_names, table_name = "formats")
write_out_table(table = inputs,outdir = outdir, relevant_table_columns =c("id", "created_at", "name"), table_name = "inputs")

Step 5: Cull all remaining entries in the files

This will delete all entries remaining in the file, and place a delete log in the outdir directory.

please do not run before editing either the table object or the “query_of” file

### Note: Do not run before the editing step ###

## From an R object ##
#cull_database_entries(table=formats, outdir = outdir, con=con, table_name = "formats")

## From a file ##
#cull_database_entries(outdir = outdir,file_name ='query_of_formats', con=con, table_name = "formats")
#cull_database_entries(outdir = outdir,file_name ='query_of_inputs', con=con, table_name = "inputs")