this session

vibes

  • share the joy of using HPC for statistical computing
  • share my workflows and practices for R/bash/slurm
  • present two strategies for parallel computations

materials

about me

Tomasz Woźniak

  • senior lecturer at the department of economics
  • develops new statistical methods for empirical macroeconomic research
  • Bayesian structural non-linear dynamic system modelling
  • HPC user since 2008

about me

Tomasz Woźniak

  • R enthusiast and specialised user for 17 years
  • C++ coder since 2021
  • associate editor of the R Journal
  • author of R packages bsvars and bsvarSIGNs

the first steps

the first steps

Connect to spartan and set up the folder

ssh twozniak@spartan.hpc.unimelb.edu.au             # connect and type password
cd /data/projects/punim0093/                        # choose your dir
mkdir bsvars                                        # create a new dir

Install R packages

source ~/.bash_profile                              # the password is exported here (see note below)
sshpass -e ssh twozniak@spartan.hpc.unimelb.edu.au  # connect without typing password
sinteractive                                        # open interactive session
module load R/4.5.0                                 # load R
R                                                   # open R
install.packages("bsvars")                          # install the package
q("no")                                             # quit R
exit                                                # exit interactive session
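
Note: sshpass -e reads the password from the SSHPASS environment variable, so the profile sourced above must export it. A minimal sketch of the relevant line, with a placeholder password:

export SSHPASS='your-password-here'                 # placeholder: replace with your password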

basic workflow

basic workflow

The slurm file: bsvar1.slurm

#!/bin/bash

#SBATCH -p cascade                                  # partition to run on
#SBATCH --time=10:00:00                             # wall-time limit: 10 hours
#SBATCH --nodes=1                                   # one node
#SBATCH --ntasks=1                                  # one task: a single-core job
#SBATCH --mem=8192                                  # memory in MB (8 GB)
#SBATCH --mail-user=twozniak@unimelb.edu.au         # email for notifications
#SBATCH --mail-type=ALL                             # notify on begin, end, and fail
#SBATCH --job-name='bsvar1'                         # name shown in the queue

module load R/4.5.0                                 # load R
Rscript bsvar1.R                                    # run the R script

The R file: bsvar1.R

set.seed(123)                                       # set seed for reproducibility
library(bsvars)                                     # load the package      

us_fiscal_lsuw |>                                   # use the data
  specify_bsvar$new() |>                            # specify the model
  estimate(S = 100) |>                              # initial estimation
  estimate(S = 1000) ->                             # final estimation
  post                                              # store final estimation output

save(                                               # save the output
  post,                                             # choose the post object
  file = "bsvar1.rda"                               # file name
)

basic workflow

The workflow: bsvars.sh

# upload files
sshpass -e scp bsvars/bsvar1.* twozniak@spartan.hpc.unimelb.edu.au:/data/projects/punim0093/bsvars/ 

sshpass -e ssh twozniak@spartan.hpc.unimelb.edu.au  # connect 
cd /data/projects/punim0093/bsvars/                 # go to the folder

sbatch bsvar1.slurm                                 # submit the job     
squeue -u twozniak                                  # check the queue

# download files (run on your local machine)
sshpass -e scp twozniak@spartan.hpc.unimelb.edu.au:/data/projects/punim0093/bsvars/*.rda bsvars/
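
By default slurm writes the job's console output to slurm-<jobid>.out in the submission directory, so a quick check on spartan before downloading might look like this:

cat slurm-*.out                                     # inspect R's console output
ls -lh *.rda                                        # confirm the output file exists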

summary

  • great for individual tasks
  • it’s a single-core job
  • you can send many jobs at once (see the sketch below)
  • better double-check your code first
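
A hypothetical sketch of sending many such jobs at once, assuming scripts bsvar1.slurm, bsvar2.slurm, and bsvar3.slurm exist:

for i in 1 2 3; do                                  # loop over the hypothetical scripts
  sbatch "bsvar${i}.slurm"                          # submit each single-core job
done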

parallel computations using array jobs

parallel computations using array jobs

The slurm file: bsvars.slurm

#!/bin/bash

#SBATCH -p cascade
#SBATCH --time=10:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=8192
#SBATCH --mail-user=twozniak@unimelb.edu.au
#SBATCH --mail-type=ALL
#SBATCH --job-name='bsvars'
#SBATCH --array=1-10                                # spawn 10 tasks, SLURM_ARRAY_TASK_ID = 1,...,10

module load R/4.5.0
Rscript bsvars.R ${SLURM_ARRAY_TASK_ID}             # pass the task id to R
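
Each task runs the same script with its own SLURM_ARRAY_TASK_ID between 1 and 10. If needed, slurm also accepts a cap on how many array tasks run at once, e.g.

#SBATCH --array=1-10%4                              # at most 4 tasks run simultaneously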

parallel computations using array jobs

The R file: bsvars.R

args      = commandArgs(trailingOnly = TRUE)  # get arguments
iteration = as.integer(args[1])               # first argument is iteration
rm(args)                                      # remove what's redundant

set.seed(123 + iteration)                     # distinct, reproducible seed for each task
rw = apply(                                   # generate random walk data
  matrix(rnorm(500), ncol = 2),
  2,
  cumsum
)

library(bsvars)                               # load bsvars package

rw |>                                         # use rw data
  specify_bsvar$new(                          # specify the model
    stationary = rep(FALSE, 2)                # customise specification
  ) |>
  estimate(S = 100) |>                        # initial estimation
  estimate(S = 1000) ->                       # final estimation
  post                                        # store final estimation output

save(                                         # save the output
  post,                                       # choose the post object
  file = paste0("bsvars_", iteration, ".rda") # file name
)

parallel computations using array jobs

The workflow: bsvars.sh

sshpass -e scp bsvars/bsvars.* twozniak@spartan.hpc.unimelb.edu.au:/data/projects/punim0093/bsvars/

# working with bsvars on spartan
#################################################
sshpass -e ssh twozniak@spartan.hpc.unimelb.edu.au
cd /data/projects/punim0093/bsvars/

sbatch bsvars.slurm
squeue -u twozniak

# download files (run on your local machine)
sshpass -e scp twozniak@spartan.hpc.unimelb.edu.au:/data/projects/punim0093/bsvars/*.rda bsvars/
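
With ten array tasks it is worth confirming that every output file arrived; a minimal sketch, run on the local machine:

for i in $(seq 1 10); do                            # one .rda file per array task
  [ -f "bsvars/bsvars_${i}.rda" ] || echo "missing: bsvars_${i}.rda"
done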

what’s next

  • submit your first single-core job
  • design and run your first array job
  • attend spartan training sessions
  • reach out for help
  • praise spartan!