this session

vibes

  • share the joy of using HPC for statistical computing
  • share my workflows and practices for R/bash/slurm
  • present two strategies for parallel computations

materials

about me

Tomasz Woźniak

  • senior lecturer at the department of economics
  • develops new statistical methods for empirical macroeconomic research
  • Bayesian structural non-linear dynamic system modelling
  • HPC user since 2008

about me

Tomasz Woźniak

  • R enthusiast and specialised user for 17 years
  • C++ coder since 2021
  • associate editor of the R Journal
  • author of R packages bsvars and bsvarSIGNs

the first steps

the first steps

Connect to spartan and set up the folder

ssh twozniak@spartan.hpc.unimelb.edu.au             # connect and type password
cd /data/projects/punim0093/                        # choose your dir
mkdir bsvars                                        # create a new dir

Install R packages

source ~/.bash_profile                              # the password is exported here (see note below)
sshpass -e ssh twozniak@spartan.hpc.unimelb.edu.au  # connect without typing password
sinteractive                                        # open interactive session
module load R/4.5.0                                 # load R
R                                                   # open R
install.packages("bsvars")                          # install the package
q("no")                                             # quit R
exit                                                # exit interactive session
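
Note: sshpass -e reads the password from the SSHPASS environment variable, so the profile sourced above must export it. A minimal sketch of the relevant line, with a placeholder password:

export SSHPASS='your-password-here'                 # placeholder: replace with your password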

basic workflow

basic workflow

The slurm file: bsvar1.slurm

#!/bin/bash

#SBATCH -p cascade                                  # partition to run on
#SBATCH --time=10:00:00                             # wall-time limit: 10 hours
#SBATCH --nodes=1                                   # one node
#SBATCH --ntasks=1                                  # one task: a single-core job
#SBATCH --mem=8192                                  # memory in MB (8 GB)
#SBATCH --mail-user=twozniak@unimelb.edu.au         # email for notifications
#SBATCH --mail-type=ALL                             # notify on begin, end, and fail
#SBATCH --job-name='bsvar1'                         # name shown in the queue

module load R/4.5.0                                 # load R
Rscript bsvar1.R                                    # run the R script

The R file: bsvar1.R

set.seed(123)                                       # set seed for reproducibility
library(bsvars)                                     # load the package      

us_fiscal_lsuw |>                                   # use the data
  specify_bsvar$new() |>                            # specify the model
  estimate(S = 100) |>                              # initial estimation
  estimate(S = 1000) ->                             # final estimation
  post                                              # store final estimation output

save(                                               # save the output
  post,                                             # choose the post object
  file = "bsvar1.rda"                               # file name
)

basic workflow

The workflow: bsvars.sh

# upload files
sshpass -e scp bsvars/bsvar1.* twozniak@spartan.hpc.unimelb.edu.au:/data/projects/punim0093/bsvars/ 

sshpass -e ssh twozniak@spartan.hpc.unimelb.edu.au  # connect 
cd /data/projects/punim0093/bsvars/                 # go to the folder

sbatch bsvar1.slurm                                 # submit the job     
squeue -u twozniak                                  # check the queue

# download files (run on your local machine)
sshpass -e scp twozniak@spartan.hpc.unimelb.edu.au:/data/projects/punim0093/bsvars/*.rda bsvars/
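
By default slurm writes the job's console output to slurm-<jobid>.out in the submission directory, so a quick check on spartan before downloading might look like this:

cat slurm-*.out                                     # inspect R's console output
ls -lh *.rda                                        # confirm the output file exists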

summary

  • great for individual tasks
  • it’s a single-core job
  • you can send many jobs at once (see the sketch below)
  • better double-check your code first
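
A hypothetical sketch of sending many such jobs at once, assuming scripts bsvar1.slurm, bsvar2.slurm, and bsvar3.slurm exist:

for i in 1 2 3; do                                  # loop over the hypothetical scripts
  sbatch "bsvar${i}.slurm"                          # submit each single-core job
done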

parallel computations using array jobs

parallel computations using array jobs

The slurm file: bsvars.slurm

#!/bin/bash

#SBATCH -p cascade
#SBATCH --time=10:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=8192
#SBATCH --mail-user=twozniak@unimelb.edu.au
#SBATCH --mail-type=ALL
#SBATCH --job-name='bsvars'
#SBATCH --array=1-10                                # spawn 10 tasks, SLURM_ARRAY_TASK_ID = 1,...,10

module load R/4.5.0
Rscript bsvars.R ${SLURM_ARRAY_TASK_ID}             # pass the task id to R
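
Each task runs the same script with its own SLURM_ARRAY_TASK_ID between 1 and 10. If needed, slurm also accepts a cap on how many array tasks run at once, e.g.

#SBATCH --array=1-10%4                              # at most 4 tasks run simultaneously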

parallel computations using array jobs

The R file: bsvars.R

args      = commandArgs(trailingOnly = TRUE)  # get arguments
iteration = as.integer(args[1])               # first argument is iteration
rm(args)                                      # remove what's redundant

set.seed(123 + iteration)                     # distinct, reproducible seed for each task
rw = apply(                                   # generate random walk data
  matrix(rnorm(500), ncol = 2),
  2,
  cumsum
)

library(bsvars)                               # load bsvars package

rw |>                                         # use rw data
  specify_bsvar$new(                          # specify the model
    stationary = rep(FALSE, 2)                # customise specification
  ) |>
  estimate(S = 100) |>                        # initial estimation
  estimate(S = 1000) ->                       # final estimation
  post                                        # store final estimation output

save(                                         # save the output
  post,                                       # choose the post object
  file = paste0("bsvars_", iteration, ".rda") # file name
)

parallel computations using array jobs

The workflow: bsvars.sh

sshpass -e scp bsvars/bsvars.* twozniak@spartan.hpc.unimelb.edu.au:/data/projects/punim0093/bsvars/

# working with bsvars on spartan
#################################################
sshpass -e ssh twozniak@spartan.hpc.unimelb.edu.au
cd /data/projects/punim0093/bsvars/

sbatch bsvars.slurm
squeue -u twozniak

# download files (run on your local machine)
sshpass -e scp twozniak@spartan.hpc.unimelb.edu.au:/data/projects/punim0093/bsvars/*.rda bsvars/
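
With ten array tasks it is worth confirming that every output file arrived; a minimal sketch, run on the local machine:

for i in $(seq 1 10); do                            # one .rda file per array task
  [ -f "bsvars/bsvars_${i}.rda" ] || echo "missing: bsvars_${i}.rda"
done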

what’s next

  • submit your first single-core job
  • design and run your first array job
  • attend spartan training sessions
  • reach out for help
  • praise spartan!