source: https://docs.nersc.gov/
System Specification
System Partition | Processor | Clock Rate | Physical Cores Per Node | Threads/Core | Sockets Per Node | Memory Per Node |
---|---|---|---|---|---|---|
Login | Intel Xeon Processor E5-2698 v3 | 2.3 GHz | 32 | 2 | 2 | 515 GB |
Haswell | Intel Xeon Processor E5-2698 v3 | 2.3 GHz | 32 | 2 | 2 | 128 GB |
KNL | Intel Xeon Phi Processor 7250 | 1.4 GHz | 68 | 4 | 1 | 96 GB (DDR4), 16 GB (MCDRAM) |
Large Memory | AMD EPYC 7302 | 3.0 GHz | 32 | 2 | 2 | 2 TB |
Node Specifications
Login Nodes
- Cori has 12 Login nodes (cori[01-12]) open to the public.
- 2 Large Memory Login nodes (cori[22,23]) for submitting to the bigmem QOS. These nodes have 750 GB of memory.
- 4 Jupyter nodes (cori[13,14,16,19]), accessed via Jupyter.
- 2 Workflow nodes (cori[20,21]); access requires approval.
- 1 Compile node (cori17); access requires approval.
- Each node has two sockets; each socket is populated with a 2.3 GHz 16-core Haswell processor.
Accessing Cori
ssh -X <user>@cori.nersc.gov
Password prompt: your NERSC password followed by your OTP.
To set up OTP, follow these instructions:
- You want to follow the steps in Creating and Installing a Token.
- I use the Google Authenticator app to see my TOTP.
Data Transfer Nodes
- The data transfer nodes are NERSC servers dedicated to performing transfers between NERSC data storage resources such as HPSS and the NERSC Global File System (NGF), and storage resources at other sites.
- Access: dtn0[1-4].nersc.gov, or via Globus.
- For smaller files you can use Secure Copy (scp), Secure FTP (sftp), or rsync to transfer files between two hosts.
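For example, to push data through a DTN from a remote machine (the local file names and target directory below are placeholders, not from the source):

scp ./results.nc <user>@dtn01.nersc.gov:/global/cscratch1/sd/<user>/
rsync -avP ./output/ <user>@dtn01.nersc.gov:/global/cscratch1/sd/<user>/output/

rsync -avP shows progress and can resume interrupted transfers, which is convenient for larger directories.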
Cori SCRATCH
Cori scratch is a Lustre file system designed for high performance temporary storage of large files. It is intended to support large I/O for jobs that are being actively computed on the Cori system.
Usage
The /global/cscratch1 file system should always be referenced using the environment variable $SCRATCH (which expands to /global/cscratch1/sd/YourUserName). The scratch file system is available from all nodes and is tuned for high performance.
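For example, a typical pattern is to stage job inputs onto scratch before running (the directory and file names are illustrative):

mkdir -p $SCRATCH/my_run
cp $HOME/inputs/config.yaml $SCRATCH/my_run/
cd $SCRATCH/my_run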
Quotas
If your $SCRATCH usage exceeds your quota, you will not be able to submit batch jobs until you reduce your usage. The batch job submit filter checks the usage of /global/cscratch1.
Note that the quota on the Community File System and on Global Common is shared among all members of the project, so showquota / cfsquota will report the aggregate project usage and quota.
File system | Space | Inodes | Purge time | Consequence for Exceeding Quota |
---|---|---|---|---|
Community | 20 TB | 20 M | - | No new data can be written |
Global HOME | 40 GB | 1 M | - | No new data can be written |
Global common | 10 GB | 1 M | - | No new data can be written |
Cori SCRATCH | 20 TB | 10 M | 12 weeks | Can’t submit batch jobs |
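To check current usage against these quotas, run the commands mentioned above (passing a project name to cfsquota is an assumption; the project shown is the one used elsewhere in these notes):

showquota
cfsquota m2467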
Slurm (Running Jobs)
NERSC uses Slurm for cluster/resource management and job scheduling. Slurm is responsible for allocating resources to users, providing a framework for starting, executing and monitoring work on allocated resources and scheduling work for future execution.
Additional Resources
- Documentation: https://slurm.schedmd.com/documentation.html
- Tutorial: https://slurm.schedmd.com/tutorials.html
- Manual: https://slurm.schedmd.com/man_index.html
- FAQ: https://slurm.schedmd.com/faq.html
Submitting jobs
sbatch
sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
When you submit the job, Slurm responds with the job’s ID, which will be used to identify this job in reports from Slurm.
$ sbatch first-job.sh
Submitted batch job 864933
salloc
salloc is used to allocate resources for a job in real time as an interactive batch job. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
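For example (the QOS and constraint mirror the interactive example later in these notes; the time limit is arbitrary):

salloc -N 1 -C haswell -q interactive -t 00:30:00

Once the allocation is granted, you get a shell on the compute node and can run srun commands from there.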
srun
srun is used to submit a job for execution or initiate job steps in real time. A job can contain multiple job steps executing sequentially or in parallel on independent or shared resources within the job's node allocation. This command is typically executed within a script which is submitted with sbatch, or from an interactive prompt on a compute node obtained via salloc.
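A minimal sketch of multiple job steps inside one allocation (the executables are placeholders):

srun -n 32 ./preprocess
srun -n 64 ./simulate

Each srun launches one job step; here the two steps run sequentially within the same node allocation.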
How to write an sbatch script? Link
How to request an interactive node? Link
Sample scripts (Bharat)
Public Webpage
path : cd /project/projectdirs/m2467/www/bharat/
webpage : https://portal.nersc.gov/project/m2467/
If you are not able to see your files, run the following command in that directory:
chmod 755 *
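chmod 755 * only changes the top-level entries; if files inside subdirectories are still not visible, a recursive variant (an assumption, not from the source) is:

chmod -R a+rX /project/projectdirs/m2467/www/bharat/

a+rX makes everything world-readable and adds execute (search) permission only to directories and files that are already executable.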
Submitting a parallel job
#!/bin/bash -l
#SBATCH --qos=premium
#SBATCH --nodes=6
#SBATCH --ntasks-per-node=32
#SBATCH --time=02:00:00
#SBATCH --licenses=SCRATCH #note: specify the licenses for the file systems your job needs, such as SCRATCH,project
#SBATCH --job-name='calculation of anomalies of GPP for CESM2 First run'
#SBATCH --account=m2467
#SBATCH --output=o.log
#SBATCH --error=e.log
#SBATCH --mail-user=dk28nov@gmail.com
#SBATCH --mail-type=END,FAIL #note: without a mail-type, Slurm sends no mail
#SBATCH -C haswell
echo 'Hello world!'
srun -n 192 --mpi=pmi2 python calc_anomalies_ssa_mpi.py -src 0 -var pr
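To submit and monitor this script (the script file name is a placeholder):

sbatch calc_anomalies.sh
squeue -u $USER        # check the state of your queued and running jobs
sacct -j <jobid>       # accounting summary after the job finishes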
Requesting an interactive node
salloc -N 1 -C haswell -q interactive -t 04:00:00
To avoid HDF5 errors, type the following in the terminal of the interactive node:
export HDF5_USE_FILE_LOCKING=FALSE
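The same variable can be exported inside a batch script before the srun line so that batch jobs avoid the locking issue too (a sketch based on the script above):

export HDF5_USE_FILE_LOCKING=FALSE
srun -n 192 --mpi=pmi2 python calc_anomalies_ssa_mpi.py -src 0 -var pr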
SCRATCH
cd $SCRATCH (or cd /global/cscratch1/sd/bharat)
CMIP6 Data
/global/cfs/cdirs/m3522/cmip6/CMIP6
Project Community File System (CFS); use this to share data with other members of the project:
/global/cfs/cdirs/m2467
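For example, to browse the CMIP6 archive and share a result with the project (the output file name is a placeholder):

ls /global/cfs/cdirs/m3522/cmip6/CMIP6
cp $SCRATCH/my_run/anomalies.nc /global/cfs/cdirs/m2467/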
Backups
Snapshots
Global homes and Community use a snapshot capability to provide users a seven-day history of their directories. Every directory and sub-directory in global homes contains a “.snapshots” entry.
- .snapshots is invisible to ls, ls -a, find, and similar commands
- Contents are visible through ls -F .snapshots
- Can be browsed normally after cd .snapshots
- Files cannot be created, deleted, or edited in snapshots
- Files can only be copied out of a snapshot
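For example, to restore a file from a snapshot of your home directory (the snapshot name and file are placeholders; ls -F .snapshots shows the names that actually exist):

cd $HOME
ls -F .snapshots
cp .snapshots/<snapshot-date>/myscript.py ./myscript.py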
JupyterHub
JupyterHub provides a multi-user hub for spawning, managing, and proxying multiple instances of single-user Jupyter notebook servers. At NERSC, you authenticate to the JupyterHub instance we manage using your NERSC credentials and one-time password. Here is a link to NERSC’s JupyterHub service: https://jupyter.nersc.gov/