
Running Your First Job

After gaining access to the HPC system at Tribhuvan University, users can submit jobs using the SLURM job scheduler. This guide explains how to run your first job, monitor its status, and optimize job submissions.

1. Logging into the HPC System

Before running jobs, connect to the TU HPC system via SSH:

ssh username@tu-hpc-ip

Ensure you are in your home directory or an appropriate project folder before submitting jobs.
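
Once logged in, it helps to confirm where you are and which partitions exist before submitting anything. A minimal sketch (sinfo is a standard SLURM command; the project folder name here is just an example):

pwd                                        # Confirm current directory
sinfo                                      # List available partitions and node states
mkdir -p ~/my_project && cd ~/my_project   # Optional: work from a project folder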

2. Creating a SLURM Job Script

A job script tells the SLURM scheduler how to allocate resources and execute your program. Create a job script (e.g., job.slurm) using a text editor:

nano job.slurm

Add the following SLURM directives to specify job parameters:

#!/bin/bash
#SBATCH --job-name=test_job  # Job name
#SBATCH --output=output.txt  # Output file
#SBATCH --error=error.log    # Error file
#SBATCH --ntasks=1           # Number of tasks
#SBATCH --cpus-per-task=4    # Number of CPU cores per task
#SBATCH --time=01:00:00      # Time limit (hh:mm:ss)
#SBATCH --partition=normal   # Specify partition (e.g., normal, protein, fplo)

module load python/3.10.12   # Load required module
python my_script.py          # Run Python script

Save and exit (CTRL + X, then Y and ENTER).
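
If your program is multithreaded, you can add a line like the following to the script (before the python command) so the thread count follows --cpus-per-task instead of being hard-coded. This is a sketch assuming your program honors OMP_NUM_THREADS, which OpenMP-based libraries (including NumPy's common BLAS backends) typically do:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # SLURM sets this variable inside the job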

3. Submitting the Job

Submit the job to the SLURM scheduler using:

sbatch job.slurm

This will add your job to the queue, and it will execute once resources are available.
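
If you submit from a script and need the job ID (for example, to set up dependencies later), sbatch's --parsable option prints only the ID:

jobid=$(sbatch --parsable job.slurm)   # Capture the numeric job ID
echo "Submitted job $jobid"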

4. Monitoring Job Status

To check the status of your submitted jobs, use:

squeue -u $USER

To view detailed job information:

scontrol show job <job_id>

To cancel a job:

scancel <job_id>
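
Note that squeue only shows pending and running jobs. For jobs that have already finished, the SLURM accounting command sacct reports their final state and resource usage (assuming accounting is enabled on the cluster):

sacct -j <job_id> --format=JobID,JobName,State,Elapsed,MaxRSS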

5. Checking Job Output

Once the job completes, check the output and error files:

cat output.txt  # View job output
cat error.log   # View error messages (if any)
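
While the job is still running, you can follow its output as it is written:

tail -f output.txt   # Press CTRL + C to stop following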

6. Optimizing Job Submissions

  • Use appropriate resource allocation (CPU, memory, and time) to avoid long queue times.
  • Run test jobs with small datasets before scaling up computations.
  • Use job dependencies if running multiple interdependent tasks:
    sbatch --dependency=afterok:<job_id> job2.slurm
    
  • Consider using parallel computing (MPI, OpenMP) for efficiency; a minimal MPI job script follows this list.
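
As a starting point for MPI work, here is a sketch of an MPI job script. The module name and program name are placeholders; check module avail for the MPI implementation installed on TU HPC:

#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --output=mpi_output.txt
#SBATCH --ntasks=4           # Number of MPI ranks
#SBATCH --time=01:00:00
#SBATCH --partition=normal

module load openmpi          # Placeholder: use the MPI module available on the cluster
srun ./mpi_program           # srun launches one process per task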

7. Example: Running a GPU Job

For GPU-based workloads, create a separate job script (e.g., gpu_job.slurm) with GPU-specific directives:

#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --output=gpu_output.txt
#SBATCH --gres=gpu:1        # Request 1 GPU
#SBATCH --partition=protein # Use GPU-enabled partition

module load cuda/11.2
./gpu_program

Submit it using:

sbatch gpu_job.slurm
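
As a quick sanity check that the job actually received a GPU, you can print the device list before starting the main program (nvidia-smi ships with the NVIDIA driver, so it should be available on GPU nodes):

nvidia-smi   # Add this line to the script before ./gpu_program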

Start computing efficiently on TU HPC! 🚀