Parallel Computing in MCNP

Running MCNP faster with multiple processors

Why Use Parallel MCNP?

Monte Carlo transport is naturally parallel: each particle history is independent of every other, so histories can be distributed across processors with minimal communication. This makes MCNP well suited to parallel computing.
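The independence of histories is what makes the method "embarrassingly parallel". A toy sketch of the pattern in plain Python (not MCNP): each history gets its own random-number stream, workers never communicate mid-run, and tallies are combined only at the end.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_history(seed):
    # Toy stand-in for one particle history: an independent RNG stream
    # per history means workers never need to talk to each other.
    rng = random.Random(seed)
    return sum(1 for _ in range(100) if rng.random() < 0.3)

# Split 1000 independent histories across 4 workers;
# tallies are combined only after all histories finish.
with ThreadPoolExecutor(max_workers=4) as pool:
    tallies = list(pool.map(run_history, range(1000)))

mean_tally = sum(tallies) / len(tallies)
```

MCNP distributes histories across processes or threads in the same spirit, which is why adding cores mostly just divides the history workload.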

Benefits

  • Reduce runtime dramatically
  • Run more particles for better statistics
  • Handle larger, more complex models
  • Utilize modern multi-core hardware

Key Concepts

  • Speedup: runtime on one core divided by runtime on N cores
  • Efficiency: speedup divided by N; how well the cores are utilized
  • Scaling: how performance changes as processor count grows
  • Overhead: time spent coordinating rather than transporting particles
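These quantities reduce to simple formulas. A sketch, including Amdahl's law, the standard bound on speedup when some fraction of the work stays serial (the timing numbers here are illustrative, not measurements):

```python
def speedup(t1, tn):
    # Speedup: single-core runtime divided by N-core runtime
    return t1 / tn

def efficiency(t1, tn, n):
    # Efficiency: achieved fraction of ideal linear speedup
    return speedup(t1, tn) / n

def amdahl_limit(serial_fraction, n):
    # Amdahl's law: speedup bound if serial_fraction of the work
    # cannot be parallelized, no matter how many cores are added
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# Example: 1000 s on one core, 140 s on eight cores
print(speedup(1000, 140))        # ~7.1x
print(efficiency(1000, 140, 8))  # ~89%
print(amdahl_limit(0.01, 8))     # 1% serial work caps 8 cores at ~7.5x
```

Amdahl's law is also why the "diminishing returns" noted later are unavoidable: even a small serial fraction (problem setup, tally collection, I/O) caps the achievable speedup.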

Parallel Methods

MCNP offers three main approaches to parallel computing, each suited for different hardware.

MPI (Message Passing Interface)

Best for clusters and distributed systems. Each process has its own memory.

```bash
# Run with 16 MPI processes (requires the MPI build of the executable)
mpirun -np 16 mcnp6.mpi i=input n=output
```

Note that the `tasks` keyword on the MCNP command line controls the number of OpenMP threads, not MPI processes; MPI parallelism always goes through the MPI launcher.

OpenMP (Shared Memory)

Best for multi-core workstations. Multiple threads share memory.

```bash
# Run with 8 OpenMP threads using MCNP's tasks keyword
mcnp6 i=input n=output tasks 8

# Equivalent, via the environment
export OMP_NUM_THREADS=8
mcnp6 i=input n=output

# Set thread affinity for better performance
export OMP_PROC_BIND=true
export OMP_NUM_THREADS=8
mcnp6 i=input n=output
```

Hybrid (MPI + OpenMP)

Combines both methods. Best for clusters with multi-core nodes.

```bash
# 4 MPI processes with 6 threads each (24 cores total)
export OMP_NUM_THREADS=6
mpirun -np 4 mcnp6.mpi i=input n=output
```

Choosing the Right Method

Single Workstation

  • Use OpenMP (threads) for simplicity
  • Set threads = number of physical cores
  • Avoid hyperthreading for MCNP
  • Ensure adequate memory per thread

HPC Cluster

  • Use hybrid MPI+OpenMP
  • MPI tasks = number of nodes
  • Threads = cores per node
  • Consider network performance
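If the cluster runs Slurm, the rules above translate directly into a batch script. A sketch that generates one from the node layout; the partition, account, and module lines a real site would need are omitted, and the input/output names are placeholders:

```python
def slurm_hybrid(nodes, cores_per_node, inp="input", out="output"):
    # Hybrid rule of thumb from above: one MPI task per node,
    # threads = cores per node.  Adapt headers to your cluster.
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --ntasks-per-node=1",
        f"#SBATCH --cpus-per-task={cores_per_node}",
        f"export OMP_NUM_THREADS={cores_per_node}",
        f"mpirun -np {nodes} mcnp6.mpi i={inp} n={out}",
    ])

print(slurm_hybrid(4, 16))   # 4 nodes x 16 cores = 64 cores total
```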

Performance Tips

Optimal Configuration

  • Start with threads = physical cores for single machines
  • For clusters, use 1 MPI task per node with threads = cores per node
  • Always test different configurations for your specific problem
  • Monitor memory usage: each MPI process holds its own copy of the problem data, including cross-section libraries, so RAM requirements multiply with process count

Common Mistakes to Avoid

  • Over-subscription: More tasks than cores
  • Memory issues: Insufficient RAM per process
  • I/O bottlenecks: Too many output files
  • Poor load balancing: Uneven work distribution
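The first two mistakes can be caught before launching. A minimal pre-flight check; note that `os.cpu_count()` reports logical cores, so on a hyperthreaded machine the physical count is typically half:

```python
import os

def preflight(mpi_tasks, threads_per_task, mem_per_task_gb, total_mem_gb):
    # Catch over-subscription and obvious memory shortfalls before a run.
    problems = []
    cores = os.cpu_count() or 1           # logical cores on this machine
    workers = mpi_tasks * threads_per_task
    if workers > cores:
        problems.append(f"over-subscribed: {workers} workers > {cores} cores")
    if mpi_tasks * mem_per_task_gb > total_mem_gb:
        problems.append("insufficient RAM for all MPI processes")
    return problems

# Example: 4 processes x 6 threads, 8 GB each, on a 64 GB machine
issues = preflight(4, 6, 8, 64)
```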

Testing Performance

Always test your parallel setup with a representative problem before running production calculations.

Simple Scaling Test

```bash
# Test with different thread counts (set before each run)
OMP_NUM_THREADS=1 mcnp6 i=test n=out1    # Baseline
OMP_NUM_THREADS=2 mcnp6 i=test n=out2    # 2 cores
OMP_NUM_THREADS=4 mcnp6 i=test n=out4    # 4 cores
OMP_NUM_THREADS=8 mcnp6 i=test n=out8    # 8 cores

# Compare runtimes and calculate speedup:
#   Speedup    = Time(1 core) / Time(N cores)
#   Efficiency = Speedup / N
```
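The comparison step can be scripted. A sketch with illustrative timings; substitute the wall-clock times reported for your own runs:

```python
# Illustrative wall-clock times in seconds, keyed by thread count
times = {1: 1200.0, 2: 630.0, 4: 340.0, 8: 195.0}

baseline = times[1]
for n in sorted(times):
    s = baseline / times[n]   # Speedup    = Time(1 core) / Time(N cores)
    e = s / n                 # Efficiency = Speedup / N
    print(f"{n:2d} threads: speedup {s:5.2f}x, efficiency {e:6.1%}")
```

A table like this makes the diminishing-returns point obvious at a glance: watch for the thread count where efficiency drops below a threshold you care about (say 70%), and run production jobs at or below it.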

Typical Performance Expectations

  • Near-linear speedup up to 8-16 cores on most problems
  • Diminishing returns beyond 32-64 cores as coordination overhead grows
  • Complex geometries often scale better than simple ones: each history carries more compute relative to the fixed parallel overhead
  • Shielding problems with variance reduction often scale well, though widely varying history lengths can hurt load balancing

Getting Started

  1. Start with OpenMP on your workstation using threads = cores
  2. Test with a small problem to verify setup works
  3. Monitor performance and memory usage
  4. Scale up gradually, testing at each step
  5. Document what works best for your problem types