Parallel Computing in MCNP

Running MCNP faster with multiple processors

Why Use Parallel MCNP?

Monte Carlo transport is embarrassingly parallel: each particle history is independent of every other, which makes MCNP a natural fit for parallel computing.

Benefits

  • Reduce runtime dramatically
  • Run more particles for better statistics
  • Handle larger, more complex models
  • Utilize modern multi-core hardware

Key Concepts

  • Speedup: How much faster with more cores
  • Efficiency: How well cores are utilized
  • Scaling: Performance vs. processor count
  • Overhead: Cost of coordination
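Speedup and efficiency can be computed directly from measured wall-clock times. A minimal sketch, using hypothetical timings (1200 s on 1 core, 180 s on 8 cores) that you would replace with your own measurements:

```bash
# Hypothetical wall-clock times (seconds); substitute your own measurements
t1=1200   # 1-core baseline
tn=180    # 8-core run
n=8

# Speedup = Time(1 core) / Time(n cores); Efficiency = Speedup / n
speedup=$(awk -v a="$t1" -v b="$tn" 'BEGIN{printf "%.2f", a/b}')
efficiency=$(awk -v s="$speedup" -v n="$n" 'BEGIN{printf "%.2f", s/n}')
echo "Speedup: ${speedup}x  Efficiency: ${efficiency}"
```

With these example numbers the run shows a 6.67x speedup at 83% efficiency, i.e. roughly one sixth of the core-time is lost to overhead.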

Parallel Methods

MCNP offers three main approaches to parallel computing, each suited to different hardware.

MPI (Message Passing Interface)

Best for clusters and distributed systems. Each process has its own memory.

```bash
# Launch 16 MPI processes (requires the MPI-enabled build, mcnp6.mpi)
mpirun -np 16 mcnp6.mpi i=input n=output
```

OpenMP (Shared Memory)

Best for multi-core workstations. Multiple threads share memory.

```bash
# Run with 8 shared-memory threads (MCNP's 'tasks' keyword)
mcnp6 i=input n=output tasks 8

# Pin threads to cores for better performance
export OMP_PROC_BIND=true
mcnp6 i=input n=output tasks 8
```

Hybrid (MPI + OpenMP)

Combines both methods. Best for clusters with multi-core nodes.

```bash
# 4 MPI processes with 6 threads each (24 cores total)
mpirun -np 4 mcnp6.mpi i=input n=output tasks 6

# Alternative: set the thread count via the OpenMP environment variable
export OMP_NUM_THREADS=6
mpirun -np 4 mcnp6.mpi i=input n=output
```

Choosing the Right Method

Single Workstation

  • Use OpenMP (threads) for simplicity
  • Set threads = number of physical cores
  • Avoid hyperthreading for MCNP
  • Ensure adequate memory per thread
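Since hyperthreads are best avoided, the relevant number is physical cores, not logical CPUs. One way to count them on Linux is to de-duplicate the (core, socket) pairs reported by lscpu; this is a sketch and assumes a Linux system with lscpu available:

```bash
# Count physical cores by de-duplicating (core, socket) pairs across logical CPUs
physical=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)
echo "Physical cores: $physical"
```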

HPC Cluster

  • Use hybrid MPI+OpenMP
  • MPI tasks = number of nodes
  • Threads = cores per node
  • Consider network performance
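On a cluster with a batch scheduler, the one-task-per-node layout above is usually expressed in the job script. A sketch assuming Slurm and hypothetical 36-core nodes (the node count, core count, and file names are placeholders to adapt):

```bash
#!/bin/bash
#SBATCH --nodes=4              # 4 MPI tasks, one per node
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=36     # threads = cores per node (hypothetical)

# Let MCNP's thread count follow the allocation
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun mcnp6.mpi i=input n=output tasks $SLURM_CPUS_PER_TASK
```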

Performance Tips

Optimal Configuration

  • Start with threads = physical cores for single machines
  • For clusters, use 1 MPI task per node with threads = cores per node
  • Always test different configurations for your specific problem
  • Monitor memory usage - each process needs sufficient RAM

Common Mistakes to Avoid

  • Over-subscription: More tasks than cores
  • Memory issues: Insufficient RAM per process
  • I/O bottlenecks: Too many output files
  • Poor load balancing: Uneven work distribution
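A simple guard against the first mistake is to cap the requested task count at what the machine actually provides. A sketch assuming Linux (note that nproc reports logical CPUs, so hyperthreads count toward the limit):

```bash
# Cap the requested task count at the available CPU count
requested=16
available=$(nproc)
tasks=$(( requested < available ? requested : available ))
echo "Launching with $tasks tasks ($available CPUs available)"
```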

Testing Performance

Always test your parallel setup with a representative problem before running production calculations.

Simple Scaling Test

```bash
# Test with different processor counts
mcnp6 i=test n=out1 tasks 1    # baseline
mcnp6 i=test n=out2 tasks 2    # 2 cores
mcnp6 i=test n=out4 tasks 4    # 4 cores
mcnp6 i=test n=out8 tasks 8    # 8 cores

# Compare runtimes and calculate speedup:
#   Speedup    = Time(1 core) / Time(N cores)
#   Efficiency = Speedup / N cores
```

Typical Performance Expectations

  • Near-linear speedup up to 8-16 cores on most problems
  • Diminishing returns beyond 32-64 cores as communication and tally-merging overhead grows
  • Complex geometries often scale better than simple ones: more computation per history amortizes the coordination cost
  • Heavily variance-reduced problems (e.g. deep shielding) can scale less evenly, since history lengths vary widely and load-balance suffers
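The diminishing returns follow Amdahl's law: if a fraction p of the work parallelizes, the speedup on n cores is bounded by 1 / ((1 - p) + p/n). A quick sweep, assuming a hypothetical 95% parallel fraction:

```bash
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p/n)
p=0.95   # assumed parallel fraction (hypothetical)
for n in 1 2 4 8 16 32 64; do
  awk -v p="$p" -v n="$n" 'BEGIN{printf "%2d cores: %4.1fx\n", n, 1/((1-p)+p/n)}'
done
```

Even at 95% parallel, 64 cores deliver only about a 15x speedup, which matches the plateau described above.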

Getting Started

  1. Start with OpenMP on your workstation using threads = cores
  2. Test with a small problem to verify setup works
  3. Monitor performance and memory usage
  4. Scale up gradually, testing at each step
  5. Document what works best for your problem types