MCNP Guide
Parallel Computing in MCNP
Running MCNP faster with multiple processors
Why Use Parallel MCNP?
Monte Carlo simulations are naturally parallel. Each particle history is independent, making MCNP ideal for parallel computing.
Benefits
- Reduce runtime dramatically
- Run more particles for better statistics
- Handle larger, more complex models
- Utilize modern multi-core hardware
Key Concepts
- Speedup: ratio of single-core runtime to N-core runtime
- Efficiency: speedup divided by the number of cores; how well the cores are utilized
- Scaling: how performance changes as the processor count grows
- Overhead: time spent coordinating processes rather than tracking particles
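As a worked example of the first two concepts (the timings below are made up for illustration):

```bash
# Hypothetical timings: 120 min on 1 core, 18 min on 8 cores.
awk -v t1=120 -v tn=18 -v n=8 'BEGIN {
  s = t1 / tn            # speedup = Time(1 core) / Time(N cores)
  printf "speedup = %.2fx, efficiency = %.1f%%\n", s, 100 * s / n
}'
# -> speedup = 6.67x, efficiency = 83.3%
```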
Parallel Methods
MCNP offers three main approaches to parallel computing, each suited for different hardware.
MPI (Message Passing Interface)
Best for clusters and distributed systems. Each process has its own memory.
```bash
# Run with 16 MPI processes
mcnp6 i=input n=output tasks 16

# Using mpirun (alternative)
mpirun -np 16 mcnp6.mpi i=input n=output
```
OpenMP (Shared Memory)
Best for multi-core workstations. Multiple threads share memory.
```bash
# Run with 8 OpenMP threads
mcnp6 i=input n=output threads 8

# Set thread affinity for better performance
export OMP_PROC_BIND=true
mcnp6 i=input n=output threads 8
```
Hybrid (MPI + OpenMP)
Combines both methods. Best for clusters with multi-core nodes.
```bash
# 4 MPI processes with 6 threads each (24 cores total)
mcnp6 i=input n=output tasks 4 threads 6

# Alternative with environment variable
export OMP_NUM_THREADS=6
mpirun -np 4 mcnp6.mpi i=input n=output
```
Choosing the Right Method
Single Workstation
- Use OpenMP (threads) for simplicity
- Set threads = number of physical cores
- Avoid hyperthreading for MCNP
- Ensure adequate memory per thread
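To find the physical core count on Linux, one option is to parse `lscpu` (a sketch; other operating systems need their platform's equivalent):

```bash
# Physical cores: count unique (core, socket) pairs reported by lscpu.
# nproc reports logical CPUs, which is 2x with hyperthreading enabled.
physical=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)
echo "physical cores: $physical, logical CPUs: $(nproc)"
```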
HPC Cluster
- Use hybrid MPI+OpenMP
- MPI tasks = number of nodes
- Threads = cores per node
- Consider network performance
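On a cluster managed by Slurm (an assumption; adapt the directives to your scheduler), the node-and-thread layout above might look like the batch script below. The 24 cores per node and the input/output file names are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=mcnp-hybrid
#SBATCH --nodes=4                 # MPI tasks = number of nodes
#SBATCH --ntasks-per-node=1       # one MPI rank per node
#SBATCH --cpus-per-task=24        # threads = cores per node (placeholder)

# Match the OpenMP thread count to the cores Slurm allocated per rank
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun mcnp6.mpi i=input n=output
```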
Performance Tips
Optimal Configuration
- Start with threads = physical cores for single machines
- For clusters, use 1 MPI task per node with threads = cores per node
- Always test different configurations for your specific problem
- Monitor memory usage - each process needs sufficient RAM
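A quick way to sanity-check the memory budget on Linux is to divide currently available RAM across the planned tasks (reads `/proc/meminfo`; the task count is an example value):

```bash
# Divide available RAM (kB in /proc/meminfo) across the planned tasks.
tasks=4
awk -v tasks="$tasks" '/^MemAvailable/ {
  printf "%.1f GiB available per task\n", $2 / 1048576 / tasks
}' /proc/meminfo
```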
Common Mistakes to Avoid
- Over-subscription: More tasks than cores
- Memory issues: Insufficient RAM per process
- I/O bottlenecks: Too many output files
- Poor load balancing: Uneven work distribution
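The first mistake is easy to catch with a short check before launching (the task and thread counts here mirror the hybrid example):

```bash
# Refuse to over-subscribe: total workers must not exceed logical CPUs.
tasks=4
threads=6
cores=$(nproc)
if [ $((tasks * threads)) -gt "$cores" ]; then
  echo "over-subscribed: $((tasks * threads)) workers for $cores CPUs"
else
  echo "ok: $((tasks * threads)) workers on $cores CPUs"
fi
```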
Testing Performance
Always test your parallel setup with a representative problem before running production calculations.
Simple Scaling Test
```bash
# Test with different processor counts
mcnp6 i=test n=out1 threads 1    # Baseline
mcnp6 i=test n=out2 threads 2    # 2 cores
mcnp6 i=test n=out4 threads 4    # 4 cores
mcnp6 i=test n=out8 threads 8    # 8 cores

# Compare runtimes and calculate speedup:
#   Speedup    = Time(1 core) / Time(N cores)
#   Efficiency = Speedup / N cores
```
Typical Performance Expectations
- Good speedup up to 8-16 cores on most problems
- Diminishing returns beyond 32-64 cores
- Complex geometries may scale better than simple ones
- Shielding problems often scale well due to variance reduction
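Once you have wall times from the scaling test, a short script turns them into speedup and efficiency figures (the core-count/minutes pairs below are placeholders for your own measurements):

```bash
# Each input line: core count, wall time; the 1-core time is the baseline.
printf '%s\n' "1 600" "2 310" "4 165" "8 95" |
awk 'NR == 1 { t1 = $2 }
{
  s = t1 / $2    # speedup = Time(1 core) / Time(N cores)
  printf "%2d cores: speedup %.2fx, efficiency %3.0f%%\n", $1, s, 100 * s / $1
}'
```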
Getting Started
- Start with OpenMP on your workstation using threads = cores
- Test with a small problem to verify setup works
- Monitor performance and memory usage
- Scale up gradually, testing at each step
- Document what works best for your problem types