
multiplexWormhole wrapper functionmultiplex wormhole is largely an automation of pre-existing protocols for noninvasive SNP panel development (Eriksson et al. 2020). All default parameters for primer design, PCR parameters, and dimer prediction are based on this protocol and assume that the two-step PCR with Illumina Nextera adapters is being used. We recommend that the indexing step use Illumina unique dual indexes to minimize error rates due to index hopping, which is likely to be especially problematic with low-quality samples.
The heart of multiplex wormhole’s contribution to multiplex PCR primer design is its optimization algorithms. The multiplex wormhole optimize_multiplex.py function uses a combination of greedy, simple iterative improvement, and simulated annealing algorithms to minimize dimer load in the multiplex primer set. Dimer load can be measured by either: a) tallying pairwise dimers, or b) maximizing mean Gibbs free energy (deltaG; a meeasure of dimer strength) of pairwise dimers. The optimization process heavily relies on having abundant candidates relative to the number of target loci, and more templates needed for designing larger panels. See Optimization Details for a full descriptions of the optimization algorithms and parameter settings.
multiplex_wormhole was built and tested on MacOS with Python v3.9.13 in the Spyder IDE managed under Anaconda-Navigator. For those new to Python or with existing python packages, Anaconda is the recommended virtual environment manager. For a conda virtual environment within your working directory:
conda create -n py39 python=3.9 #create new virtual env w/ python v3.9
conda activate py39 #enter virtual env
Some clusters used pixi instead of conda environments:
pixi init #initialize virtual env
pixi add "python=3.9" #set python version
pixi shell #enter virtual env
pip install multiplex_wormhole
Note: Pixi/conda can be finicky… Dependening on your system, you may run into dependency errors here. If that happens, exit your virtual env and install the missing dependencies following the instructions below.
If pip install doesn’t work, you can also install manually by taking the following steps (from the command line):
pixi add primer3-py
pixi add pandas==1.4.4
pixi add numpy==1.24.4
pixi add matplotlib==3.5.2
Or, if not using a virtual environment:
pip install primer3-py==2.0.0
pip install pandas==1.4.4
pip install numpy==1.24.4
pip install matplotlib==3.5.2
git clone https://github.com/mhallerud/multiplex_wormhole/
MFEprimer is used for dimer calculations. Multiplex wormhole is set up to automatically download and configure the binary file using the helpers/setup_mfeprimer.py script, take the following steps: Download the MFEprimer v3.2.7 release that fits your operating system here. Save the file to your multiplex_wormhole package directory (location can be found by running pip show multiplex_wormhole). If you cloned the repository directly from GitHub, save MFEprimer to multiplex_wormhole/src/multiplex_wormhole. Unzip the download (if zipped). Ensure the file can be executed by opening terminal or the command line in this directory and running chmod +x mfeprimer*.
Now you are ready to run multiplex wormhole!
The primary input to multiplex wormhole is a CSV table containing information on template DNA sequences and target regions. multiplex wormhole was developed for SNP-based genotyping with data originating from new or previously published genomic sequencing data or pre-existing SNP panels, but the method can be easily adjusted to accommodate multiplex PCR design for other targeted sequencing applications such as microsatellites, methylation regions, or functional regions.
Each DNA sequence in the CSV will be treated as a unique target for PCR amplification (i.e., sequences should be non-overlapping and unique). The file should include these three columns (exact capitalized field names required):
| SEQUENCE_ID | SEQUENCE_TEMPLATE | SEQUENCE_TARGET |
|---|---|---|
| CLocus_704 | TCAGAGAC… | 53,1 |
| … | … | … |
The create_in_templates R script can be used to create this CSV from VCF and FASTA inputs. R dependencies include vcfR for VCF handling and openPrimeR for defining primer binding regions. The R script handles two input types:
de novo: Matches formatting output of de novo Stacks pipeline with FASTA created by populations --fasta-loci. Assumes the VCF ‘CHROM’ field matches the FASTA sequence headers with a set prefix such as “CLocus_”.
*reference-aligned: Assumes the FASTA sequence names are in the format CHROM:startBP-endBP where CHROM matches the reference sequence denoted as ‘CHROM’ field in the VCF, startBP is the position in the reference sequence where the FASTA sequence starts, and endBP is the position in the reference sequence where the FASTA sequence ends. SNPs will then be located within the FASTA sequences based on whether their position (POS in the VCF) is between startBP and endBP. This format can be extracted from reference-aligned SNP data in VCF format (e.g., output from Stacks, bcftools mpileup, etc.) using the following commands with REF.FASTA as the reference genome:
# NOTE: Flanking regions of 100-bp are used here as this is roughly the amplicon length targeted and provides good primer binding space on either side of the target SNP, but this can be adjusted if longer or shorter amplicons are desired.
# thin to 1 SNP per 100-bp to avoid overlapping template sequences
# this step is not necessary if linkage disequilibrium pruning has already been performed on the input VCF
vcftools --vcf <IN.VCF> --thin 100 --out Thinned100bp --recode
# convert SNPs into bed format
bcftools query -f '%CHROM\t%POS\n' Thinned100bp.recode.vcf | awk '{print $1"\t"$2-1"\t"$2}' | awk '{print $0"\t"$1":"$3}' > Thinned100bp.bed
# make tab-delimited file with reference sequence IDs in the first field and sequence lengths in the second field
seqkit fx2tab --length --name <REF.FASTA> chr_lengths.txt
# grab 100-bp flanking regions around thinned SNPs
bedtools slop -b 100 -i Thinned100bp.bed -g chr_lengths.txt > Thinned100bp_Flanking100bp.bed
# make fasta of these
bedtools getfasta -fi <REF.FASTA> -bed Thinned100bp_Flanking100bp.bed -fo <OUT.FA>
Users can also add a “keeplist” FASTA file of primers which must be included in the final multiplex, for example primers from a pre-existing panel or representing functionally important regions (e.g., sexing loci). This option can be invoked for augmenting existing panels. Importantly, primer design settings in the batch primer design step (details below) should be adjusted to match the PCR settings used to design the keeplist primers. Primer sequences and template names matching those in the KEEPLIST fasta will be removed from the TEMPLATES to avoid duplication, so make sure that sequence names match if there is any overlap between the two files. KEEPLIST primer names should follow the format
Multiplex wormhole treats all candidates equally, it is therefore the responsibility of the user to select informative candidate DNA sequences based on their objectives.
For example, I use the following steps to prepare input data for applications focused on individual identification:
populations --fasta-loci for de novo RADseq data and bedtools getfasta for reference-aligned RADseq or WGS data. For reference-aligned data, masking repetitive regions (e.g., using RepeatMasker) is recommended prior to this step.The multiplexWormhole function is a wrapper around all sub-modules and will optimize a panel using the defaults for all other functions. Check out the full workflow and settings within individual functions for ultimate flexibility.m
# load module
import multiplex_wormhole as mw
# panel design (defaults shown)
mw.multiplexWormhole(TEMPLATES, N_LOCI, OUTDIR, KEEPLIST_FA, N_RUNS,
ITERATIONS=1000, CYCLES=10, SIMPLE=5000, deltaG=False,
VERBOSE=False)
# panel assessment (defaults shown)
mw.assessPanel(PRIMERS, ALL_DIMERS_dG=-8, END_DIMERS_dG=-4, BAD_DIMERS_dG=-10)
# move to path where scripts live
cd ~/multiplex_wormhole/src/multiplex_wormhole
# panel design
python3.9 multiplexWormhole.py -t TEMPLATES -n NLOCI -o OUTDIR [-p PREFIX] [-k KEEPLIST] [-r RUNS] [-i ITER] [-c CYCLES] [-s SIMPLE] [-d] [-v]
# panel assessment
python3.9 panel_assessment.py -i PRIMERFASTA/PRIMERCSV [-a ALL_DIMERS_DG] [-e END_DIMERS_DG] [-b BAD_DIMERS_DG]
TEMPLATES (-t) : Path to templates CSV. NLOCI (-n) : Final panel size (i.e., # primer pairs & # templates amplified). OUTDIR (-o) : Filepath where output directory will be created and all outputs saved within a generated folder structure. PREFIX (-p) : Prefix for all outputs. [Defaults to a timestamp if None provided] KEEPLIST_FA (-k) : Path to keeplist FASTA. [Default: None] N_RUNS (-r) : Number of optimization runs. [Default: 10] ITERATIONS (-i) : Number of simulated annealing iterations per cycle. [Default: 1000] CYCLES (-c) ** : Number of simulated annealing cycles per run. [Default: 10] **SIMPLE (-s) : Number of simple iterative improvement iterations per run. [Default: 5000] deltaG (-d) : Optimize for mean overall deltaG of dimers [True] or total dimer tally [False]? [Default: False] VERBOSE (-v) : Print all steps and swaps at the optimization step. [Default: False]
PRIMERS (-i) : FASTA or CSV of primers. Sequence names must match the format
For maximum flexibility, the full multiplex wormhole workflow is available at: multiplex_primer_design.py. Click on the functions below for detailed information on inputs, outputs, and settings (including defaults).
Batch Primer Design with batch_primer3_design.py
Dimer Prediction with MFEprimer dimer
Tabulate Dimers with tabulate_dimers.py
(Optional): Explore Optimization Parameters with plot_ASA_temps.py
Optimize Multiplex Primers with optimize_multiplex.py
Multiple Run Optimization with multiple_run_optimization.py
Panel Assessment with panel_assessment.py
Specificity Checks with primerTree_specificity_checks.R & offtarget_thermodynamics.py
Helper Functions:
CSVtoFASTA.pyadd_keeplist_to_fasta.py
Diagram for multiplex wormhole workflow, with black arrows showing the panel design workflow and blue arrows showing the panel assessment workflow.
This workflow will optimize a novel panel from data including the full template DNA sequences (i.e., target + flanking regions).
TEMPLATES –> primer3_batch_design.py –> MFEprimer dimer –> tabulate_dimers.py –> multiple_run_optimization.py
This workflow will optimally expand an existing multiplex primer set (KEEPLIST) using new DNA template data (TEMPLATES).
TEMPLATES + KEEPLIST –> primer3_batch_design.py –> MFEprimer dimer –> tabulate_dimers.py –> multiple_run_optimization.py
This workflow assesses dimer load of an existing multiplex primer set (PRMERS).
PRIMERS –> panel_assessment.py
An existing panel can be redesigned, or multiple existing panels combined, but the approach will depend on whether only primer sequences available or the full templates are available. If template data are available, ensure that there is one template per target and use the “novel panel” or “augmented panel” (if unequal data availability) workflows. If only primer sequences are available, the only approach for optimization is to reduce panel size. To do so, use the following workflow with the original PRIMERS as the input fasta at the optimization step and set N_LOCI smaller than the current panel:
PRIMERS –> MFEprimer dimer –> tabulate_dimers.py –> multiple_run_optimization.py
Multiplex wormhole is an in silico design tool, and although it facilitates the primer design process, additional checks and lab optimization remain critical to panel success.
Dimer prediction is an imperfect science, and primers should be double-checked for particularly bad dimers e.g. by rerunning with MFEprimer v3 and/or other software such as MFEprimer v4, primer-dimer.com, and/or ThermoFisher’s Multiple Primer Analyzer.
We recommend the protocol in Eriksson et al. (2020) for lab testing:
Problems/bugs, questions, or ideas for enhancement can be directed to the GitHub Issues page. You can also directly contact Maggie Hallerud (hallerum@oregonstate.edu).
Multiplex wormhole has been used to develop or improve noninvasive genotyping panels for:

Contact us if you want to add your species to the list!
Eriksson, CE, Ruprecht J, Levi T. 2020. More affordable and effective noninvasive SNP genotyping using high-throughput amplicon sequencing. Molecular Ecology Resources 20(6): 1505:1516. doi: 10.1111/1755-0998.13208.
O’Leary, SJ, JB Puritz, SC Willis, CM Hollenbeck, and DS Portnoy. 2019. These aren’t the loci you’re looking for: Principles of effective SNP filtering for molecular ecologists. Molecular Ecology Resources 27(16): 3193-3206. doi: 10.1111/mec.14792.
Untergasser, A, Cutcutache, I, Koressaar, T, Ye, J, Faircloth, BC, Remm, M, Rozen SG. 2012. Primer3–new capabilities and interfaces. Nucleic Acids Research: e115. doi: 10.1093/nar/gks596.
Wang, K, H Li, Y Xu, Q Shao, J Yi, R Wang, W Cai, X Hang, C Zhang, H Cai, and W Qu. 2019. MFEprimer-3.0: quality control for PCR primers. Nucleic Acids Research 47 (W1): W610-W613. doi: 10.1093/nar/gkz351.