Ensuring primer specificity is an important and oft-neglected step towards successful and accurate genotyping of low-quality samples. This is particularly important when genotyping predator scats which can contain DNA from closely related species. Multiplex wormhole does not include a specificity check within the Python package, however, I’m providing an R script for post-processing multiplex wormhole outputs that cross-walks with the PrimerTree R package, which provides automated calling of PRIMER-BLAST.
The PrimerTree_specificity_checks.R script provides a cross-walk between multiplex wormhole outputs and the PrimerTree R package (Cannon et al. 2016). PrimerTree enables automated calls to NCBI’s PRIMER-BLAST, which runs BLAST local alignments of primer pairs against specified taxonomic groups. PRIMER-BLAST uses modified BLAST settings to return results associated with lower matches that may still be amplified within PCR, and additionally applies a global Needleman-Wunsch alignment to ensure specificity, filters returned sequences dynamically based on the position of mismatches within the primer-binding regions, and filters sequnces for amplicon size (Untergasser et al. 2012). The provided R script: 1) automates running PRIMER-BLAST on each primer pair for a list of target taxa and outputs a CSV table, 2) extracts amplicon sequences and primer information for the final multiplex from multiplex wormhole inputs/intermediates, and 3) combines outputs from 1 and 2 via a global alignment, and plots a phylogenetic reconstruction including both the target amplicon sequences and the off-target sequences for each primer pair, with the number of basepair mismatches overlaid (Example below). An additional Python sub-module, offtarget_thermodynamics.py, calculates deltaG and Tm between each primer and off-target primer-binding site predicted by Primer-BLAST. DeltaG from these results can be used to filter off-target sequences included in plotting.
For primerTree specificity checks:
Within R:
load("~/multiplex_wormhole/primerTree_specificity_checks.R")
runPrimerTree (in R)This function loops through each primer pair and organism combination and runs PRIMER-BLAST using primerTree::primer_search, merging results into a single dataframe. Taxonomy and sequences are then pulled from NCBI for each record using the primerTree::get_taxonomy and primerTree::get_sequence commands. The output is a dataframe containing the query primer pair (FWD & REV) and full PRIMER-BLAST outputs. Details on full functionality of primerTree.
library(primerTree)
library(plyr)
library(dplyr)
primerblast <- runPrimerTree(primers, organisms=c(),
fwd_adapter="tcgtcggcagcgtcagatgtgtataagagacag",
rev_adapter="gtctcgtgggctcggagatgtgtataagagacag",
all_combos=FALSE,
primer_specificity_database="core_nt",
exclude_env="False",
MAX_TARGET_SIZE=1000 #max amplicon size
...)
# check output
View(primerblast)
primers : CSV output by multiplex wormhole optimization step, e.g., {OUTDIR}/3_OptimizedMultiplexes/Final_Primers/{OUTNAME}_Run01_primers.csv
organisms : Vector or list of organisms to search using GenBank taxid names. For example, to check specificity of marten primers within martens, co-occurring carnivores, diet items, and possible scat/lab contaminants, this list could include: c(“Mustelidae”, “Carnivora”, “Rodents and rabbits”, “Aves”, “Plethodontidae”, “Ericaceae”, “Bacteria”, “Homo sapiens”). Broader groups are preferred to specific diet items because many species are unlikely to have full genomes represented within GenBank. These names can be explored using the “Add Organism” button on the PRIMER-BLAST website.
all_combos : PRIMER-BLAST all FWD/REV combinations (NOTE: This will take awhile!).
primer_specificity_database : Search database (options=”core_nt”, “nt”, “PRIMERDB/genome_selected_species”, “refseq_mrna”, “refseq_representative_genomes”, “Custom”). If “Custom”, provide sequences in FASTA format to the “CUSTOM_DB” option.
exclude_env : Exclude NCBI sequences from environmental or uncultured samples?
MAX_TARGET_SIZE : Max amplicon size to return from PRIMER-BLAST.
… : Additional arguments that will be passed to primerTree::primer_search. These may include: num_aligns=500: # alignments to keep, .parallel=c(True,False) to enable parallel processing with foreach, .progress=c("none", "tk", "text") to enable a progress bar, or the full scope of PRIMER-BLAST options. If >500 returns are provided for a given taxonomic group + primer pair combination, rerun with increased num_aligns argument.
This function uses primer3-py to calculate thermodynamic stability of full binding sites and end stability of binding sites. Binding sites are identified from PrimerTree/PRIMER-BLAST outputs.
mw.offtargetThermodynamics(INFILE, OUTFILE, ANNEAL_TEMP=52.0, MV_CONC=50, DV_CONC=3.8, DNTP_CONC=0.25, DNA_CONC=50)
INFILE (-i) : CSV containing PRIMER-BLAST outputs. Must contain ‘forward_start’, ‘forward_stop’, ‘reverse_start’, ‘reverse_stop’, ‘Sequence’, ‘FWDseq’, and ‘REVseq’ fields.
OUTFILE (-o) : Filepath where CSV results will save.
ANNEAL_TEMP (-t) : PCR annealing temp (Celsius). Default: 52
MV_CONC (-m) : Monovalent salt concentration in PCR. Default: 50
DV_CONC (-d) : Divalent salt concentration in PCR. Default: 3.8
DNTP_CONC (-p) : Primer concentration in PCR. Default: 0.25
DNA_CONC (-c) : DNA concentration in PCR. Default: 50
extractTemplateInfoThis function extracts the amplicon sequence and primer information for an optimized multiplex based on the input, intermediate, and output files from multiplex wormhole. The output is a dataframe containing the primer information, template sequences and targets, and amplicon sequences and targets (i.e., the shifted target position relative to the full template).
primerinfo <- extractPrimerInfo(templates, filtprimers, finalprimers,
fwd_adapter="tcgtcggcagcgtcagatgtgtataagagacag", rev_adapter="gtctcgtgggctcggagatgtgtataagagacag")
View(primerinfo)
templates : Input CSV containing template DNA sequences and targets input to multiplex wormhole (e.g, 0_Inputs/*Templates.csv)
filtprimers : CSV output by the primer3BatchDesign of multiplex wormhole (e.g., 1_PrimerDesign/FilteredPrimers.csv)
finalprimers : CSV of primers in optimized multiplex (e.g., 3_Optimized_Multiplexes/Final_Primers/{OUTNAME)_Run01_primers.csv)
fwd_adapter & rev_adapter : FWD & REV primer adapter sequences (5’–>3’), default is Illumina Nextera i5 & i7.
This function uses the DECIPHER package’s maximum-likelihood trees with ancestral state reconstruction to visualize the number of mismatches between the target amplicon and sequences produced by in silico off-target amplification. For each primer pair, the target amplicon is aligned to off-target sequences with DECIPHER::DECIPHER::AlignSeqs, then a maximum likelihood dendrogram is constructed with DECIPHER::TreeLine(method=”ML”, reconstruct=True). TreeLine infers ancestral sequences for each node in the tree, then the number of mismatches can be calculated at each split. These state transitions are plotted at each node of the dendrogram and represent the number of mismatches between sequences or clusters. A sequence tree plot is made for each primer pair. Off-target sequences can be optionally filtered by delta G values from mw.offtargetThermodynamics results. To be conservative, the default is to plot all off-target sequences with deltaG<0 of the full binding site.
library(DECIPHER)
library(Biostrings)
# save to a PDF since this will be a bunch of plots
pdf("PRIMERBLAST_Trees.pdf") #open PDF
plotPrimerBlast(primerblast, primerinfo, species="TARGET", dG=0, dG_end=NA)
dev.off() #close PDF
primerblast : Table output by runPrimerTree. May use other BLAST-like specificity check results as long as the table contains “FWD”, “REV”, “Sequence”, “accession”, and “species” fields, with sequences trimmed at the primer binding sites.
primerinfo : Table output by extractPrimerInfo. May use other inputs as longs as they contain “LocusID” and “AmpliconSeq” fields.
species : Prefix for amplicon sequences in plot. Recommended to fully capitalize to facilitate identification of target sequences within plots.
dG : Upper DeltaG threshold to include primer-binding sites. Both the FWD and REV primer-binding site must meet this threshold for the sequence to be plotted. Use ‘NA’ to skip, or if thermodynamics haven’t been calculated. (Default: 0)
dG_end : Upper DeltaG threshold for 3’ ends to include primer-binding sites. Both the FWD and REV primer-binding sites must meet this threshold for the sequence to be plotted. (Default: NA)
This example shows predicted off-target amplification of Mustela, with a 51-bp difference in the Mustela sequence compared to the target Martes caurina amplicon.

Cannon, M, j Hester, A Shalkhauser, ER Chan, K Logue, ST Small, D Serra. 2016. In silico assessment of primers for eDNA studies using PrimerTree and application to characterize the biodiversity surrounding the Cuyahoga River. Scientific Reports 6: 22908. doi:10.1038/srep22908
Pagès, H, P Aboyoun, R Gentleman, S DebRoy 2024. Biostrings: Efficient manipulation of biological strings. R package version 2.72.1, https://bioconductor.org/packages/Biostrings
Untergasser, A, I Cutcutache, T Koressaar, J Ye, BC Faircloth, M Remm, and SG Rozen. 2012. Primer3–new capabilities and interfaces. Nucleic Acids Research 40(15): e115. doi: 10.1093/nar/gks596
Wickham, H. 2011. The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/
Wickham, H, R François, L Henry, K Müller, D Vaughan. 2023. dplyr: A Grammar of Data Manipulation. R package version 1.1.4. https://CRAN.R-project.org/package=dplyr
Wright, ES. 2016. Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R. The R Journal: 8(1): 352-359. doi: 10.32614/RJ-2016-025
This is a list of PRIMER-BLAST options that can be input as arguments into primerTree::primer_search. Defaults shown. To determine which option corresponds to which PRIMER-BLAST webpage setting, you can right-click on the setting on the website and select “Inspect element”, then look for the corresponding “name” and “value”.