RNAi: a revolution in functional genomics

The discovery of RNAi coincided with breakthroughs in genome sequencing and triggered a new era of biomedical research. Where the discovery, cloning, sequencing and functional description of a new gene once took many years, the availability of annotated genomes together with RNAi made all the difference: the functional exploration of genomes became a scalable process.

Early milestone discoveries in RNAi

RNAi was preceded by intensive research on antisense technology, which attempted to influence gene expression with relatively short, single-stranded nucleic acids. In 1998, Mello and Fire discovered that double-stranded RNA (dsRNA) is an extremely efficient inhibitor of gene expression. Specifically, double-stranded unc-22 RNA was up to 100-fold more efficient at inhibiting unc-22 gene expression than the corresponding single-stranded antisense RNA (Fire et al. Nature 1998). In 2006, only 8 years after the discovery of RNAi, Mello and Fire were awarded the Nobel Prize in Physiology or Medicine.

Working with the nematode C. elegans, Fire and Mello were able to use long dsRNA molecules. In vertebrate cells, however, long dsRNAs trigger a strong interferon response and cell death. In 2001, Elbashir and Tuschl demonstrated that RNAi is mediated by short double-stranded RNAs termed small interfering RNAs (siRNAs). siRNAs consist of two 21-nucleotide RNA strands forming a central 19-base-pair duplex with 2-nucleotide 3’ overhangs. Endogenous siRNAs are processed from longer dsRNA by Dicer, an RNase III-family ribonuclease, which leaves hydroxyl groups at the 3’ ends and phosphate groups at the 5’ ends. Chemically synthesized siRNAs show the same activity as natural, Dicer-processed siRNAs. Unlike long dsRNAs, siRNAs do not trigger an interferon response and do not cause sequence-independent cell toxicity.
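The duplex geometry described above (two 21-nt strands, a 19-bp duplex and 2-nt 3’ overhangs) can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a 19-nt target region as input and UU as the overhang sequence (dTdT overhangs are also common), and the function names are hypothetical rather than part of any published design tool.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_duplex(target19: str, overhang: str = "UU") -> tuple[str, str]:
    """Build sense and antisense (guide) strands for a 19-nt target region.

    Each strand is 19 nt of duplex plus a 2-nt 3' overhang, i.e. 21 nt.
    The overhang sequence is an illustrative assumption.
    """
    target19 = target19.upper().replace("T", "U")  # accept DNA-style input
    if len(target19) != 19:
        raise ValueError("expected a 19-nt target region")
    sense = target19 + overhang                          # same sequence as the mRNA
    antisense = reverse_complement(target19) + overhang  # guide strand, pairs with the mRNA
    return sense, antisense


sense, antisense = design_duplex("GCAUACGGAUCCAUGAUCA")  # hypothetical target region
# Print the antisense strand 3'->5' under the sense strand, shifted so that
# paired bases line up and both 3' overhangs stick out.
print("5'-  " + sense + "-3'")
print("3'-" + antisense[::-1] + "-5'")
```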

The mechanism of siRNA-mediated target mRNA degradation has been studied in great detail. After exogenous siRNAs enter the cell, or after Dicer processing of cellular microRNA precursors, the small RNA duplexes are loaded into the RNA-Induced Silencing Complex (RISC), a multi-protein complex with an Argonaute (AGO) protein as its central component. In this process, one of the siRNA strands (the guide) binds to AGO in a specific conformation that exposes its bases for the recognition of matching mRNAs. The bound guide strand allows RISC to scan the transcriptome and confers sequence specificity on its nuclease activity.

Of the 4 human AGO proteins, only AGO2 has nuclease activity. It cleaves complementary mRNAs between the target nucleotides paired with positions 10 and 11 of the guide strand (counted from its 5’ end). After RISC-mediated cleavage, the mRNA fragments are further degraded by cellular RNA decay pathways.

siRNAs require an almost perfect match of their 19-base duplex region to trigger cleavage of a complementary mRNA. Guide bases 2 to 18 in particular require full base pairing with the target, and a single mismatch can significantly reduce cleavage efficiency. Because of this requirement for sequence identity, siRNAs can be designed to target a specific gene.
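To make these pairing rules concrete, the sketch below compares a guide strand with the mRNA segment it should pair with, lists mismatched guide positions, flags those inside the critical 2 to 18 window, and splits the target at the AGO2 cleavage site between the nucleotides paired with guide positions 10 and 11. It assumes strict Watson-Crick pairing (G:U wobble is counted as a mismatch) and a target segment of the same length as the guide; all function names and sequences are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def mismatch_positions(guide: str, target_site: str) -> list[int]:
    """1-based guide positions that do not Watson-Crick pair with the target.

    Strands are antiparallel: guide position i (from the guide 5' end) pairs
    with the i-th nucleotide counted from the 3' end of target_site.
    """
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have equal length")
    return [i for i in range(1, len(guide) + 1)
            if (guide[i - 1], target_site[-i]) not in WATSON_CRICK]


def critical_mismatches(guide: str, target_site: str, window=(2, 18)) -> list[int]:
    """Mismatches falling inside the window that requires full pairing."""
    lo, hi = window
    return [i for i in mismatch_positions(guide, target_site) if lo <= i <= hi]


def cleavage_fragments(target_site: str) -> tuple[str, str]:
    """Split the target between the nts paired with guide positions 11 and 10."""
    cut = len(target_site) - 10  # index of the target nt paired with guide position 10
    return target_site[:cut], target_site[cut:]


target = "GCAUACGGAUCCAUGAUCAAG"     # hypothetical 21-nt mRNA segment, 5'->3'
guide = reverse_complement(target)   # perfectly matched guide strand
print(mismatch_positions(guide, target))   # []
mutated = target[:-5] + "A" + target[-4:]  # change the nt opposite guide position 5 (U -> A)
print(critical_mismatches(guide, mutated))  # [5]
print(cleavage_fragments(target))
```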


In principle, either strand of the siRNA can be loaded into RISC to guide target mRNA recognition. However, the efficiency with which each strand is loaded strongly depends on the thermodynamic asymmetry of the siRNA ends: the strand whose 5’ end lies at the less stably base-paired end of the duplex is preferentially loaded as the guide. Consequently, strand loading can be largely controlled through the GC content of the siRNA termini.
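The asymmetry principle (the strand whose 5’ end sits at the less stable duplex end is favoured as guide) can be approximated with a simple GC count at each end. This is a crude sketch under stated assumptions: published analyses use nearest-neighbour free energies rather than GC counts, and the 4-bp window and the function names are arbitrary, hypothetical choices.

```python
def gc_count(seq: str) -> int:
    return sum(base in "GC" for base in seq)


def predict_guide_strand(sense19: str, window: int = 4) -> str:
    """Guess which strand RISC preferentially loads as guide.

    GC count in the terminal `window` base pairs is used as a rough proxy
    for end stability (assumption). sense19 is the 19-nt duplex region of
    the sense strand, 5'->3': its first bases form the end carrying the
    sense 5' terminus, its last bases the end carrying the antisense 5' terminus.
    """
    gc_at_sense_5p = gc_count(sense19[:window])
    gc_at_antisense_5p = gc_count(sense19[-window:])
    if gc_at_antisense_5p < gc_at_sense_5p:
        return "antisense"  # antisense 5' end is less stable: favoured as guide
    if gc_at_sense_5p < gc_at_antisense_5p:
        return "sense"      # the intended guide would likely become the passenger
    return "undetermined"


print(predict_guide_strand("GCGCAUAUAUAUAUAUAUA"))  # GC-rich sense 5' end
```

In a design pipeline, a check like this would simply reject candidates that favour the sense strand.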

Robust design rules for siRNAs with potent target gene silencing have been explored intensively since the discovery of siRNAs. The “asymmetry rule”, which gives control over strand loading, is one of the most important design criteria. However, moving from isolated observations to statistically based design rules required a sufficiently large siRNA silencing dataset. A dataset of more than 2000 siRNAs, published by Novartis in 2005, provided a useful resource for identifying sequence features associated with effective silencing.

Despite the demonstrated high specificity of target mRNA cleavage, early reports using genome-wide expression analysis revealed many off-target effects of siRNAs. This finding was particularly enigmatic, as the siRNAs showed no extended sequence homology to their off-target genes.

Both on-target and off-target gene silencing were shown to be dose dependent, with on-target silencing saturating at low single-digit nanomolar concentrations and dropping off steeply in the mid-picomolar range. Off-target gene deregulation required somewhat higher concentrations and was largely absent at 100 pM. An important conclusion is that off-target effects can be reduced by performing RNAi experiments at the lowest possible siRNA concentration. However, the siRNA concentration cannot be lowered enough to prevent off-target effects without also losing target gene silencing.
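The concentration window described above can be illustrated with a simple Hill-type dose-response model. The potencies below are hypothetical values chosen only to reproduce the qualitative trend (on-target silencing effective well below the concentrations that drive off-target deregulation); they are not measured parameters, and the function name is illustrative.

```python
def fraction_silenced(conc_pM: float, ic50_pM: float, hill: float = 1.0) -> float:
    """Hill-type fraction of maximal silencing at a given siRNA concentration."""
    return conc_pM ** hill / (conc_pM ** hill + ic50_pM ** hill)


# Hypothetical potencies, chosen only to mimic the qualitative trend in the text.
ON_TARGET_IC50_PM = 10.0     # hypothetical
OFF_TARGET_IC50_PM = 1000.0  # hypothetical

for conc in (10, 100, 1000, 10000):
    on = fraction_silenced(conc, ON_TARGET_IC50_PM)
    off = fraction_silenced(conc, OFF_TARGET_IC50_PM)
    print(f"{conc:>6} pM  on-target {on:.2f}  off-target {off:.2f}")
```

With these made-up numbers, 100 pM gives about 90% of maximal on-target silencing but less than 10% off-target activity, whereas going down to 10 pM halves on-target silencing: the same trade-off the paragraph describes.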

The apparent contradiction between high cleavage specificity and abundant off-target effects was resolved by the discovery of the “seed” sequence in siRNAs. Bioinformatic analysis of large-scale siRNA screening datasets indicated that off-target genes did not show extended sequence homology but rather a match to a 6- or 7-base region in the antisense (guide) strand of the siRNA. Further investigations demonstrated that “seeds” span guide-strand bases 2 to 7 or 8 and predominantly match in the 3’ UTR of off-target genes.
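A seed match can be found by taking guide bases 2 to 8 (or 2 to 7), reverse-complementing them, and searching the 3’ UTR for that site. The sketch below is a minimal, exact-match illustration; the guide and UTR sequences are invented, and real analyses also consider site types, UTR context and conservation.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_site(guide: str, start: int = 2, end: int = 8) -> str:
    """mRNA sequence (5'->3') complementary to guide positions start..end (1-based)."""
    return reverse_complement(guide[start - 1:end])


def find_seed_matches(guide: str, utr: str, start: int = 2, end: int = 8) -> list[int]:
    """0-based start positions of exact seed-match sites in a 3' UTR sequence."""
    site = seed_site(guide, start, end)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


guide = "UUCGAAGUACUCAGCGUAAGU"  # hypothetical 21-nt guide strand
utr = "GGACUUCGAUUAACUUCGAC"     # hypothetical 3' UTR fragment
print(seed_site(guide))                      # ACUUCGA (7-mer site, guide bases 2-8)
print(find_seed_matches(guide, utr))         # [2, 12]
print(find_seed_matches(guide, utr, 2, 7))   # [3, 13] (6-mer site, guide bases 2-7)
```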

RNAi was discovered through its highly specific mRNA cleavage or “slicing” activity, which requires almost complete sequence identity with the target mRNA. This pathway presumably evolved as a natural mechanism to suppress transposons and RNA viruses. The identification of microRNAs (miRNAs) pointed to a second, even more fundamental RNAi mechanism. miRNAs are naturally occurring small dsRNAs of similar length to siRNAs but with frequent mismatches between their strands. The miRNA pathway harnesses the same molecular machinery as the slicing pathway. However, miRNAs require a match only in their seed region to trigger translational inhibition, deadenylation and degradation of their target mRNAs. With a recognition sequence of only 6 to 7 bases, each miRNA can target many mRNAs, serving as a natural upstream regulator of gene expression in cell differentiation and development.

In addition to its on-target slicing activity, every siRNA acts as an artificial miRNA, repressing a largely unpredictable set of off-target transcripts with a seed match in their 3’ UTR.

With a recognition sequence of 6 to 7 bases, every seed sequence statistically matches thousands of 3’ UTRs. Gene expression studies, however, indicate that for most siRNAs only a small fraction of the potential off-target genes with a seed match in the 3’ UTR actually show significant off-target silencing. In fact, the number of significantly down-regulated off-target genes varies strongly between siRNAs, ranging from a handful to several hundred. Interestingly, the median number of seed-deregulated targets is around 60 for both siRNAs and miRNAs, supporting the notion that siRNAs act as artificial miRNAs.
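The claim that a seed “statistically matches thousands of 3’ UTRs” follows from simple probability. The sketch below assumes random sequence with equal base frequencies, a uniform 3’ UTR length of 1,000 nt and 20,000 genes; these are hypothetical round numbers, and real UTR lengths and base composition vary widely.

```python
def p_utr_has_site(site_len: int, utr_len: int) -> float:
    """Probability that a random UTR of utr_len nt contains a given site at least once.

    Assumes independent, uniformly distributed bases and ignores correlations
    between overlapping windows (a standard approximation).
    """
    p_site = 0.25 ** site_len             # chance that one window matches the site
    n_windows = utr_len - site_len + 1
    return 1.0 - (1.0 - p_site) ** n_windows


def expected_matching_utrs(site_len: int, utr_len: int, n_genes: int) -> float:
    return n_genes * p_utr_has_site(site_len, utr_len)


# Hypothetical round numbers: 20,000 genes with 1,000-nt 3' UTRs.
for k in (6, 7):
    print(k, round(expected_matching_utrs(k, utr_len=1000, n_genes=20000)))
```

Under these assumptions a 6-mer site is expected in roughly 4,300 UTRs and a 7-mer site in roughly 1,200, far more than the median of about 60 genes that are actually deregulated.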

The ability to accurately predict the number and identity of off-target genes based on the siRNA sequence would be extremely valuable. However, to date there is no published algorithm that reliably predicts siRNA off-targets.

Off-target effects dominate siRNA screens with conventional siRNA reagents

siRNAs are ideally suited for functional genomics screening in cell culture: the high penetrance of siRNAs gives homogeneous phenotypes, partial silencing (as opposed to knock-out) reveals essential genes, and their ease of handling is compatible with automated high-throughput screening. With the human genome sequence becoming available in 2001, it became possible to design large-scale libraries, some covering every human protein-coding gene with multiple siRNAs. In the hope of systematically characterizing the human genome for a broad range of biological processes, many large-scale RNAi screens were performed, ignoring the early warnings of serious off-target effects. Many screens produced a suspiciously high number of positive siRNAs and very low phenotypic correlation between multiple siRNAs targeting the same gene. A rigorous analysis of off-target effects in siRNA screens was published only 10 years after the discovery of siRNAs. In a screen with a large library containing multiple siRNAs per gene, the effects of siRNAs sharing the same seed were far more similar than those of siRNAs sharing the same target gene. This demonstrated that the large majority of effects measured in RNAi screening are indeed seed-based off-target effects.

The plots depict the correlation, or similarity, between phenotypic read-outs obtained under two test conditions. An R value close to 1 indicates high correlation or similarity in read-outs, while a value close to 0 indicates little to no correlation.

siRNA pools: good assay reproducibility. High correlation (R=0.94) between technical replicates indicates that the same siRNA reagent performs reproducibly.

Single siRNAs: poor siRNA specificity leads to varying phenotypes. Poor correlation (R=0.073) between single siRNAs that target the same gene indicates that two siRNA reagents, despite targeting the same gene, do not produce similar phenotypes.

Same-seed siRNAs: seed-based off-target effects dominate siRNA-induced phenotypes. Good correlation (R=0.53) between siRNAs with the same seed sequence but different target genes indicates that the seed sequence has a greater influence on siRNA activity than the designed on-target effect.
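For readers less familiar with the R values quoted in these panels, the sketch below shows how a Pearson correlation coefficient between two sets of per-gene phenotype scores is computed, assuming the plotted R is a Pearson coefficient. The scores are invented for illustration; this is not the analysis pipeline behind the plots.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented per-gene phenotype scores (e.g. normalized z-scores), illustration only.
replicate_1 = [1.2, -0.4, 0.3, 2.5, -1.8, 0.1]
replicate_2 = [1.0, -0.6, 0.5, 2.2, -1.5, 0.0]
print(round(pearson_r(replicate_1, replicate_2), 2))
```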

For most siRNAs, on-target silencing via the “slicing” mechanism is more efficient than seed-based off-target deregulation, which seems inconsistent with the finding that the phenotypic effects of siRNAs are predominantly off-target based. However, this contradiction can be explained by the far larger number of off-target genes: whilst on-target silencing probes the activity of only one gene, broad off-target gene deregulation drastically increases the chance of hitting a gene critically involved in the observed cellular process. In other words, seed-based gene deregulation acts like a genetic shotgun.

Consequently, assays monitoring fundamental cell functions that involve many genes show a higher tendency toward off-target effects than assays focusing on narrow pathways. For example, cell proliferation assays yield largely unspecific RNAi results, whereas transcription factor reporter assays can be highly specific.
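The shotgun argument can be quantified with a hypergeometric model: if an siRNA deregulates a set of genes unrelated to the biology under study, how likely is it that at least one of them influences the read-out? The sketch assumes 20,000 genes, 60 deregulated off-target genes (echoing the median mentioned above), and hypothetical read-outs influenced by 500 genes (broad, e.g. proliferation) or 20 genes (narrow reporter). All numbers and the function name are illustrative assumptions.

```python
from math import comb


def p_hits_relevant_gene(n_deregulated: int, n_relevant: int, n_genes: int = 20000) -> float:
    """Chance that at least one of n_deregulated genes, drawn at random from
    n_genes, belongs to the n_relevant genes that influence the read-out
    (hypergeometric model)."""
    p_miss_all = comb(n_genes - n_relevant, n_deregulated) / comb(n_genes, n_deregulated)
    return 1.0 - p_miss_all


# Hypothetical scenarios: broad read-out (500 relevant genes) vs narrow read-out (20).
for n_relevant in (500, 20):
    single_gene = p_hits_relevant_gene(1, n_relevant)   # one gene, as for pure slicing
    off_target = p_hits_relevant_gene(60, n_relevant)   # ~60 seed-deregulated genes
    print(n_relevant, round(single_gene, 3), round(off_target, 3))
```

In this toy model, 60 random off-targets hit at least one relevant gene in roughly three out of four cases for the broad read-out but only in about one in seventeen for the narrow one, mirroring the difference between proliferation and reporter assays.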

Since the discovery of off-target effects, there have been numerous attempts to eliminate them with chemical siRNA modifications. A fundamental hurdle in this approach is that both RNAi pathways use largely the same set of protein factors. For instance, of the 4 Argonaute proteins in humans, only AGO2 has nuclease activity and is capable of on-target slicing, whereas all 4 AGO proteins (including AGO2) support the miRNA pathway. The rapid scanning of transcripts through seed interactions is part of the on-target sequence recognition mechanism of siRNAs and is therefore difficult to inhibit selectively. Consistent with this notion, all commercially available siRNA libraries containing chemical modifications show strong off-target effects, indicating that their modifications are inefficient at suppressing off-targets.


Off-target dilution with complex siRNA pools

A straightforward concept for minimizing off-target effects while achieving efficient and robust on-target silencing is the complex pooling of a large number of siRNAs. If each siRNA in an equimolar, complex pool is selected to have a unique seed sequence, the effective concentration of each seed is reduced to the total siRNA concentration divided by the number of siRNAs in the pool. For example, in a pool of 30 siRNAs transfected at a total concentration of 1 nM, each siRNA is present at about 33 pM, a concentration low enough to eliminate seed effects. In contrast, because all siRNAs share the same target gene, the effective concentration for on-target silencing is the sum of all individual siRNA concentrations.
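The dilution arithmetic can be made seed-aware: siRNAs that share a seed act on the same off-targets, so their concentrations add up. The sketch below assumes an equimolar pool and defines the seed as guide positions 2 to 8; the function name and the generated sequences are hypothetical.

```python
import itertools
from collections import Counter


def seed_concentrations_pM(guides: list[str], total_nM: float,
                           start: int = 2, end: int = 8) -> dict[str, float]:
    """Effective concentration (pM) of each distinct seed in an equimolar pool.

    Each siRNA contributes total/len(guides); siRNAs sharing a seed
    (guide positions start..end) add up because they share off-targets.
    """
    per_sirna_pM = total_nM * 1000 / len(guides)
    seed_counts = Counter(g[start - 1:end] for g in guides)
    return {seed: n * per_sirna_pM for seed, n in seed_counts.items()}


# 30 hypothetical guides with distinct seeds, padded to 21 nt for illustration.
seeds = ["".join(p) for p in itertools.islice(itertools.product("ACGU", repeat=7), 30)]
pool = ["U" + s + "A" * 13 for s in seeds]
conc = seed_concentrations_pM(pool, total_nM=1.0)
print(len(conc), round(max(conc.values()), 1))  # 30 distinct seeds at ~33.3 pM each
```

The on-target concentration is the sum over all siRNAs (1 nM here), while no single seed exceeds about 33 pM, which is why unique seeds matter for pool design.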

The siPOOL Concept

Using single siRNAs or low-complexity pools (e.g. 4 siRNAs) can lead to the down-regulation of multiple, unpredictable off-target genes, resulting in unspecific phenotypes. In contrast, complex siRNA pools (siPOOL = 30 siRNAs) allow each siRNA to be present at a low concentration, effectively diluting off-target effects and resulting in specific phenotypes.

Working concentration. Nanomolar range per siRNA: low specificity, unreliable results. Picomolar range for each siRNA: high specificity, enhanced reliability. Cooperative targeting for maximum knockdown.

Target specificity correlates with siRNA pool complexity

As off-target effects were shown to be dose dependent, the specificity of an siRNA pool would be expected to improve with increasing numbers of different siRNAs, provided that each siRNA has a different seed sequence.

Using different experimental approaches such as off-target reporter constructs or RNA sequencing, it is indeed possible to demonstrate a direct correlation between the complexity of siRNA pools and their target specificity.

Even though absolute numbers may vary with experimental settings, it is clear that off-target dilution requires complex pools with dozens of siRNAs. This finding is consistent with published siRNA concentrations: with on-target silencing requiring a total concentration of about 1-3 nM, a pool of 30 different siRNAs is sufficient to dilute individual siRNAs to 33-100 pM, a concentration at which off-target effects were shown to be drastically reduced.
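The same arithmetic can be inverted to ask how many distinct-seed siRNAs a pool needs. The sketch below assumes the 100 pM per-siRNA ceiling at which off-target effects were reported to be largely absent; it is a back-of-the-envelope calculation, not a validated design rule, and the function name is hypothetical.

```python
import math


def min_pool_size(total_nM: float, max_per_sirna_pM: float) -> int:
    """Smallest number of distinct-seed siRNAs that keeps each one at or
    below max_per_sirna_pM when the pool is used at total_nM."""
    return math.ceil(total_nM * 1000 / max_per_sirna_pM)


for total_nM in (1, 3):
    print(f"{total_nM} nM total -> at least {min_pool_size(total_nM, 100)} siRNAs")

# For comparison: per-siRNA concentration in a 4-siRNA pool used at 3 nM.
print(3 * 1000 / 4, "pM per siRNA")
```

At 3 nM total this gives 30 siRNAs, matching the siPOOL size, while a 4-siRNA pool leaves each siRNA at 750 pM.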

Specificity improves with complexity. Left: Low-complexity siRNA pooling (e.g. Dharmacon siGENOME SMARTpools) does not prevent siRNA off-targets. It may in fact exacerbate off-target effects. Right: Only high-complexity pooling (siPOOLs) can reliably ensure on-target phenotypes.

Effect of siRNA pool complexity on siRNA off-targeting.

The off-target activity of an siRNA was monitored with a luciferase reporter linked to the 3’ UTR of a known off-target gene (MAD2). The siRNA was administered together with 3, 14 or 59 other siRNAs of unrelated sequence targeting the same gene (PolG).

Complexity is key: 4 siRNAs may not be enough.

We show here that an siRNA with known off-target activity against the MAD2 gene required high-complexity pools of more than 15 siRNAs to sufficiently reduce its off-target effects. Similar results were obtained when off-target activity was assayed with a MAD2 3’ UTR-linked luciferase reporter, by MAD2 protein expression and in a MAD2 functional assay (mitotic escape).

4 siRNAs are not enough

Low-complexity pools containing 4 siRNAs are frequently used for large-scale RNAi screening but also for focussed gene silencing experiments. Whilst pooling may help to provide more reliable target gene silencing, 4 siRNAs are clearly insufficient to dilute the individual siRNAs of the pool below the concentration that triggers off-target effects: even at a low total siRNA concentration of 3 nM, each siRNA would be present at 750 pM, almost 1 nM, a concentration perfectly capable of triggering off-target effects. Even worse, low-complexity pools are likely to combine the off-target effects of all 4 siRNAs.

Consistent with these calculations and with broadly accepted properties of siRNAs, low-complexity pools show low target specificity, which becomes obvious, for instance, in the very low hit confirmation rates typical of RNAi screens performed with these reagents.

High-complexity pools can suppress dominant off-target effects

Several large-scale siRNA screens reported that a large fraction of the top-scoring hits was caused by the seed-based deregulation of one single off-target gene. Controlling such dominant off-target effects is a critical requirement for effective off-target mitigation. Complex siRNA pools are uniquely suited to drastically reduce the impact of dominant off-target effects: when placed in a complex siRNA pool, individual siRNAs with massive off-target gene deregulation profiles are efficiently diluted, reducing their genetic disturbance to a tolerable level.

siPOOLs reduce off-target effects. Whole transcriptome profiling of HeLa cells treated with 3 nM siRNA or a siPOOL, containing the same siRNA, was carried out after 48 h. siPOOLs efficiently reduced off-target effects (red dots) of siRNA while maintaining on-target (green dot) knock-down.

High-complexity pools increase silencing robustness and efficiency

Phenotypic variation between siRNAs targeting the same gene may be attributed not only to off-target effects but also to variable target gene silencing efficiency. Despite well-established design rules, the silencing efficiency of individual siRNAs is not perfectly predictable and can only be accurately assessed by experimental validation.

Pooling siRNAs increases the chance of including highly efficient siRNAs, which have been shown to define the silencing efficiency of the pool. Moreover, most genes express a number of transcripts (isoforms) with different exon combinations. Whilst it is normally possible to find one siRNA targeting all transcripts, the selection may be limited by the available shared target sequence. Complex siRNA pools can target every transcript with multiple siRNAs, even when isoforms share no common sequence.

As such, complex siRNA pooling enhances silencing robustness, providing maximum silencing without the need for extensive experimental validation.
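The robustness argument can be expressed as a simple probability: if a fraction p of designed siRNAs is highly potent, a pool of n siRNAs contains at least one potent siRNA with probability 1 − (1 − p)^n. The sketch assumes a hypothetical p of 0.3 and treats siRNAs as independent, which real designs only approximate; the function name is illustrative.

```python
def p_pool_contains_potent(n_sirnas: int, p_potent: float) -> float:
    """Probability that at least one of n independently designed siRNAs is highly potent."""
    return 1.0 - (1.0 - p_potent) ** n_sirnas


P_POTENT = 0.3  # hypothetical fraction of designed siRNAs that are highly potent
for n in (1, 4, 30):
    print(n, round(p_pool_contains_potent(n, P_POTENT), 4))
```

Under this assumption a single siRNA is potent only 30% of the time and a 4-siRNA pool about 76% of the time, whereas a 30-siRNA pool almost always includes potent members.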

Best RNAi knockdown with minimal effort. Two siPOOLs against the same gene gave similar knockdown efficiencies.

Unreliable silencing. Knockdown with single siRNAs against the same gene was far more variable (poor correlation, R=0.4).

An obvious limitation of siRNAs as exogenous reagents is the requirement for transfection and the limited duration of their effect. Most immortalized cell lines with epithelial, adherent cell morphology can be readily transfected with siRNAs and show efficient silencing at low nM siRNA concentrations. Suspension cell lines and many primary cell lines, however, require far higher siRNA concentrations or cannot be transfected at all. Also, siRNA effects typically last only several days in proliferating cell cultures, as siRNAs are diluted and degraded over time. For assays that require observing cells over longer periods, this may be insufficient.

Here, small hairpin RNAs (shRNAs) can be a useful alternative. shRNAs are normally transcribed from expression constructs, which may be stably integrated into the genome of the target cells. shRNA transcripts contain the sense and antisense strands in one RNA molecule, allowing self-hybridization into a hairpin structure. Resembling miRNA precursors, shRNAs are processed by the RNase III enzyme Dicer, which cleaves off the terminal loop and releases a functional siRNA molecule. Of note, Dicer cleavage can vary by one base, giving rise to siRNA molecules of 21 and 22 bases in length. Dicer-processed shRNAs show the same properties as chemically synthesized siRNAs. As such, they are as prone to off-target effects as single siRNAs, in particular if they are transcribed from strong promoters that lead to high accumulation in target cells.

Seed based off-target effects are largely considered a limitation of RNAi. In fact, many large-scale RNAi screens suffered from dominant off-target effects, limiting the output to a small number of critically involved gene factors that were frequently known from the beginning.

Alternatively, seed-based off-target effects can be seen as a rich, additional source of functional information. Of note, as seed-based off-target effects may hit any transcribed gene with a suitable 3’ UTR, seed information is by definition genome-wide.

Due to the short length of the seed sequence and its degenerate target recognition, reliable seed analysis normally requires at least medium-scale RNAi screening data sets of hundreds of siRNAs or shRNAs. Importantly, data sets should originate from single siRNA or shRNA reagents, where phenotypic effects can be associated with one single seed sequence. In low complexity pools of 3 or 4 siRNAs, this association is diluted, further complicating the analysis. As such, low complexity pools of 3 or 4 siRNAs combine the disadvantage of insufficient off-target dilution with a lack of suitability for seed-based analysis.

A number of potent seed-based statistical hit detection algorithms have been described which, depending on the available dataset, have different strengths and weaknesses:

  1. Genome-wide Enrichment of Seed Sequence matches (GESS) was the first widely used method for detecting seed-based hits. It uses a graphical method to detect strong off-target genes by identifying outliers on a plot, and it requires choosing a cutoff for what is considered a hit siRNA.
  2. Haystack is another early seed-based hit detection algorithm. It uses linear regression to identify transcripts whose predicted off-target silencing correlates with observed screening results. Hit transcripts are identified using p-value cutoffs.
  3. SENSORS uses a non-parametric rank test to compare the phenotypic scores of siRNAs with seed matches to those of siRNAs without seed matches. It does not require setting any cutoffs before performing the analysis.
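The rank-test idea in the last item can be sketched as follows: for one transcript, siRNAs are split by whether their 7-mer seed site (guide positions 2 to 8) occurs in the transcript’s 3’ UTR, and the phenotype scores of the two groups are compared with a Mann-Whitney U (Wilcoxon rank-sum) test. This is a minimal illustration in the spirit of that approach, not the SENSORS, GESS or Haystack implementation; the normal approximation without tie correction, the seed definition and all sequences and scores are simplifying assumptions.

```python
import math

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(guide: str) -> str:
    """7-mer mRNA site complementary to guide positions 2-8."""
    return "".join(COMPLEMENT[b] for b in reversed(guide[1:8]))


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Mann-Whitney U for group_a and a two-sided p-value
    (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    return u_a, math.erfc(abs(z) / math.sqrt(2))


def seed_test_for_transcript(guides: list[str], scores: list[float], utr: str):
    """Compare scores of siRNAs whose seed site occurs in this 3' UTR with the rest."""
    with_match = [s for g, s in zip(guides, scores) if seed_site(g) in utr]
    without = [s for g, s in zip(guides, scores) if seed_site(g) not in utr]
    if not with_match or not without:
        return None  # test undefined if either group is empty
    return rank_sum_test(with_match, without)


# Hypothetical screen: three siRNAs share a seed matching the UTR, five do not.
guides = ["UUCGAAGUACUCAGCGUAAGU", "UUCGAAGUCAUGGACUUAGCA", "UUCGAAGUGCAUCAAUGCGAU",
          "UGGGGGGUACUCAGCGUAAGU", "UAAAAAAUCAUGGACUUAGCA", "UCCCCCCAGCAUCAAUGCGAU",
          "UGGGGGGUCAUGGACUUAGCA", "UAAAAAAUGCAUCAAUGCGAU"]
scores = [-3.0, -2.5, -2.8, 0.1, -0.2, 0.3, 0.0, 0.2]  # invented phenotype scores
utr = "GGACUUCGAUUAACUUCGAC"                            # invented 3' UTR fragment
print(seed_test_for_transcript(guides, scores, utr))
```

In a real analysis this test would be repeated for every transcript, followed by multiple-testing correction; dedicated statistics libraries also offer exact and tie-corrected variants.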

References

  1. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Fire et al. (1998) Nature
  2. The genome sequence of Drosophila melanogaster. Adams et al. (2000) Science
  3. Initial sequencing and analysis of the human genome. Lander et al. (2001) Nature
  4. RNA interference is mediated by 21- and 22-nucleotide RNAs. Elbashir et al. (2001) Genes & Development
  5. Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Elbashir et al. (2001) Nature
  6. Role for a bidentate ribonuclease in the initiation step of RNA interference. Bernstein et al. (2001) Nature
  7. Argonaute2, a link between genetic and biochemical analyses of RNAi. Hammond et al. (2001) Science
  8. Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Meister et al. (2004) Molecular Cell
  9. Functional anatomy of siRNAs for mediating efficient RNAi in Drosophila melanogaster embryo lysate. Elbashir et al. (2001) The EMBO Journal
  10. Asymmetry in the assembly of the RNAi enzyme complex. Schwarz et al. (2003) Cell
  11. Rational siRNA design for RNA interference. Reynolds et al. (2004) Nature Biotechnology
  12. Design of a genome-wide siRNA library using an artificial neural network. Huesken et al. (2005) Nature Biotechnology
  13. Expression profiling reveals off-target gene regulation by RNAi. Jackson et al. (2003) Nature Biotechnology
  14. siRNA-mediated off-target gene silencing triggered by a 7 nt complementation. Lin et al. (2005) Nucleic Acids Research
  15. 3' UTR seed matches, but not overall identity, are associated with RNAi off-targets. Birmingham et al. (2006) Nature Methods
  16. Position-specific chemical modification of siRNAs reduces "off-target" transcript silencing. Jackson et al. (2006) RNA
  17. Oncology studies using siRNA libraries: the dawn of RNAi-based genomics. Sachse et al. (2004) Oncogene
  18. Genome-wide analysis of human kinases in clathrin- and caveolae/raft-mediated endocytosis. Pelkmans et al. (2005) Nature
  19. Systems survey of endocytosis by multiparametric image analysis. Collinet et al. (2010) Nature
  20. Common seed analysis to identify off-target effects in siRNA screens. Marine et al. (2012) SLAS Discovery
  21. Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Garcia et al. (2011) Nature Structural & Molecular Biology
  22. Predicting effective microRNA target sites in mammalian mRNAs. Agarwal et al. (2015) eLife
  23. siPools: highly complex but accurately defined siRNA pools eliminate off-target effects. Hannus et al. (2014) Nucleic Acids Research
  24. RNAi Screen for NRF2 Inducers Identifies Targets That Rescue Primary Lung Epithelial Cells from Cigarette Smoke Induced Radical Stress. Schumacher et al. (2016) PLoS One