smallRNA group: software

Here you will find useful software for the analysis and processing of NGS datasets. The software is provided under the terms of the Creative Commons Attribution Non-Commercial License V2.0. Detailed usage information is provided in separate documentation. To download the software, right-click on the logo and choose "Save link/target as". Running the Perl scripts requires a Perl interpreter; free Perl distributions are available at http://strawberryperl.com/ and http://www.activestate.com/activeperl/downloads.

unitas is a convenient tool for the efficient annotation of small non-coding RNA sequence datasets produced by Next Generation Sequencing. All you need is a computer and an internet connection. unitas uses the latest reference sequences from publicly available online databases to annotate user input sequences. No installation or further prerequisites are needed: it runs out-of-the-box on all popular platforms (Linux, MacOS, Windows) and can be started with a single command from the command line (terminal). unitas accepts sequence files in FASTA or FASTQ format, or alternatively map files in SAM or ELAND3 format (the standard output format of sRNAmapper).

**Available downloads**
| Version | Linux | MacOS | Windows | All platforms |
|---|---|---|---|---|
| V 1.6.0 | on request | on request | Standalone executable (64 bit) | Perl script / source code |
| V 1.6.1 | on request | Standalone executable (64 bit) | on request | Perl script / source code |
| V 1.6.2 | on request | on request | on request | Perl script / source code |
| V 1.7.0 | on request | on request | on request | Perl script / source code |

* Contact us for older versions of unitas.

We regularly provide versions with updated internal URL lists. Since unitas always first tries to use the URL lists stored on our server (rather than its internal list), older versions of unitas will continue to work properly. New executables (including updated documentation) will be provided when new functionality is added or bugs are fixed.

Information for Mac and Linux users:
Executable files downloaded from the web usually cannot be run until their file permissions have been changed. Therefore, run one of the following terminal commands:

chmod 755 unitas
or
chmod a+rwx unitas

In addition, unitas downloads the SeqMap source code and compiles it on your local machine with g++. If g++ is not installed, unitas instead downloads a precompiled SeqMap executable. In this case, you will most likely have to change the file permissions manually to make this file executable.

proTRAC predicts and analyzes genomic piRNA clusters based on mapped piRNA sequence reads. proTRAC 2.0 and later versions apply a sliding window approach to detect loci with high sequence read coverage. Sequences mapped to these loci are then analyzed with respect to typical piRNA and piRNA cluster characteristics to ensure high specificity. proTRAC requires only core Perl, which is usually pre-installed on Unix and Mac computers. Windows users can install a free Perl distribution such as Strawberry Perl or ActivePerl. Alternatively, we provide a precompiled proTRAC executable (v2.4.2) that runs on 64-bit Windows computers.
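The sliding window step can be illustrated with a minimal Python sketch. It assumes that mapped reads have already been reduced to (position, read count) pairs for a single chromosome; the function names, the input format and the window size, step and read threshold are hypothetical illustration values, not proTRAC's actual code or defaults. The piRNA-specific filters that proTRAC applies afterwards are not shown.

```python
from bisect import bisect_left
from itertools import accumulate


def high_coverage_windows(hits, chrom_length, window=5000, step=1000, min_reads=100.0):
    """Slide a window along one chromosome and keep windows with many reads.

    hits: iterable of (position, read_count) pairs (hypothetical input format).
    Returns (start, end, reads) tuples for half-open windows [start, end).
    """
    hits = sorted(hits)
    positions = [pos for pos, _ in hits]
    # prefix[i] holds the summed read count of the first i hits
    prefix = [0.0, *accumulate(count for _, count in hits)]
    selected = []
    # The last window may be shorter so that the chromosome end is covered
    for start in range(0, max(chrom_length - window, 0) + step, step):
        end = min(start + window, chrom_length)
        # Reads inside [start, end), found by binary search on sorted positions
        reads = prefix[bisect_left(positions, end)] - prefix[bisect_left(positions, start)]
        if reads >= min_reads:
            selected.append((start, end, reads))
    return selected


def merge_windows(windows):
    """Merge overlapping or adjacent selected windows into candidate loci."""
    loci = []
    for start, end, _ in windows:
        if loci and start <= loci[-1][1]:
            loci[-1][1] = max(loci[-1][1], end)
        else:
            loci.append([start, end])
    return [tuple(locus) for locus in loci]
```

Prefix sums keep each window query logarithmic in the number of mapped positions, so step sizes much smaller than the window remain cheap.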

PACKEIS is software for assessing whether a coding sequence represents an extreme solution in terms of backfolding, compared with the alternative coding sequences that evolution could have realized to encode the same peptide through the use of synonymous codons.
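The comparison against synonymous alternatives can be sketched in Python as follows. This is not PACKEIS's implementation: it assumes uniform random choice among synonymous codons (ignoring codon usage bias), an uppercase DNA coding sequence whose length is a multiple of 3, and a user-supplied score function. The `gc_content` score used here is only a hypothetical placeholder; a real backfolding analysis would use a folding energy computed by an RNA folding program.

```python
import random
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by the amino acid (or stop '*') they encode
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def codons(cds):
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    return "".join(CODON_TO_AA[c] for c in codons(cds))


def random_synonymous(cds, rng):
    """Draw an alternative coding sequence that encodes the same peptide."""
    return "".join(rng.choice(SYNONYMS[CODON_TO_AA[c]]) for c in codons(cds))


def empirical_rank(cds, score, n=1000, seed=0):
    """Fraction of n synonymous alternatives scoring strictly lower than the observed CDS.

    Values near 0 or 1 suggest the observed sequence lies at an extreme of the
    distribution of possible codings (ties are not counted as lower).
    """
    rng = random.Random(seed)
    observed = score(cds)
    lower = sum(score(random_synonymous(cds, rng)) < observed for _ in range(n))
    return lower / n


def gc_content(seq):
    """Hypothetical placeholder score, NOT a folding energy."""
    return (seq.count("G") + seq.count("C")) / len(seq)
```

For example, `empirical_rank("ATGGCTGCTGCTTAA", gc_content)` ranks the observed sequence among 1000 random synonymous codings of the peptide `MAAA*`.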

NGS TOOLBOX is a collection of simple open-source Perl scripts that perform basic analysis and processing steps on next generation sequencing (NGS) datasets. Each tool is designed for convenient and intuitive use. Installation and usage do not require any bioinformatics skills, and all scripts work out-of-the-box. Advanced users can combine the command-line Perl scripts into their own automated analysis and processing pipelines.

sRNAmapper is specifically designed to map small RNA sequences to genomes. To this end, it uses a specialized mapping algorithm that requires a perfect 5' seed match (default: 18 nt) and optionally allows non-template 3' nucleotides as well as internal mismatches in the part of the sequence that follows the seed. Allowing non-template 3' nucleotides ensures that 3'-modified (adenylated or uridylated) small RNAs are mapped, while allowing internal mismatches can increase sensitivity, given that read quality typically decreases towards the 3' end.
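The matching rules described above can be sketched in Python for a single read and the plus strand of a reference sequence. This is a naive illustration, not sRNAmapper's algorithm or code: only the 18-nt seed default comes from the description, while the defaults for mismatches and non-template nucleotides and the function names are hypothetical. Minus-strand hits would be found by repeating the search with the reverse complement of the read.

```python
def seed_match(read, genome, pos, seed=18, max_mismatches=1, max_nontemplate=2):
    """Test whether `read` maps to the plus strand of `genome` starting at `pos`.

    Simplified rules: the first `seed` nt must match perfectly; up to
    `max_nontemplate` 3' nucleotides may be treated as non-templated additions;
    the templated part after the seed may contain up to `max_mismatches` mismatches.
    """
    if len(read) < seed or genome[pos:pos + seed] != read[:seed]:
        return False
    # Try increasingly long non-templated 3' tails, starting with none
    for tail in range(max_nontemplate + 1):
        templated = len(read) - tail
        if templated < seed:
            break
        target = genome[pos:pos + templated]
        if len(target) < templated:
            continue  # templated part would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read[seed:templated], target[seed:]))
        if mismatches <= max_mismatches:
            return True
    return False


def map_read(read, genome, **options):
    """Return all plus-strand start positions where `read` maps."""
    seed = options.get("seed", 18)
    return [pos for pos in range(len(genome) - seed + 1)
            if seed_match(read, genome, pos, **options)]
```

For instance, a 22-nt miRNA carrying two untemplated 3' adenosines maps to its genomic locus only when `max_nontemplate` is at least 2 (with `max_mismatches=0`). A production mapper would index the genome rather than scanning every position.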

reallocate post-processes map files in order to reallocate the read counts of multi-mapping sequences according to the transcription rate of genomic loci, as estimated from uniquely mapping reads. Map files must be in ELAND format and can be created using sRNAmapper, which is provided along with the proTRAC software. reallocate outputs a modified map file with two additional columns that contain (i) the total number of genomic hits of a sequence and (ii) the read count assigned to this locus. proTRAC 2.0.5 and later versions accept this format and use this information for cluster prediction. In general, using reallocate results in a larger number of sequence reads that can be assigned to predicted piRNA clusters and may also change the number of predicted piRNA clusters (more true positives, fewer false positives). Using reallocate is especially recommended for datasets with large proportions (≥ 50%) of transposon-related small RNAs, such as pre-pachytene mammalian or Drosophila piRNA transcriptomes.
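The idea of proportional reallocation can be sketched in Python. This is not reallocate's actual procedure: it assumes that the local transcription rate of a locus is approximated by the number of uniquely mapping reads within a hypothetical ±1000 nt window, that multi-mapping counts are split in proportion to these values, and that counts are split evenly when no locus has nearby unique reads. The input dictionary format and function name are also hypothetical; the output rows mirror the two extra columns described above (number of genomic hits, assigned reads).

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def reallocate_counts(mappings, window=1000):
    """Distribute read counts of multi-mapping sequences across their loci.

    mappings: dict sequence -> (read_count, [(chrom, pos), ...]) (hypothetical format).
    Returns rows (sequence, chrom, pos, n_hits, assigned_reads).
    """
    # Index uniquely mapping reads per chromosome (sorted positions + prefix sums)
    unique = defaultdict(list)
    for count, loci in mappings.values():
        if len(loci) == 1:
            chrom, pos = loci[0]
            unique[chrom].append((pos, count))
    index = {}
    for chrom, hits in unique.items():
        hits.sort()
        prefix = [0.0]
        for _, count in hits:
            prefix.append(prefix[-1] + count)
        index[chrom] = ([pos for pos, _ in hits], prefix)

    def unique_reads_near(chrom, pos):
        """Unique reads within +/- window nt, used as a proxy for local transcription."""
        if chrom not in index:
            return 0.0
        positions, prefix = index[chrom]
        return (prefix[bisect_right(positions, pos + window)]
                - prefix[bisect_left(positions, pos - window)])

    rows = []
    for seq, (count, loci) in mappings.items():
        weights = [unique_reads_near(chrom, pos) for chrom, pos in loci]
        total = sum(weights)
        for (chrom, pos), weight in zip(loci, weights):
            # Proportional share; fall back to an even split without local evidence
            share = count * weight / total if total > 0 else count / len(loci)
            rows.append((seq, chrom, pos, len(loci), share))
    return rows
```

Uniquely mapping sequences keep their full count, because their own reads fall inside their window.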

piFETCH is a small tool for downloading data from the piRNA cluster database without using the web interface. piFETCH can download complete proTRAC results for the available NCBI SRA datasets, or specific information (piRNA cluster sequence, reads mapped to a cluster, proTRAC image file) for selected piRNA clusters of a given SRA dataset. You can also download clipped and filtered reads from any available SRA dataset, as well as the reads from the specified SRA dataset(s) that match miRNA or miRNA precursor sequences.

PHASER is a tool to analyze the 3'-5' distances of mapped sequence reads. It has recently been shown that secondary piRNA biogenesis (piRNA ping-pong) can induce Zucchini-dependent primary processing of targeted transcripts, resulting in the production of so-called phased piRNAs (Han et al. 2015, Mohn et al. 2015). In this process, the target molecule is sliced consecutively, starting from a ping-pong target site, and each downstream cleavage site determines both the 3' end of one piRNA and the 5' end of the adjacent (trailing) piRNA. The amount of phased piRNAs can therefore be estimated by analyzing the 3'-5' distances of mapped sequence reads, where a distance of 1 indicates a pair of phased piRNAs.
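A distance histogram of this kind can be sketched in Python. The sketch is not PHASER's code; it assumes reads from one chromosome given as hypothetical (strand, start, end) tuples with inclusive coordinates, counts each mapped read once (no read-count weighting), and only considers downstream distances from 1 up to a hypothetical `max_distance`. On the plus strand the 5' end is `start`; on the minus strand it is `end`, and "downstream" points leftwards.

```python
from collections import Counter


def three_to_five_distances(reads, max_distance=50):
    """Histogram of 3'-to-5' distances between mapped reads on the same strand.

    reads: list of (strand, start, end) tuples for one chromosome, inclusive
    coordinates (hypothetical input format). A distance of 1 means the next
    read begins immediately after the 3' end of the previous read.
    """
    # 5' end positions per strand: 'start' on plus, 'end' on minus
    five_prime = {"+": Counter(), "-": Counter()}
    for strand, start, end in reads:
        five_prime[strand][start if strand == "+" else end] += 1

    histogram = Counter()
    for strand, start, end in reads:
        for d in range(1, max_distance + 1):
            # Downstream in transcript direction: rightwards on plus, leftwards on minus
            if strand == "+":
                partners = five_prime["+"][end + d]
            else:
                partners = five_prime["-"][start - d]
            if partners:
                histogram[d] += partners
    return histogram
```

A pronounced peak at distance 1 relative to the other distances would point to phased piRNA production.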

PPmeter is a tool to quantify and compare the extent of ongoing ping-pong amplification. Since the number of ping-pong pairs in a given dataset depends on dataset size and grows non-linearly with it, raw pair counts cannot be compared directly between datasets of different sizes. PPmeter therefore generates pseudo-replicates by repeatedly bootstrapping (default: 100 times) a fixed number of sequence reads (default: 1,000,000) from each of the original sRNA sequence datasets. PPmeter then calculates the ping-pong signature of each pseudo-replicate and counts the sequence reads that participate in the ping-pong amplification loop. The resulting parameter, ping-pong reads per million bootstrapped reads, is comparable across datasets.
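The bootstrapping scheme can be sketched in Python. The defaults of 100 replicates and 1,000,000 reads are taken from the description above; everything else is an assumption rather than PPmeter's code: each read is a hypothetical (strand, start, end) tuple with inclusive coordinates, reads are drawn with replacement, and a read is counted as a ping-pong read if an opposite-strand read in the same pseudo-replicate overlaps its 5' end by exactly 10 nt (the canonical ping-pong signature). Multi-mapping reads are not treated specially.

```python
import random
from collections import Counter


def pingpong_reads(sample):
    """Count reads that have a ping-pong partner in `sample`.

    A partner is an opposite-strand read whose 5' end overlaps by exactly
    10 nt: for a plus read starting at s, a minus read ending at s + 9.
    """
    plus_5 = Counter(start for strand, start, _ in sample if strand == "+")
    minus_5 = Counter(end for strand, _, end in sample if strand == "-")
    participating = 0
    for strand, start, end in sample:
        if strand == "+" and minus_5[start + 9]:
            participating += 1
        elif strand == "-" and plus_5[end - 9]:
            participating += 1
    return participating


def ppm_bootstrap(reads, sample_size=1_000_000, replicates=100, seed=0):
    """Ping-pong reads per million reads for each bootstrapped pseudo-replicate."""
    rng = random.Random(seed)
    values = []
    for _ in range(replicates):
        sample = rng.choices(reads, k=sample_size)  # draw with replacement
        values.append(pingpong_reads(sample) * 1_000_000 / sample_size)
    return values
```

Because every pseudo-replicate has the same size, the per-replicate values (for example their mean) can be compared between datasets without the size effect on raw pair counts.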