Here you will find some useful software for analysis and processing of NGS
datasets. The software is provided under the terms of the Creative Commons Attribution Non-Commercial License V2.0.
Detailed information about usage of the software is provided in
separate documentation. To download the software, right-click on the
logo and choose "save link/target as". Execution of Perl scripts
requires installation of a Perl interpreter. Free Perl distributions can
be found at http://strawberryperl.com/
unitas is a convenient tool for efficient annotation of small non-coding RNA sequence datasets produced by next-generation sequencing.
All you need is a computer and a connection to the internet. unitas
uses the latest reference sequences
from publicly available online databases to annotate user input sequences. It needs no installation and no
further prerequisites; it runs out-of-the-box on any popular platform (Linux, macOS, Windows) and can be
started with one simple command from the command line (terminal). unitas
accepts sequence files
in FASTA or FASTQ format, or alternatively map files in SAM or ELAND3 format (standard output of sRNAmapper).
We will regularly provide versions with updated internal URL link lists. Since unitas
always first tries to use the URL link lists stored on our server
(instead of its internal list), older executables of unitas
will still work properly. New executables (including updated documentation) will be provided in case
of new functionality or bug fixes.
Information for Mac and Linux users:
Commonly, executable files downloaded from the web cannot be executed without changing file permissions. Therefore, try one of the following terminal commands:
chmod 755 unitas
chmod a+rwx unitas
unitas will download the SeqMap source code and compile it on your local machine with g++. If g++ is missing on your computer, unitas will
download a precompiled SeqMap executable. In this case, you will most likely have to change file permissions manually to make this file executable.
proTRAC predicts and analyzes genomic piRNA clusters based on mapped piRNA sequence
reads. proTRAC 2.0 and later versions apply a sliding window approach
to detect loci that exhibit high sequence read coverage. Subsequently,
sequences mapped to these loci are analyzed with respect to typical
piRNA and piRNA cluster characteristics to ensure high specificity.
proTRAC runs with basic core Perl, which is commonly pre-installed on
Unix and Mac computers. Windows users can install a free Perl distribution
such as Strawberry Perl. Alternatively, we provide a precompiled proTRAC executable
that runs on 64-bit Windows computers.
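The sliding window idea used for cluster detection can be sketched in a few lines of Python. This is a minimal illustration only, not proTRAC's actual implementation: the function name candidate_loci, the window size, the step size and the read threshold are assumptions chosen for the example, and the subsequent checks for typical piRNA and piRNA cluster characteristics are left out.

```python
def candidate_loci(positions, window=5000, step=1000, min_reads=10):
    """Return merged (start, end) windows that hold at least min_reads reads.

    positions: mapped read start coordinates on a single chromosome.
    window, step and min_reads are illustrative values, not proTRAC defaults.
    """
    if not positions:
        return []
    positions = sorted(positions)
    last = positions[-1]
    loci = []
    lo = hi = 0          # indices delimiting the reads inside the current window
    start = 0
    while start <= last:
        end = start + window                      # half-open window [start, end)
        while lo < len(positions) and positions[lo] < start:
            lo += 1
        while hi < len(positions) and positions[hi] < end:
            hi += 1
        if hi - lo >= min_reads:
            if loci and start <= loci[-1][1]:
                # Overlaps or touches the previous hit: extend that locus.
                loci[-1] = (loci[-1][0], end)
            else:
                loci.append((start, end))
        start += step
    return loci


# Twelve reads packed into about 1 kb, plus two isolated reads elsewhere.
example = [10000 + 90 * i for i in range(12)] + [50000, 80000]
print(candidate_loci(example))   # -> [(6000, 15000)]
```

Merging overlapping windows yields a locus that is wider than the reads themselves; a real tool would trim the borders afterwards.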
is a collection of simple open source Perl scripts that perform basic
analyses and processing steps using next generation sequencing (NGS)
datasets. Each tool is designed to ensure convenient and intuitive
usage. Installation and usage do not require any bioinformatics
skills. All scripts work out-of-the-box. Advanced users may use the
command line based Perl scripts to build their own automated sequence
sRNAmapper is specifically designed to map small RNA sequences to genomes. To this
end it uses a specialized mapping algorithm that requires a perfect 5'
seed match (default: 18 nt) and optionally allows non-template 3'
nucleotides as well as internal mismatches in the part of the sequence
that follows the seed match. Allowing non-template 3' ends will
ensure the mapping of 3' modified (adenylated/uridylated) small RNAs
while allowing internal mismatches can enhance sensitivity given that
read quality typically declines towards 3' ends.
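The mapping strategy described above (perfect 5' seed, optional non-template 3' nucleotides, optional internal mismatches) can be illustrated with a short Python sketch. It is not the sRNAmapper source code: the function name map_read, the parameter names and the decision to prefer the shortest possible 3' tail are assumptions made for this example, and only the forward strand of a genome held in memory as a plain string is searched.

```python
def map_read(read, genome, seed_len=18, max_mismatches=1, max_tail=2):
    """Return a list of (position, mismatches, tail_length) hits.

    A hit needs a perfect match of the first seed_len nucleotides. After the
    seed, up to max_mismatches internal mismatches are tolerated, and up to
    max_tail trailing nucleotides may be treated as non-template 3' additions.
    """
    hits = []
    if len(read) < seed_len:
        return hits
    seed = read[:seed_len]
    pos = genome.find(seed)
    while pos != -1:
        # Try the shortest non-template tail first.
        for tail in range(max_tail + 1):
            body_len = len(read) - tail
            if body_len < seed_len:
                break
            ref = genome[pos:pos + body_len]
            if len(ref) < body_len:
                continue          # read body would run past the genome end
            mismatches = sum(
                1 for a, b in zip(read[seed_len:body_len], ref[seed_len:])
                if a != b
            )
            if mismatches <= max_mismatches:
                hits.append((pos, mismatches, tail))
                break
        pos = genome.find(seed, pos + 1)
    return hits


locus = "ACGTTGCAAGCTTGACCATGGATCCGA"
genome = "TTTTT" + locus + "CCCCC"
tailed_read = locus[:24] + "AA"      # two non-template 3' adenosines
print(map_read(tailed_read, genome, max_mismatches=0))   # -> [(5, 0, 2)]
```

Note that with mismatches allowed, a read ending in a single non-template nucleotide can also be reported as a full-length hit with one mismatch; which interpretation to prefer is a design choice.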
reallocate post-processes map files in order to reallocate read counts of multiple
mapping sequences according to the transcription rate of genomic loci
based on uniquely mapping reads. Map files must be in ELAND format and
can be created using sRNAmapper which is provided along with the
proTRAC software. reallocate will output a modified map file that
contains two additional columns that refer to i) total number of
genomic hits of a sequence and ii) read counts that are assigned to
this locus. proTRAC 2.0.5 and later versions accept this format and
utilize this information for cluster prediction. Generally, using
reallocate will result in a larger number of sequence reads that can be
assigned to predicted piRNA clusters and may also alter the number of
predicted piRNA clusters (more true positives, fewer false positives).
Using reallocate is specifically recommended for datasets with large
proportions (>= 50%) of transposon-related small RNAs such as
pre-pachytene mammalian piRNA transcriptomes or Drosophila piRNA
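The principle of sharing out the reads of multiple mapping sequences according to nearby uniquely mapping reads can be sketched as follows. This is a simplified illustration under stated assumptions, not the reallocate script itself: the input tuple layout, the function name reallocate_counts, the flank size and the even split used when no unique reads are nearby are all hypothetical.

```python
from collections import defaultdict


def reallocate_counts(hits, flank=5000):
    """Distribute read counts of multi-mapping sequences across their loci.

    hits: list of (sequence, reads, chromosome, position), one entry per
    genomic hit. Returns (sequence, chromosome, position, n_hits, allocated)
    tuples, i.e. each hit plus two additional columns.
    """
    by_seq = defaultdict(list)
    for seq, reads, chrom, pos in hits:
        by_seq[seq].append((chrom, pos, reads))

    # Uniquely mapping sequences serve as a proxy for local transcription.
    unique = defaultdict(list)
    for locs in by_seq.values():
        if len(locs) == 1:
            chrom, pos, reads = locs[0]
            unique[chrom].append((pos, reads))

    def local_unique_reads(chrom, pos):
        return sum(r for p, r in unique.get(chrom, []) if abs(p - pos) <= flank)

    out = []
    for seq, locs in by_seq.items():
        n_hits = len(locs)
        reads = locs[0][2]
        weights = [local_unique_reads(c, p) for c, p, _ in locs]
        total = sum(weights)
        for (chrom, pos, _), weight in zip(locs, weights):
            if n_hits == 1:
                share = float(reads)
            elif total > 0:
                share = reads * weight / total    # proportional to unique reads
            else:
                share = reads / n_hits            # no evidence: split evenly
            out.append((seq, chrom, pos, n_hits, share))
    return out


hits = [
    ("AAA", 30, "chr1", 1000),                              # unique mapper
    ("CCC", 10, "chr2", 500),                               # unique mapper
    ("GGG", 40, "chr1", 1200), ("GGG", 40, "chr2", 700),    # two genomic hits
]
for row in reallocate_counts(hits):
    print(row)
```

In this example the 40 reads of the sequence with two hits are split 30 to 10, mirroring the unique read counts near each locus.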
piFETCH is a tiny fetching tool to download data from the piRNA cluster database without using the web
interface. piFETCH lets you download complete proTRAC results for available NCBI SRA
datasets or specified information (piRNA cluster sequence, reads mapped to a cluster, proTRAC
image file) from selected piRNA clusters for a desired SRA dataset. You can also download
clipped and filtered reads from any available SRA sequence set as well as sequence reads from
the specified SRA dataset(s) that matched miRNA or miRNA precursor sequences.
a tool to analyze the 3’ - 5’ distances of mapped sequence reads. It has recently been described that
secondary piRNA biogenesis (piRNA ping-pong) can induce Zucchini-dependent primary processing
of targeted transcripts resulting in the production of so-called phased piRNAs (Han et al. 2015, Mohn et al. 2015).
In this process, the target molecule is sliced consecutively starting from a ping-pong target site, and
each downstream cleavage position determines the 3’ and 5’ end of adjacent (trail-) piRNAs, respectively.
The amount of phased piRNAs can be estimated by analyzing 3’ - 5’ distances of mapped sequence
reads, where a distance of 1 indicates a pair of phased piRNAs.
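The 3’ - 5’ distance analysis can be illustrated with a small Python sketch. It is not the code of the tool itself: the function name three_to_five_distances, the input layout (chromosome, strand, start, end with 1-based inclusive coordinates) and the maximum distance are assumptions for this example. For every read, the sketch counts how many reads on the same strand start a given number of nucleotides downstream of its 3’ end.

```python
from collections import Counter, defaultdict


def three_to_five_distances(reads, max_distance=50):
    """Count distances from each read's 3' end to downstream 5' ends.

    reads: iterable of (chromosome, strand, start, end), start <= end.
    A distance of 1 means the next read starts directly after the 3' end.
    """
    five_ends = defaultdict(Counter)    # (chrom, strand) -> {5' position: n}
    three_ends = defaultdict(Counter)   # (chrom, strand) -> {3' position: n}
    for chrom, strand, start, end in reads:
        key = (chrom, strand)
        if strand == "+":
            five, three = start, end
        else:                            # minus strand reads run right to left
            five, three = end, start
        five_ends[key][five] += 1
        three_ends[key][three] += 1

    distances = Counter()
    for key, threes in three_ends.items():
        step = 1 if key[1] == "+" else -1    # downstream direction
        fives = five_ends[key]
        for three_pos, n_three in threes.items():
            for d in range(1, max_distance + 1):
                n_five = fives.get(three_pos + step * d, 0)
                if n_five:
                    distances[d] += n_three * n_five
    return distances


reads = [
    ("chr1", "+", 100, 125),
    ("chr1", "+", 126, 152),    # starts right after the first read: phased
    ("chr1", "+", 160, 185),
    ("chr1", "-", 200, 226),
    ("chr1", "-", 174, 199),    # phased pair on the minus strand
]
print(three_to_five_distances(reads)[1])   # -> 2
```

A pronounced peak at distance 1 relative to the other distances would then point to phased piRNAs in the dataset.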
Quasar is an annotation tool designed for repeat annotation of whole genome
annotation of repeats). It is similar to the RepeatMasker software developed
by A.F.A. Smit, R. Hubley & P. Green (unpublished) but is
optimized for subsequent repeat annotation of genomically mapped small
RNA sequences. Quasar annotation differs from RepeatMasker
annotation in that Quasar will annotate according to the highest
sequence similarity whereas RepeatMasker rather mirrors the biological
transposon insertion history. Therefore, using Quasar will give a
more accurate target prediction of e.g. piRNAs as compared to using