Glossary
- BAM
- Binary SAM format. BAM files are binary formatted, indexed and
allow random access.
- BCF
- Binary VCF
- bgzip
- Utility in the htslib package to block compress genomic data
files.
- cigar
- Stands for Compact Idiosyncratic Gapped Alignment Report and
represents a compressed (run-length encoded) pairwise alignment
format. It was first defined by the Exonerate aligner, but was later
adapted and adopted as part of the SAM standard and by many other
aligners. In the Python API, the cigar alignment is presented as a
list of tuples (operation, length). For example, the list
[(0, 3), (1, 5), (0, 2)] refers to an alignment with 3 matches, 5
insertions and another 2 matches.
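A minimal sketch of how such a list of tuples can be read, assuming the numeric operation codes of the SAM specification (0 = M, 1 = I, 2 = D, 3 = N, 4 = S, 5 = H, 6 = P, 7 = "=", 8 = X). The helper names cigar_to_string and reference_span are hypothetical and are not part of the Python API.

```python
# Operation letters indexed by their numeric code (SAM specification order).
CIGAR_OPS = "MIDNSHP=X"

# Codes that advance the position on the reference: M, D, N, = and X.
CONSUMES_REFERENCE = {0, 2, 3, 7, 8}


def cigar_to_string(cigar):
    """Render a list of (operation, length) tuples as a CIGAR string."""
    return "".join(f"{length}{CIGAR_OPS[op]}" for op, length in cigar)


def reference_span(cigar):
    """Return the number of reference bases covered by the alignment."""
    return sum(length for op, length in cigar if op in CONSUMES_REFERENCE)


cigar = [(0, 3), (1, 5), (0, 2)]
print(cigar_to_string(cigar))  # 3M5I2M
print(reference_span(cigar))   # 5
```

The 5 inserted bases are present in the read but not in the reference, so the alignment covers only 5 reference bases.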
- column
- Reads that are aligned to a base in the reference sequence.
- contig
- The sequence that a tid refers to. For example, chr1 or contig123.
- csamtools
- The samtools C-API.
- faidx
- Utility in the samtools package to index fasta formatted
files.
- fetching
- Retrieving all reads mapped to a region.
- hard clipping
- hard clipped
- In hard clipped reads, part of the sequence has been removed
prior to alignment. The fact that only a subsequence is aligned might be
recorded in the cigar alignment, but the removed
sequence will not be part of the alignment record, in contrast
to soft clipped reads.
- pileup
- Pileup
- Reference
- Synonym for contig
- region
- A genomic region, stated relative to a reference sequence. A
region consists of reference name (‘chr1’), start (10000), and
end (20000). Start and end can be omitted for regions spanning
a whole chromosome. If end is missing, the region will span from
start to the end of the chromosome. Within pysam, coordinates
are 0-based, half-open intervals, i.e., the position 10,000 is
part of the interval, but 20,000 is not. The exception is
samtools compatible region strings such as
‘chr1:10000-20000’, which are 1-based and closed, i.e., both positions 10,000
and 20,000 are part of the interval.
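The two coordinate conventions can be converted into each other with simple arithmetic. The following is a sketch only, assuming the samtools convention that region strings are 1-based and closed and that both start and end are given. The function names to_region_string and parse_region_string are hypothetical.

```python
def to_region_string(contig, start, end):
    """Convert a 0-based, half-open interval to a samtools-style region string."""
    # Only the start shifts by one; the half-open end equals the closed 1-based end.
    return f"{contig}:{start + 1}-{end}"


def parse_region_string(region):
    """Convert a 'contig:start-end' string to (contig, start, end), 0-based half-open."""
    contig, _, span = region.rpartition(":")
    start, end = span.split("-")
    return contig, int(start) - 1, int(end)


print(to_region_string("chr1", 10000, 20000))   # chr1:10001-20000
print(parse_region_string("chr1:10001-20000"))  # ('chr1', 10000, 20000)
```

In the 0-based, half-open form the interval length is simply end minus start.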
- SAM
- A textual format for storing genomic alignment information.
- sam file
- A file containing aligned reads. The sam file can either
be a BAM file or a TAM file.
- samtools
- The samtools package.
- soft clipping
- soft clipped
- In alignments with soft clipping, part of the query sequence
is not aligned. The unaligned query sequence is still part
of the alignment record. This is in contrast to
hard clipped reads.
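The difference between soft and hard clipping shows up when lengths are derived from a cigar alignment. The sketch below assumes the SAM specification operation codes (4 = soft clip, 5 = hard clip; M, I, S, = and X consume the query). The function names are hypothetical.

```python
# Codes whose bases are stored in the alignment record: M, I, S, = and X.
STORED_IN_RECORD = {0, 1, 4, 7, 8}
HARD_CLIP = 5


def stored_query_length(cigar):
    """Length of the query sequence kept in the alignment record."""
    return sum(length for op, length in cigar if op in STORED_IN_RECORD)


def original_read_length(cigar):
    """Length of the read before clipping, counting hard clipped bases too."""
    return sum(
        length for op, length in cigar
        if op in STORED_IN_RECORD or op == HARD_CLIP
    )


soft = [(4, 5), (0, 20)]  # 5S20M: clipped bases remain in the record
hard = [(5, 5), (0, 20)]  # 5H20M: clipped bases were removed

print(stored_query_length(soft), original_read_length(soft))  # 25 25
print(stored_query_length(hard), original_read_length(hard))  # 20 25
```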
- tabix
- Utility in the htslib package to index bgzip compressed
files.
- tabix file
- A sorted, compressed and indexed tab-separated file created
by the command line tool tabix or the commands tabix_compress()
and tabix_index(). The file is indexed by chromosomal coordinates.
- tabix row
- A row in a tabix file. Fields within a row are
tab-separated.
- TAM
- Text SAM file. TAM files are human readable files of
tab-separated fields. TAM files do not allow random access.
- target
- The sequence that a read has been aligned to. Target
sequences have both a numerical identifier (tid)
and an alphanumeric name (Reference).
- tid
- The target id. The target id is 0 or a positive integer mapping to
entries within the sequence dictionary in the header section of
a TAM file or BAM file.
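As an illustration of how a tid maps to the sequence dictionary, the sketch below reads the @SQ lines of a textual header and numbers them in order of appearance. It assumes a tab-separated header in which each @SQ line carries an SN: field. The header text is made up for the example and reference_names is a hypothetical helper.

```python
def reference_names(header_text):
    """Return reference names in the order of the @SQ header lines."""
    names = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


# Illustrative header only; the sequence lengths are placeholders.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:contig123\tLN:500\n"
)

names = reference_names(header)
tid_of = {name: tid for tid, name in enumerate(names)}

print(names[0])             # chr1
print(tid_of["contig123"])  # 1
```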
- VCF
- Variant Call Format