FASTA - Maple Help

FASTA (.fasta) File Format

Description

•	FASTA is a plaintext format for storing protein or nucleic acid (DNA or RNA) data as character sequences. It is a popular interchange format for molecular biology software.

•	The commands Import and Export support this format.
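To make the idea of "character sequences in a plaintext file" concrete, the following Python sketch reads records from FASTA text. It is an illustration only and is not how Maple's Import command is implemented. It relies on the widely used FASTA convention that each record starts with a descriptor line beginning with ">" and that the sequence may be wrapped over the lines that follow; the function name read_fasta and the sample data are hypothetical.

```python
import io

def read_fasta(handle):
    """Yield (descriptor, sequence) pairs from an open FASTA text stream."""
    descriptor = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new descriptor line closes the previous record, if any.
            if descriptor is not None:
                yield descriptor, "".join(chunks)
            descriptor = line[1:].strip()
            chunks = []
        elif descriptor is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if descriptor is not None:
        yield descriptor, "".join(chunks)

# Hypothetical two-record sample, not taken from any real data set.
sample = io.StringIO(">seq1 demo record\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(sample))
```

Each record comes back as a descriptor paired with its sequence, which mirrors the descriptor and sequence entries read in the Examples section.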

Details on the FASTA format

•	The FASTA format employs the following standard IUB/IUPAC conventions for encoding protein or nucleic acid sequences as alphabetic characters.

•	In addition to codes for individual nucleotides or amino acids, the convention supports ambiguity codes for positions that may be occupied by more than one possible nucleotide or amino acid. For example, the code R matches either adenine (A) or guanine (G).

Table 1: Nucleic Acid Codes

Code	Meaning	Description	Code	Meaning	Description
A	A	Adenine	B	{C,G,T,U}	Not A
C	C	Cytosine	D	{A,G,T,U}	Not C
G	G	Guanine	H	{A,C,T,U}	Not G
T	T	Thymine	V	{A,C,G}	Not T or U
U	U	Uracil	N	{A,C,G,T,U}	Any nucleotide
R	{A,G}	Purine	Y	{C,T,U}	Pyrimidine
K	{G,T,U}	Ketone	M	{A,C}	Amino
S	{C,G}	Strong interaction	W	{A,T,U}	Weak interaction
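The ambiguity codes in Table 1 can be treated as sets of permitted bases. The Python sketch below transcribes the table into a dictionary and checks whether a concrete base, or a whole sequence, is compatible with a pattern written in these codes. It is an illustration outside Maple; the names IUPAC_NUCLEOTIDES, matches, and pattern_matches are hypothetical, and accepting lowercase input is an assumption.

```python
# Transcription of Table 1: each code maps to the bases it may stand for.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CTU", "K": "GTU", "M": "AC",
    "S": "CG", "W": "ATU",
    "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG",
    "N": "ACGTU",
}

def matches(code, base):
    """Return True if the concrete base is allowed by the given code."""
    return base.upper() in IUPAC_NUCLEOTIDES[code.upper()]

def pattern_matches(pattern, sequence):
    """Return True if every position of sequence is allowed by pattern."""
    return len(pattern) == len(sequence) and all(
        matches(code, base) for code, base in zip(pattern, sequence)
    )
```

For instance, matches("R", "A") holds because R covers A and G, while matches("R", "C") does not.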

Table 2: Amino Acid Codes

Code	Description	Code	Description	Code	Description
A	Alanine	J	I or L	S	Serine
B	D or N	K	Lysine	T	Threonine
C	Cysteine	L	Leucine	U	Selenocysteine
D	Aspartic acid	M	Methionine	V	Valine
E	Glutamic acid	N	Asparagine	W	Tryptophan
F	Phenylalanine	O	Pyrrolysine
G	Glycine	P	Proline	Y	Tyrosine
H	Histidine	Q	Glutamine	Z	E or Q
I	Isoleucine	R	Arginine
X	Any amino acid	*	Translation stop	-	Gap of indeterminate length
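A common use of Table 2 is checking that a protein sequence contains only recognized symbols before further processing. The Python sketch below does this and also expands the three two-way ambiguity codes (B, J, Z). It is an illustration outside Maple; the names AMINO_ACID_CODES, AMBIGUOUS_AMINO_ACIDS, invalid_positions, and possible_residues are hypothetical, and treating lowercase letters as valid is an assumption.

```python
# Every symbol listed in Table 2: the letters A-Z plus "*" and "-".
AMINO_ACID_CODES = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ*-")

# Two-way ambiguity codes from Table 2 and the residues they may stand for.
AMBIGUOUS_AMINO_ACIDS = {"B": "DN", "J": "IL", "Z": "EQ"}

def invalid_positions(sequence):
    """Return (position, character) pairs not listed in Table 2 (1-based)."""
    return [
        (i, ch)
        for i, ch in enumerate(sequence, start=1)
        if ch.upper() not in AMINO_ACID_CODES
    ]

def possible_residues(code):
    """Expand B, J, or Z; any other code is returned unchanged as a set."""
    code = code.upper()
    return set(AMBIGUOUS_AMINO_ACIDS.get(code, code))
```

Positions are reported 1-based to match the indexing used in the Maple examples.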

Notes

•	Content-Type: chemical/seq-aa-fasta, chemical/seq-na-fasta

Examples

Import a DNA sequence from a FASTA file.

>	DNASequence := Import("example/humanmtDNA.fasta", base = datadir):

Read the descriptor for the first sequence in the file.

>	DNASequence[1, 1]

Human mitochondrial genome,HVR2,CR,HVR1

(1)

Examine positions 100 through 150 in this sequence.

>	DNASequence[1, 2][100 .. 150]

GGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATC

(2)

Count the occurrences of each nucleotide base in the sequence.

>	frequencies := StringTools:-CharacterFrequencies(DNASequence[1, 2], dna)

frequencies := A = 5118, C = 5185, G = 2175, T = 4092

(3)

>	Statistics:-ColumnGraph([frequencies])
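For readers who want to cross-check the slicing and counting steps outside Maple, here is a minimal Python sketch. It is not the StringTools implementation. The fragment is the 51-character output shown for positions 100 through 150; the helper names maple_range and base_counts are hypothetical. Maple ranges are 1-based and inclusive, so the range 100 .. 150 corresponds to the Python slice [99:150].

```python
from collections import Counter

def maple_range(sequence, first, last):
    """Emulate a 1-based, inclusive Maple range with a Python slice."""
    return sequence[first - 1:last]

def base_counts(sequence):
    """Count each character in the sequence, with keys in sorted order."""
    counts = Counter(sequence.upper())
    return {base: counts[base] for base in sorted(counts)}

# Fragment copied from the output for positions 100 through 150.
fragment = "GGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATC"
counts = base_counts(fragment)
```

The counts always sum to the length of the input, which is a quick sanity check on any frequency table.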

References

IUPAC code for incomplete nucleic acid specification, National Center for Biotechnology Information.

A One-Letter Notation for Amino Acid Sequences, International Union of Pure and Applied Chemistry.
