For seamless reading of different file formats, gggenomes uses a mapping of
known formats to their associated file extensions and to the contexts in which
each format can be read. The notion of context allows different information to
be read from the same format/extension. For example, a gbk file
holds both feature and sequence information. Read in "feats" context with
read_feats("*.gbk"), it returns a feature table; read in "seqs" context with
read_seqs("*.gbk"), it returns a sequence index.
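The idea of a context-aware extension lookup can be illustrated with a minimal base R sketch. This is not the gggenomes implementation: the `formats` table is a hand-written subset and `lookup_format()` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical, simplified format table (an assumption for illustration,
# not the internal gggenomes data structure).
formats <- data.frame(
  format  = c("fasta", "gbk", "gbk", "bed"),
  ext     = c("fa", "gbk", "gbk", "bed"),
  context = c("seqs", "feats", "seqs", "feats"),
  stringsAsFactors = FALSE
)

# Return the format of `file` if its extension is readable in `context`,
# otherwise NA. `lookup_format` is a hypothetical name.
lookup_format <- function(file, context) {
  ext <- tolower(tools::file_ext(file))
  hit <- formats$format[formats$ext == ext & formats$context == context]
  if (length(hit) == 0) NA_character_ else hit[1]
}

lookup_format("genome.gbk", "feats")  # "gbk": readable as features
lookup_format("genome.gbk", "seqs")   # "gbk": same file, readable as sequences
lookup_format("genes.bed", "seqs")    # NA: bed is not defined for "seqs" here
```

The same extension resolves in both contexts for gbk, while bed resolves only in "feats", which mirrors the behaviour described above.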
def_formats(file = NULL, ext = NULL, context = NULL, parser = NULL, allow_na = FALSE)
file: a vector of file names
ext: a vector of file extensions
context: a vector of file contexts defined in
parser: a vector of file parsers defined in
a dictionary-like named vector of file formats, with the recognized extensions as names
  format    ext                            context             parser
1 ambigious txt, tsv, csv                  NA                  read_ambigious
2 fasta     fa, fas, fasta, ffn, fna, faa  seqs                read_seq_len
3 fai       fai                            seqs                read_fai
4 gff3      gff, gff3, gff2, gtf           feats, seqs         read_gff3, read_seq_len
5 gbk       gbk, gb, gbff, gpff            feats, seqs         read_gbk, read_seq_len
6 bed       bed                            feats               read_bed
7 blast     m8, o6, o7                     feats, links        read_blast, read_blast
8 paf       paf                            feats, links        read_paf, read_paf
9 alitv     json                           feats, seqs, links  read_alitv_genes, read_alitv_seqs, read_alitv_links
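In the table above, the context and parser columns appear to be paired by position (for gff3, "feats" goes with read_gff3 and "seqs" with read_seq_len). The following base R sketch shows how such a pairing could be resolved. The `defs` data frame copies two rows from the table, and `parser_for()` is a hypothetical helper, not part of gggenomes.

```r
# Two rows copied from the table above; storing contexts and parsers as
# comma-separated strings is an assumption made for this sketch.
defs <- data.frame(
  format  = c("gff3", "blast"),
  context = c("feats, seqs", "feats, links"),
  parser  = c("read_gff3, read_seq_len", "read_blast, read_blast"),
  stringsAsFactors = FALSE
)

# Return the name of the parser paired with `context` for `format`,
# or NA if the format or the context is not defined.
parser_for <- function(format, context) {
  row <- defs[defs$format == format, ]
  if (nrow(row) == 0) return(NA_character_)
  contexts <- strsplit(row$context, ",\\s*")[[1]]
  parsers  <- strsplit(row$parser, ",\\s*")[[1]]
  parsers[match(context, contexts)]
}

parser_for("gff3", "feats")   # "read_gff3"
parser_for("gff3", "seqs")    # "read_seq_len"
parser_for("blast", "links")  # "read_blast"
parser_for("gff3", "links")   # NA: gff3 has no "links" context
```

Pairing by position keeps one row per format while still allowing a different parser for each context.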
# vector of defined zip formats and recognized extensions as names
# format of file
def_formats("foo.fa")
#>      fa
#> "fasta"

# formats associated with each extension
def_formats(ext=qc(fa, gff))
#>      fa     gff
#> "fasta"  "gff3"

# all formats/extensions that can be read in seqs context; includes formats
# that are defined for context=NA, i.e. that can be read in any context.
def_formats(context="seqs")
#>        json         txt         tsv         csv         fai          fa
#>     "alitv" "ambigious" "ambigious" "ambigious"       "fai"     "fasta"
#>         fas       fasta         ffn         fna         faa         gbk
#>     "fasta"     "fasta"     "fasta"     "fasta"     "fasta"       "gbk"
#>          gb        gbff        gpff         gff        gff3        gff2
#>       "gbk"       "gbk"       "gbk"      "gff3"      "gff3"      "gff3"
#>         gtf
#>      "gff3"