S1: Genomes

Strain MAP Type Status Genome External link NCBI annotation Download date
Mycobacterium avium subsp. paratuberculosis K-10 MAP-C, Type II chromosome download download link download 29.10.2013
Mycobacterium avium subsp. paratuberculosis K-10' MAP-C, Type II chromosome download download link download 29.10.2013
Mycobacterium avium subsp. paratuberculosis MAP4 MAP-C, Type II chromosome download download link download 29.10.2013
Mycobacterium avium subsp. paratuberculosis JII-1961 MAP-C, Type II chromosome download not published not available --
Mycobacterium avium subsp. paratuberculosis JIII-386 MAP-S, Type III scaffolds or contigs (6) download not published not available --
Mycobacterium avium subsp. paratuberculosis S397 MAP-S, Type III scaffolds or contigs (176) download download link download 29.10.2013
Mycobacterium avium subsp. paratuberculosis CLIJ361 MAP-S, Type I scaffolds or contigs (1147) download download link not available 29.10.2013
Mycobacterium avium subsp. hominissuis 104 -- chromosome download download link download 29.10.2013


S2: Genotypes of MAP strains used in this study

Download: ODS

IS900-RFLP [a] SSR at loci [c]
Strain Origin Host MAP-Type BstEII PstI MIRU-VNTR profile [b] 1 2 8 9
JIII-386 Germany ovine MAP-S, Type III I6 P13 421311*18 [h] 7 ≥11 3 4
JII-1961 Germany bovine MAP-C, Type II C17 P9 42332128 7 9 4 4
K-10 [d] U.S. bovine MAP-C, Type II C1 - 32332228 >11 10 5 5
MAP4 [e] U.S. human MAP-C, Type II - - - - - - -
CLIJ361 [f] Australia ovine MAP-S, Type I - - - - - - -
S397 [g] U.S. ovine MAP-S, Type III - - - - - - -

[a] After digestion with BstEII, IS900-RFLP types C1 and C17 were designated according to the nomenclature of Pavlik et al. (1999); I6 was designated according to Möbius et al. (2009). After digestion with PstI, band patterns were designated according to Möbius et al. (2008, 2009).
[b] Number of tandem repeats (TR) at MIRU-VNTR loci 292, X3, 25, 3, 47, 7, 10, 32 according to Thibault et al. (2007).
[c] Number of the short sequence repeats G1 and G2 (loci 1 and 2), GGT (locus 8), and TGC (locus 9) according to Amonsin (2004).
[d] Results published by Thibault et al. (2008).
[e] According to Bannantine et al. (2014).
[f] According to Wynne et al. (2011).
[g] According to Bannantine et al. (2012).
[h] At MIRU 1 (Bull, 2003), JIII-386 shows only 1 repeat; 3 repeats are common in MAP genomes.



S3: Raw reads

For whole-genome shotgun sequencing, Illumina paired-end (fragment size ∼300 bp) and mate-pair (fragment size ∼2.2 kb) libraries were generated from fragmented genomic DNA of Mycobacterium avium subsp. paratuberculosis strain JIII-386. The libraries were sequenced on an Illumina GAIIx (paired-end library) and a HiSeq2000 (mate-pair library), yielding 28.6 million 101-bp read pairs (∼1,100-fold genome coverage) and 10.9 million 100-bp mate pairs (∼440-fold genome coverage).

Instrument Protocol # read pairs Read length Estimated genome coverage Fragment size
Illumina GAIIx paired-end 2 x 28.6 million 101 bp ∼1,100-fold 300 bp
Illumina HiSeq2000 mate-pair 2 x 10.9 million 100 bp ∼440-fold 2,200 bp
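
The fold-coverage estimates follow from simple arithmetic: total sequenced bases divided by genome size. The sketch below assumes a genome size equal to the 4,850,274 bp final assembly reported in S4. That value and the function name are illustrative choices, and the rounded figures on this page may rest on a slightly different genome-size estimate.

```python
def estimated_coverage(read_pairs: float, read_length: int, genome_size: int) -> float:
    """Fold coverage = (pairs * 2 reads per pair * read length) / genome size."""
    return read_pairs * 2 * read_length / genome_size


# Assumption: genome size taken from the final JIII-386 assembly length (S4).
GENOME_SIZE = 4_850_274

paired_end = estimated_coverage(28.6e6, 101, GENOME_SIZE)
mate_pair = estimated_coverage(10.9e6, 100, GENOME_SIZE)
print(f"paired-end: ~{paired_end:.0f}-fold, mate-pair: ~{mate_pair:.0f}-fold")
```

Under this assumption both results land in the same range as the ∼1,100-fold and ∼440-fold values given above.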


S4: Assembly statistics

Download: ODS

Assembler # contigs # contigs > 1000 bp # bp max contig N50 download
CLC Genomics Workbench 130 106 4792650 265629 80173 fasta
SSPACE Scaffolding 54 33 4825138 833887 301438 fasta
Gap closing (Sanger Sequencing) 14 6 4846897 1505746 1245182 fasta
Final Assembly 6 6 4850274 1505968 1245802 fasta
JR-Assembler 622 514 4752547 66282 14005 fasta
Velvet 1193 925 4645764 38963 6400 fasta
ABySS 4824 489 5048635 64951 14188 fasta
SPAdes 632 327 4847333 89924 23648 fasta
Cluster-Assembly (cd-hit-est) 532 531 7343160 89924 23046 fasta


S5: Final scaffolds of JIII-386 assembly

Scaffold Length (bp) Download
S01 1,505,968 fasta
S02 869,609 fasta
S03 1,245,802 fasta
S04 591,024 fasta
S05 545,230 fasta
S06 92,641 fasta
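
The "# bp" and "N50" columns in S4 can be recomputed from a list of sequence lengths. The sketch below uses the common definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly). The function name is illustrative; it is not necessarily how the assemblers listed in S4 report the value.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that pushes the running sum to >= 50 %.
        if 2 * running >= total:
            return length
    raise ValueError("empty length list")


# Lengths of the six final JIII-386 scaffolds listed in S5.
scaffolds = {
    "S01": 1_505_968,
    "S02": 869_609,
    "S03": 1_245_802,
    "S04": 591_024,
    "S05": 545_230,
    "S06": 92_641,
}

total_bp = sum(scaffolds.values())
print(total_bp, n50(scaffolds.values()))
```

For these six scaffolds the sketch gives 4850274 bp in total and an N50 of 1245802, matching the "Final Assembly" row in S4.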


S6: Replacements of low complexity regions (poly-N|A|T)

Download: ODS

target replacement
scaffold start pos end pos # nt contig assembly start pos end pos # nt orientation download
poly-N
S02 190565 191249 685 NODE_231_length_1452_cov_1509.103271 Cluster 59 1445 1387 + calc
S02 639470 640085 616 AFIF01000036.1 MAP S397 73 1269 1197 + calc
S02 771177 771965 788 NODE_93_length_1728_cov_3883.945068 Cluster 87 1710 1623 - calc
S03 833677 834228 552 NODE_66_length_1354_cov_6844.119629 Cluster 215 1322 1108 + calc
S04 261 419 159 NODE_160_length_2743_cov_329.516602 Cluster 116 199 84 + calc
S04 201770 202383 614 SPADES_N296 Cluster 881 2111 1231 - calc
S05 1432 2042 611 NODE_66_length_1354_cov_6844.119629 Cluster 100 1324 1225 + calc
S05 337153 337434 282 SPADES_N298 Cluster 143 603 461 + calc
S06 69552 70044 493 NODE_66_length_1354_cov_6844.119629 Cluster 67 1322 1256 + calc
poly-A
S02 32770 32803 34 SPADES_N49 Cluster 14 46 33 - calc
S04 241571 241803 233 contig462_start645_12448 Cluster 815 1193 379 + calc
S05 410104 410192 89 SPADES_N224 Cluster 53 148 99 - calc
poly-T
S03 657890 657996 107 SPADES_N320 Cluster 255 425 171 + calc
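
Each row above pairs a target region in a scaffold with a replacement segment from another contig. A minimal sketch of such a substitution follows. It assumes 1-based inclusive coordinates and assumes that a "-" orientation means the contig segment is reverse-complemented before insertion; the page states neither convention explicitly, and the function names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def replace_region(scaffold, t_start, t_end, contig, r_start, r_end, orientation):
    """Replace scaffold[t_start..t_end] with contig[r_start..r_end].

    Coordinates are assumed to be 1-based and inclusive.
    """
    segment = contig[r_start - 1:r_end]
    if orientation == "-":
        segment = reverse_complement(segment)
    return scaffold[:t_start - 1] + segment + scaffold[t_end:]


# Toy example: a 4-nt poly-N stretch replaced by 3 nt from another contig.
scaffold = "ACGTNNNNTTGA"
contig = "GGCATCC"
forward = replace_region(scaffold, 5, 8, contig, 3, 5, "+")
reverse = replace_region(scaffold, 5, 8, contig, 3, 5, "-")
print(forward, reverse)
```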


S7: Extensions

Download: ODS

Target Replacement
scaffold ending contig id start pos end pos # nt orientation download
S01 3' SPADES_N245 4179 4280 102 + blast
S01 5' contig170_start618_1178 1 120 120 + blast
S02 3' SPADES_N14* 49302 49345 44 + blast
S02 5' contig347_start806_5254** 1 96 96 + blast
S06 5' SPADES_N163 10192 10333 142 - blast

*A possible extension of contig AFNS01000496.1 in the MAP CLIJ361 assembly was also found (see blast result).

**A possible extension of contig AFIF01000144.1 in the MAP S397 assembly was also found (see blast result).



S8: Different queries hit on same scaffold in JIII-386 assembly

Based on our scaffolds, we could show possible connections between 14 contigs of the MAP S397 assembly and between 2 contigs of the MAP CLIJ361 assembly, and could thereby improve these assemblies.

Download: ODS

assembly qseqid sseqid pident length mismatch gapopen qstart qend qlen qori sstart send slen sori evalue bitscore
S397 gi|336459779|gb|AFIF01000073.1| S01 99.96 55645 19 1 1 55645 55645 + 31 55674 1505746 - 0.0 1.026e+05
S397 gi|336459823|gb|AFIF01000072.1| S01 99.93 36137 24 1 1 36137 36137 + 55675 91810 1505746 - 0.0 6.659e+04
S397 gi|336460103|gb|AFIF01000066.1| S01 99.98 5656 1 0 1 5656 5656 + 315471 321126 1505746 - 0.0 1.044e+04
S397 gi|336460108|gb|AFIF01000065.1| S01 99.94 27340 14 2 1 27339 27339 + 321127 348465 1505746 - 0.0 5.040e+04
S397 gi|336457707|gb|AFIF01000158.1| S02 99.94 5147 3 0 1 5147 5147 + 398415 403561 867352 - 0.0 9489
S397 gi|336457714|gb|AFIF01000157.1| S02 100.00 2408 0 0 1 2408 2408 + 403562 405969 867352 - 0.0 4447
S397 gi|336458511|gb|AFIF01000130.1| S03 100.00 1703 0 0 1 1703 15269 + 1138164 1139866 1245182 - 0.0 3145
S397 gi|336458530|gb|AFIF01000129.1| S03 99.99 7435 1 0 1 7435 7435 + 1139867 1147301 1245182 - 0.0 1.373e+04
S397 gi|336458539|gb|AFIF01000128.1| S03 99.99 6815 1 0 1 6815 6815 + 1147302 1154116 1245182 - 0.0 1.258e+04
S397 gi|336458781|gb|AFIF01000118.1| S03 100.00 3662 0 0 1 3662 3662 + 147658 151319 1245182 - 0.0 6763
S397 gi|336458787|gb|AFIF01000117.1| S03 99.94 3315 2 0 1 3315 3315 + 151320 154634 1245182 - 0.0 6111
S397 gi|336459460|gb|AFIF01000099.1| S05 99.98 16811 4 0 1 16811 16811 + 300817 317627 544426 - 0.0 3.102e+04
S397 gi|336459479|gb|AFIF01000098.1| S05 99.95 16519 8 0 1 16519 16519 + 317628 334146 544426 - 0.0 3.046e+04
CLIJ361 gi|334868172|gb|AFNS01000472.1| S05 99.92 2599 2 0 1 2599 2599 + 268367 270965 544426 + 0.0 4789
CLIJ361 gi|334868173|gb|AFNS01000471.1| S05 100.00 861 0 0 1 861 861 + 270988 271848 544426 - 0.0 1591
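
The connections between contigs can be read from the table as hits whose subject coordinates abut on the same scaffold. The sketch below shows one way to detect such neighbours. The function name, the `max_gap` parameter, and the use of bare accessions instead of the full `gi|...|gb|...|` identifiers are assumptions made for illustration, not the procedure used for this table.

```python
from collections import defaultdict


def adjacent_contigs(hits, max_gap=0):
    """Find pairs of query contigs placed next to each other on a scaffold.

    hits: iterable of (query_id, scaffold, sstart, send) tuples.
    Returns (scaffold, left_query, right_query, gap) for every pair of
    consecutive hits separated by 0..max_gap bases.
    """
    by_scaffold = defaultdict(list)
    for query_id, scaffold, sstart, send in hits:
        low, high = sorted((sstart, send))
        by_scaffold[scaffold].append((low, high, query_id))

    links = []
    for scaffold, placed in by_scaffold.items():
        placed.sort()
        for (_, end_a, query_a), (start_b, _, query_b) in zip(placed, placed[1:]):
            gap = start_b - end_a - 1
            if 0 <= gap <= max_gap:
                links.append((scaffold, query_a, query_b, gap))
    return links


# Subject coordinates copied from a few rows of the table above.
hits = [
    ("AFIF01000073.1", "S01", 31, 55674),
    ("AFIF01000072.1", "S01", 55675, 91810),
    ("AFIF01000066.1", "S01", 315471, 321126),
    ("AFIF01000065.1", "S01", 321127, 348465),
    ("AFNS01000472.1", "S05", 268367, 270965),
    ("AFNS01000471.1", "S05", 270988, 271848),
]

for link in adjacent_contigs(hits, max_gap=50):
    print(link)
```

With `max_gap=0` only directly abutting contigs are reported; a small tolerance also picks up the two CLIJ361 contigs, which are separated by a short gap on S05.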


S9: Results for short scaffolds in prior assembly

The prior assembly consisted of 14 scaffolds, including 8 very short scaffolds (< 500 bp), all of which could be either incorporated into or removed from the final assembly (based on read coverage and blast results).

scaffold raw reads mapped blast result
scaffold40.1_size358 7074 calc1 calc2 could be used to replace a poly-T and a poly-A (revcomp) region in S04, removed
scaffold41.1_size340 0 no significant blast result removed
scaffold42.1_size326 0 no significant blast result removed
scaffold43.1_size311 0 no significant blast result removed
scaffold44.1_size302 0 no significant blast result removed
scaffold49.1_size256 1865 calc mapping reverse complementary on S04 start, removed
scaffold51.1_size240 0 no significant blast result removed
scaffold52.1_size229 0 no significant blast result removed


S10: Genome comparison based on progressive Mauve alignments

Colored blocks connected by lines indicate homologous regions which are internally free from genomic rearrangements. Blocks below the center line are aligned reverse complementary. White gaps between colored blocks indicate non-homologous regions.

S10a: K-10' vs. JIII-386 vs. S397

Download: eps

S10b: JIII-386 vs. K-10'

Download: eps



S11: BacProt annotated protein-coding genes and NCBI merged annotations

The BacProt annotations provided include ORFs with an assigned function (CDS), based on homologous protein-coding genes, as well as predicted ORFs without an assigned function (hypothetical protein-coding genes). For the comparison against the NCBI annotations listed here, hypothetical protein-coding genes were removed from both annotations.

Merged annotation files (gff) are provided if both annotations (NCBI and BacProt) were available. For JIII-386 we used the lift-over annotation (S12) based on the annotation provided by NCBI for S397 for comparison and merging with the BacProt annotation. The merged annotations presented include ORFs with an assigned function (only CDS, no hypothetical) or all predicted ORFs (with hypothetical).

Species | BacProt GFF | Merged GFF (NCBI+BacProt) | BacProt in comparison to NCBI | Shine-Dalgarno Motif
MAP K-10 | GFF | GFF (no hypothetical), GFF (with hypothetical) | Equal: 998; Start shifted: 60; End shifted: 77; NCBI only: 11; BacProt only: 1961 | baae Shine-Dalgarno
MAP K-10' | GFF | GFF (no hypothetical), GFF (with hypothetical) | Equal: 2046; Start shifted: 297; End shifted: 319; NCBI only: 385; BacProt only: 480 | baae Shine-Dalgarno
MAP MAP4 | GFF | GFF (no hypothetical), GFF (with hypothetical) | Equal: 2179; Start shifted: 248; End shifted: 293; NCBI only: 309; BacProt only: 362 | baae Shine-Dalgarno
MAP JII-1961 | GFF | N.A. | No NCBI annotation available | baae Shine-Dalgarno
MAP JIII-386 | GFF | GFF (no hypothetical), GFF (with hypothetical) | Equal: 1953; Start shifted: 363; End shifted: 349; NCBI only: 501; BacProt only: 428 | baae Shine-Dalgarno
MAP S397 | GFF | GFF (no hypothetical), GFF (with hypothetical) | Equal: 2033; Start shifted: 262; End shifted: 326; NCBI only: 558; BacProt only: 387 | baae Shine-Dalgarno
MAP CLIJ361 | GFF | N.A. | No NCBI annotation available | baae Shine-Dalgarno
MAH 104 | GFF | GFF (no hypothetical), GFF (with hypothetical) | Equal: 2661; Start shifted: 248; End shifted: 373; NCBI only: 190; BacProt only: 271 | baae Shine-Dalgarno
MTBC H37Rv | GFF | N.A. | N.A. | baae Shine-Dalgarno
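
The comparison categories (Equal, Start shifted, End shifted, NCBI only, BacProt only) can be illustrated with a coordinate-based classification. The page does not define the categories, so the sketch assumes the following: genes are (start, end, strand) tuples with start <= end; "Start shifted" means the same stop site but a different translation start; "End shifted" means the same translation start but a different stop site; on the minus strand the translation start is the higher coordinate. Function names and the toy gene lists are hypothetical.

```python
def translation_sites(gene):
    """Return ((start_site, strand), (stop_site, strand)) for a gene tuple."""
    start, end, strand = gene
    if strand == "+":
        return (start, strand), (end, strand)
    return (end, strand), (start, strand)


def compare_annotations(ncbi, bacprot):
    """Classify NCBI genes against BacProt genes by shared start/stop sites."""
    counts = dict.fromkeys(
        ["Equal", "Start shifted", "End shifted", "NCBI only", "BacProt only"], 0
    )
    unmatched = set(bacprot)
    by_start = {translation_sites(g)[0]: g for g in bacprot}
    by_stop = {translation_sites(g)[1]: g for g in bacprot}

    for gene in ncbi:
        start_site, stop_site = translation_sites(gene)
        if gene in unmatched:
            category, partner = "Equal", gene
        elif by_stop.get(stop_site) in unmatched:
            category, partner = "Start shifted", by_stop[stop_site]
        elif by_start.get(start_site) in unmatched:
            category, partner = "End shifted", by_start[start_site]
        else:
            category, partner = "NCBI only", None
        counts[category] += 1
        unmatched.discard(partner)

    counts["BacProt only"] = len(unmatched)
    return counts


# Toy annotations (hypothetical coordinates).
ncbi = [(100, 400, "+"), (500, 800, "+"), (900, 1200, "-"), (2000, 2300, "+")]
bacprot = [(100, 400, "+"), (530, 800, "+"), (870, 1200, "-"), (3000, 3300, "-")]
result = compare_annotations(ncbi, bacprot)
print(result)
```

In the toy data each category occurs exactly once, including a minus-strand gene whose stop site differs between the two annotations.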


S12: NCBI lift-over for JIII-386 based on sheep strain S397

A lift-over from the NCBI annotation of MAP S397 to MAP JIII-386 was performed using Blastn+. For this purpose, genes already annotated for strain S397 were searched for in the genome assembly of MAP JIII-386.

MAP JIII-386: lift-over annotation
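
A lift-over of this kind essentially turns each BLAST hit of a source gene into a feature on the target assembly. The sketch below converts one line of standard 12-column BLAST tabular output into a GFF3 line. It is an illustration under assumptions, not the pipeline used here: the function name, the `source` and `feature` labels, the `identity` attribute, and the example hit are all hypothetical, and no filtering by identity or coverage is applied.

```python
def blast_hit_to_gff(line, source="liftover", feature="gene"):
    """Convert one 12-column BLAST tabular hit into a GFF3 feature line."""
    (qseqid, sseqid, pident, length, mismatch, gapopen,
     qstart, qend, sstart, send, evalue, bitscore) = line.rstrip("\n").split("\t")
    sstart, send = int(sstart), int(send)
    # In blastn tabular output, minus-strand hits are reported with sstart > send.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)
    attributes = f"ID={qseqid};identity={pident}"
    return "\t".join(
        [sseqid, source, feature, str(start), str(end), bitscore, strand, ".", attributes]
    )


# Hypothetical hit: gene "geneB" found on the minus strand of "scaffold1".
hit = "geneB\tscaffold1\t99.00\t300\t3\t0\t1\t300\t500\t201\t1e-150\t540"
gff_line = blast_hit_to_gff(hit)
print(gff_line)
```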



S13: Codon usage

Codon usage of protein-coding genes is based on the BacProt annotation for each strain.

Organism Codon usage
MAP K-10 pdf
MAP K-10' pdf
MAP MAP4 pdf
MAP JII-1961 pdf
MAP JIII-386 pdf
MAP S397 pdf
MAP CLIJ361 pdf
MAH 104 pdf
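
Codon usage can be tabulated by counting in-frame triplets over all annotated coding sequences. The sketch below returns relative codon frequencies; the function name and the toy sequences are hypothetical, and the statistics in the linked PDFs may be reported differently (for example per amino acid or per thousand codons).

```python
from collections import Counter


def codon_usage(cds_sequences):
    """Return the relative frequency of each codon across in-frame CDS."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        # Ignore trailing bases that do not fill a complete codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Two toy coding sequences (hypothetical).
usage = codon_usage(["ATGGCCGCCTGA", "ATGGCGTAA"])
for codon, freq in sorted(usage.items()):
    print(codon, round(freq, 3))
```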


S14: Large sequence polymorphisms (LSPs)

S14a: LSPs present in S397 and JIII-386 but partially absent in MAP-C strains (K-10, MAP4, JII-1961) and MAH 104

Shown are the BLAST results for the ten large sequence polymorphisms (reported by Bannantine et al. 2012) present in JIII-386.

Download: ODS

Name Scaffold Start Stop Orientation Size (bp) Mismatches Bitscore Download
LSPS1 S02 473973 464963 - 9011 3 1.662e+04 fasta
LSPS2 S01 107781 101135 - 6647 2 1.226e+04 fasta
LSPS3 S02 624583 620808 - 3776 0 6974 fasta
LSPS4 S01 99971 96342 - 3630 0 6704 fasta
LSPS5 S02 93165 96632 + 3468 1 6399 fasta
LSPS6 S01 682567 679572 - 2996 3 5517 fasta
LSPS7 S02 99509 102396 + 2888 1 5328 fasta
LSPS8 S05 342150 339761 - 2390 0 414 fasta
LSPS9 S04 337496 335655 - 1842 0 3402 fasta
LSPS10 S01 489722 488144 - 1579 0 2916 fasta


S14b: Distribution of 35 LSPs

Download: ODS

Shown are 35 LSPs. Labels and locations according to Bannantine et al. 2012 (LSPS1-10) and Alexander et al. 2009 (25 other LSPs). Details and genomic positions for LSPP1-17 were obtained from Semret et al. 2005.

ORF - open reading frame; MAPs_X - ORF nomenclature of MAP S397 strain; MAPX - ORF nomenclature of MAP K-10 strain; MAV_X - ORF nomenclature of MAH 104 strain; full - full-length hit; part - partial hit; * - all ORFs comprised by the LSP are present but split on different contigs or genomic locations.

Name Size (kb) ORFs K-10 K-10' MAP4 JII-1961 JIII-386 S397 CLIJ361 MAH 104
LSPS1 9.01 MAPs_15940-16060 - - - - full full full* part
LSPS2 6.65 MAPs_46190-46270 - - - - full full full* full*
LSPS3 3.78 MAPs_14620-14660 full full full full full full full* full
LSPS4 3.63 MAPs_46290-46320 - - - - full full full* full
LSPS5 3.47 MAPs_17580-17610 - - - - full full full part
LSPS6 3.0 MAPs_40470-40500 full full full full full full full -
LSPS7 2.89 MAPs_17640-17670 - - - - full full full full
LSPS8 2.39 MAPs_02730-02760 part part part part full full full* full
LSPS9 1.84 MAPs_23120-23150 full full full full full full full full
LSPS10 1.58 MAPs_42460-42490 full full full full full full full -
LSPP1 16.04 MAP0094-0107 full full full full full* full* full* part
LSPP2 4.57 MAP0282-0284 full full full full full full full -
LSPP3 3.04 MAP0387-0389 full full full full full full full -
LSPP4 15.38 MAP0851-0866 full full full full full full* full -
LSPP5 13.53 MAP0957-0967 full full full full full* full* full* -
LSPP6 7.71 MAP1231-1237 full full full full full full full -
LSPP7 7.46 MAP1344-1349 full full full full full full full* -
LSPP8 7.51 MAP1631-1637 full full full full full full full -
LSPP10 2.94 MAP2027-2029 full full full full full full full -
LSPP11 13.03 MAP2148-2158 full full full full full* full* full* part
LSPP12 19.44 MAP2179-2196 full full full full full* full* full* -
LSPP13 17.66 MAP2751-2768 full full full full full full full* part
LSPP14 65.11 MAP3725-3764 full full* full* full full full* full* -
LSPP15 5.43 MAP3771-3776 full full full full full full full -
LSPP16 6.67 MAP3814-3818 full full full full full full full -
LSPP17 3.79 MAP4266-4270 full full full full full* full* full* -
LSPA4-II 28.93 MAV_1973-2008 part part part part full* full* full* full
LSPA8 10.0 MAV_4974-4983 - - - - - - - full
LSPA9-I (GPL cluster) 14.93 MAV_3258-3272 - - - - full* full* full* full
LSPA11 7.79 MAV_4544-4549 full full full full - full full* full
LSPA18 16.42 MAV_5225-5243 - - - - full full full* full
LSPA20 8.05 MAV_2945-2952 full full full full - - - full
MAV-14 20.02 MAV_2972-2999 part part part part full* full* full* full
VA15 8.6 MAP1432-1438 full full full full full* part full* full*
Deletion 2 (LSPP9) 19.93 MAP1728c-1745 full full full full part part part full*



S15: Gain and loss of genes and gene clusters

S15a: Differences between examined strains

Here, all comparisons are based on the merged annotations for each strain obtained from NCBI (if available) and BacProt. Hypothetical protein coding genes (without an assigned function) were excluded from this analysis.

Shown are the numbers of protein coding genes with an assigned function, present in one strain (row) but absent in another (column). By choosing a number, the whole table including all absent/present genes between all strains will be displayed, initially sorted by the two strains the chosen number belongs to.

Default gene table (initially unsorted): click

Download: ODS

K‑10 K‑10' MAP4 JII‑1961 JIII‑386 S397 CLIJ361 MAH‑104
K‑10 - 0 2 2 50 33 208 212
K‑10' 0 - 2 2 50 33 208 212
MAP4 0 0 - 2 50 33 208 211
JII‑1961 0 0 2 - 50 33 207 212
JIII‑386 80 80 82 82 - 8 175 215
S397 80 80 82 82 25 - 178 215
CLIJ361 80 80 82 81 21 7 - 215
MAH‑104 488 488 489 490 467 448 619 -
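
Each cell of such a matrix is the size of a set difference: genes present in the row strain but absent from the column strain. A minimal sketch follows, assuming each strain is represented as a set of gene identifiers that are comparable across strains. How genes were matched between strains for the table above is not specified here, and the strain and gene names in the example are hypothetical.

```python
def absence_matrix(genes_by_strain):
    """Count genes present in the row strain but absent from the column strain."""
    strains = list(genes_by_strain)
    return {
        row: {
            col: None if row == col
            else len(genes_by_strain[row] - genes_by_strain[col])
            for col in strains
        }
        for row in strains
    }


# Toy gene sets (hypothetical identifiers).
genes_by_strain = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g3", "g4"},
    "strainC": {"g1", "g5"},
}

matrix = absence_matrix(genes_by_strain)
for row, cells in matrix.items():
    print(row, ["-" if v is None else v for v in cells.values()])
```

The matrix is not symmetric, which is why, for example, the K-10 row and the K-10 column of the table above differ.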


S15b: Four of 70 genes, previously described to be specific for MAP-S (Type I/III) isolates (Bannantine et al., 2012), were also found in MAP-C (Type II) strains K-10, JII-1961 and MAP4

Gene ID, positions and contig ID referring to MAP S397.

All genes were described as hypothetical protein-coding genes.

Download: ODS

Number ID Start End Strand Contig E-value Blast result vs. K-10 K-10? JII-1961? MAP4? MAH 104?
1 MAPs_02640 37558 37728 + AFIF01000096 1e-85 result yes yes yes yes
2 MAPs_13370 41021 41209 - AFIF01000133 7e-59 result yes yes yes yes
3 MAPs_24380 <1 99 - AFIF01000003 4e-39 result yes yes yes yes
4 MAPs_27770 5458 5613 - AFIF01000020 1e-75 result yes yes yes no


S16: K-10 genes absent in S397 but partially present in JIII-386

The table is based on Bannantine et al. (2012) and represents only a subset of the genes absent or present between the strains listed in Supplementary Table S15a. According to Bannantine et al. (2012), all 32 genes (organized in 3 gene clusters) present in MAP-C strains are absent from sheep strain S397. Genes 1 - 7 were found in the sheep strains JIII-386 and CLIJ361.

Download: ODS

Number ID Start End Size Rv homolog Gene Description Class Type Metabolism Function Present in JIII-386? Present in MAH 104?
1 MAP1432 1564255 1565745 1490 Rv1128c ----- REP-family protein IV.B.2 Other REP13E12 family blast hit blast hit
2 MAP1433c 1565742 1567487 1745 Rv3537 ----- 3-oxosteroid 1-dehydrogenase I.B.7 Small-molecule metabolism Energy Metabolism Miscellaneous oxidoreductases and oxygenases blast hit blast hit
3 MAP1434 1567583 1568701 1118 Rv3526 ----- putative phthalate oxygenase I.B.7 Small-molecule metabolism Energy Metabolism Miscellaneous oxidoreductases and oxygenases blast hit blast hit
4 MAP1435 1569096 1569809 713 ----- Short chain dehydrogenase V Conserved hypotheticals blast hit blast hit
5 MAP1436c 1569916 1570698 782 SCD66.05, putative oxidoreductase I.B.7 Small-molecule metabolism Energy Metabolism Miscellaneous oxidoreductases and oxygenases blast hit blast hit
6 MAP1437c 1570719 1571705 986 hypothetical protein VI Unknowns blast hit blast hit
7 MAP1438c 1571838 1572821 983 Rv1399c lipH probable lipase II.B.5 Macromolecule metabolism Degradation of macromolecules Esterases and lipases blast hit blast hit
8 MAP1484c 1624930 1626315 1385 Rv3161c ----- putative dioxygenases V Conserved hypotheticals no blast hit
9 MAP1485c 1626606 1627856 1250 Rv0214 acyl-CoA synthase I.A.3 Small-molecule metabolism Degradation Fatty acids no blast hit
10 MAP1486c 1627866 1628738 872 Rv0456c enoyl-CoA hydratase/isomerase superfamily I.A.3 Small-molecule metabolism Degradation Fatty acids no blast hit
11 MAP1487c 1629019 1630029 1010 Rv2496c pyruvate dehydrogenase E1 component [beta] subunit I.B.2 Small-molecule metabolism Energy Metabolism Pyruvate dehydrogenase no blast hit
12 MAP1488c 1630041 1631033 992 Rv2497c pyruvate dehydrogenase E1 component [alpha] subunit I.B.2 Small-molecule metabolism Energy Metabolism Pyruvate dehydrogenase no blast hit
13 MAP1489c 1631038 1631910 872 Rv2750 ----- putative dehydrogenase I.B.7 Small-molecule metabolism Energy Metabolism Miscellaneous oxidoreductases and oxygenases no blast hit
14 MAP1490 1632173 1633054 881 ----- Alpha-methylacyl-CoA racemase V Conserved hypotheticals no blast hit
15 MAP1491 1633129 1633404 275 ----- Alpha-methylacyl-CoA racemase V Conserved hypotheticals no blast hit
16 MAP1728c 1889230 1889952 722 yfnB 2-haloalkanoic acid dehalogenase (EC 3.8.1.2) I.A.1 Small-molecule metabolism Degradation Carbon compounds no blast hit
17 MAP1729c 1891100 1891927 827 Rv2605c thioesterase II I.H.1 Small-molecule metabolism Lipid Biosynthesis Synthesis of fatty and mycolic acids no blast hit
18 MAP1730c 1891954 1892976 1022 SC9B10.02, ----- putative ATP/GTP-binding protein V Conserved hypotheticals no no
19 MAP1731c 1893076 1893345 269 hypothetical protein VI Unknowns no no
20 MAP1732c 1893345 1893992 647 Rv0302 ----- transcriptional regulator (TetR/AcrR family) I.J.1 Small-molecule metabolism Broad regulatory functions Repressors/activators no no
21 MAP1733 1894235 1894861 626 Proline rich protein precursor VI Unknowns no blast hit
22 MAP1734 1895243 1896526 1283 Rv2123 ----- PPE-family protein IV.C.2 Other PE and PPE families PPE family no blast hit
23 MAP1735 1896774 1897700 926 Rv0217c lipW_1 probable esterase II.B.5 Macromolecule metabolism Degradation of macromolecules Esterases and lipases no blast hit
24 MAP1736 1897902 1898510 608 SC8D11.10c, putative tetR-family transcriptional regulator I.J.1 Small-molecule metabolism Broad regulatory functions Repressors/activators no blast hit
25 MAP1737 1898826 1899242 416 Rv0677c mmpS5 conserved small membrane protein II.C.4 Macromolecule metabolism Cell envelope Conserved membrane proteins no blast hit
26 MAP1738 1899239 1902139 2900 Rv0676c mmpL5 conserved large membrane protein II.C.4 Macromolecule metabolism Cell envelope Conserved membrane proteins no blast hit
27 MAP1739c 1902213 1902995 782 Rv2002 fabG3_1 3-oxoacyl-[ACP] reductase I.H.1 Small-molecule metabolism Lipid Biosynthesis Synthesis of fatty and mycolic acids no blast hit
28 MAP1740c 1903102 1904640 1538 Rv3132c ----- sensor histidine kinase I.J.2 Small-molecule metabolism Broad regulatory functions Two component systems no blast hit
29 MAP1741c 1904670 1905554 884 Rv2005c ----- conserved hypothetical protein V Conserved hypotheticals no blast hit
30 MAP1742c 1905715 1906596 881 Rv2026c ----- conserved hypothetical protein V Conserved hypotheticals no blast hit
31 MAP1743c 1906663 1907664 1001 Rv2032 ----- conserved hypothetical protein V Conserved hypotheticals no blast hit
32 MAP1744 1907910 1908374 464 hypothetical protein VI Unknowns no blast hit


S17: ORFs present in JIII-386 and other MAP-S strains but absent from MAP-C strains

Here, only ORFs with previously or newly assigned function are presented (based on S15a).

Download: ODS

*partial hit within investigated MAP Type II genomes.

**BacProt-predicted ORF including two MAPs ORFs that are partially annotated as hypothetical protein-coding genes in NCBI

***ORF with assigned function by homology (previously without assigned function in S397)

Number Scaffold Start End Strand Source ORF ID Description MAH 104? LSP (this study) LSP (Bannantine 2012) LSP (Alexander 2009) Conformity with Bannantine et al. 2012
1 S01 93562 94032 + RefSeq MAPs_46350 Polyketide cyclase / dehydrase and lipid transport;Polyketide cyclase / dehydrase and lipid transport yes LSPSII LSPA18 yes
2 S01 94033 94593 - RefSeq MAPs_46340 pyridoxamine 5'-phosphate oxidase yes LSPSII LSPA18 yes
3 S01 94692 96287 + RefSeq MAPs_46330 Flavin-binding monooxygenase-like yes LSPSII LSPA18 no
4 S01 96342 97139 - RefSeq MAPs_46320 short chain dehydrogenase yes LSPSII LSPS4 LSPA18 yes
5 S01 97214 98497 - RefSeq MAPs_46310 Sodium/calcium exchanger protein yes LSPSII LSPS4 LSPA18 yes
6 S01 98740 99558 - RefSeq MAPs_46300 short chain dehydrogenase yes LSPSII LSPS4 LSPA18 yes
7 S01 99555 99971 - RefSeq MAPs_46290 SnoaL-like polyketide cyclase yes LSPSII LSPS4 LSPA18 yes
8 S01 99968 101011 - RefSeq MAPs_46280 Alcohol dehydrogenase GroES-like domain yes LSPSII LSPA18 no
9 S01 101135 102001 + RefSeq MAPs_46270 IclR helix-turn-helix domain yes LSPSII LSPS2 LSPA18 yes
10 S01** 102823 103120 + BacProt MAPs_46242*** short-chain type dehydrogenase/reductase yes LSPSII LSPS2 LSPA18 yes
10 S01** 103186 103668 + BacProt MAPs_46241*** short-chain type dehydrogenase/reductase yes LSPSII LSPS2 LSPA18 yes
11 S01 104130 104873 + RefSeq MAPs_46220 Polyketide cyclase / dehydrase and lipid transport yes LSPSII LSPS2 LSPA18 yes
12 S01 104889 105635 - RefSeq MAPs_46210 Methyltransferase domain yes LSPSII LSPS2 LSPA18 yes
13 S01 105761 107146 - RefSeq MAPs_46200 Adenylate and Guanylate cyclase catalytic domain yes LSPSII LSPS2 LSPA18 yes
14 S01 105779 107215 + BacProt Predicted_22 lignin peroxidase LIPJ yes LSPSII LSPS2 LSPA18 not annotated
15 S01 107188 107781 - RefSeq MAPs_46190 Bacterial regulatory proteins tetR family yes LSPSII LSPS2 LSPA18 yes
16 S01 108998 109762 + RefSeq MAPs_46170 short chain dehydrogenase yes LSPSII LSPA18 yes
17 S02 93165 94226 + RefSeq MAPs_17580 UDP-glucose 4-epimerase GalE1 yes LSPSIIIa LSPS5 GPL yes
18 S02 94447 95553 + RefSeq MAPs_17590*** MtfA protein yes LSPSIIIa LSPS5 GPL yes
19 S02 95805 96632 + RefSeq MAPs_17610 MtfB protein yes LSPSIIIa LSPS5 GPL yes
20 S02 96755 98032 + RefSeq MAPs_17621 Glycosyltransferase family 28 N-terminal domain yes LSPSIIIa GPL no
21 S02 98531 99507 + RefSeq MAPs_17622 UDP-glycosyltransferase yes LSPSIIIa GPL no
22 S02 99825 100625 + RefSeq MAPs_17650 methyltransferase MtfC yes LSPSIIIa LSPS7 GPL yes
23 S02 100740 101543 + RefSeq MAPs_17660 MtfD protein yes LSPSIIIa LSPS7 GPL yes
24 S02 101617 102396 + RefSeq MAPs_17670 dehydrogenase DhgA yes LSPSIIIa LSPS7 GPL yes
25 S02 104516 105313 - BacProt MAPs_17690*** hlpA, hemolytic protein no LSPSIIIb yes
26 S02 449051 450112 - RefSeq MAPs_16180 phospho-2-dehydro-3-deoxyheptonate aldolase yes LSPSIb LSPA-II yes
27 S02 450527 451864 - RefSeq MAPs_16170 Polymorphic PE/PPE proteins C terminal yes LSPSIb LSPA-II no
28 S02 446481 450703 - BacProt Predicted_900 phenyloxazoline synthase yes LSPSIb LSPA-II not annotated
29 S02 451054 460237 + BacProt Predicted_901 PPE family protein yes LSPSIb LSPA-II not annotated
30 S02 452918 454309 + RefSeq MAPs_16160 Condensation domain yes LSPSIb LSPA-II no
31 S02 454343 455989 + RefSeq MAPs_16150 ABC-2 type transporter yes LSPSIb LSPA-II no
32 S02 459341 460672 + RefSeq MAPs_16120 Polymorphic PE/PPE proteins C terminal yes LSPSIb LSPA-II no
33 S02 461079 461270 - RefSeq MAPs_16110 Protein of unknown function (DUF1271);ferredoxin yes LSPSIb LSPA-II yes
34 S02 461606 462685 - RefSeq MAPs_16100 diguanylate cyclase yes LSPSIb LSPA-II yes
35 S02 462966 463340 - RefSeq MAPs_16090 MerR family regulatory protein yes LSPSIb LSPA-II no
36 S02 464318 464734 - RefSeq MAPs_16070 Hsp20/alpha crystallin family yes LSPSIb LSPA-II no
37 S02 464656 464752 - BacProt Predicted_904 18 kDa antigen 2 yes LSPSIb LSPA-II not annotated
38 S02 464963 466483 + RefSeq MAPs_16060 regulator of polyketide synthase expression yes LSPSIb LSPS1 LSPA-II yes
39 S02 466636 467268 + RefSeq MAPs_16050*** transmembrane protein yes LSPSIb LSPS1 LSPA-II yes
40 S02 467647 468180 + RefSeq MAPs_16040 transcription elongation factor GreA yes LSPSIb LSPS1 LSPA-II yes
41 S02 468177 468644 + RefSeq MAPs_16030 transcription elongation factor GreA yes LSPSIb LSPS1 LSPA-II yes
42 S02 468938 469252 - RefSeq MAPs_16010*** topology modulation protein yes LSPSIb LSPS1 LSPA-II yes
43 S02 469605 470291 + RefSeq MAPs_15990 ThiJ/PfpI family protein yes LSPSIb LSPS1 LSPA-II yes
44 S02 470902 471495 - RefSeq MAPs_15980 TetR family transcriptional regulator yes LSPSIb LSPS1 LSPA-II yes
45 S02** 471492 471928 - RefSeq MAPs_15962 dehydrogenase/decarboxylase protein yes LSPSIb LSPS1 LSPA-II yes
45 S02** 471956 472342 - RefSeq MAPs_15961*** dehydrogenase/decarboxylase protein yes LSPSIb LSPS1 LSPA-II yes
46 S02 472674 473384 - RefSeq MAPs_15950 Bacterial regulatory proteins, tetR family;transcriptional regulator no LSPSIa LSPS1 yes
47 S02 473641 473973 - RefSeq MAPs_15940 Pyridoxamine 5'-phosphate oxidase;Pyridoxamine 5'-phosphate oxidase no LSPSIa LSPS1 yes
48 S02 475498 476766 - RefSeq MAPs_15920 Cytochrome P450 no LSPSIa no
49 S02 476791 477450 - RefSeq MAPs_15910 Bacterial regulatory proteins, tetR family;transcriptional regulator, tetR family no LSPSIa no
50 S02 477513 478715 - RefSeq MAPs_15900 acetyl-CoA acetyltransferase no LSPSIa no
51 S02 478712 479125 - RefSeq MAPs_15890 Rubredoxin-like zinc ribbon domain (DUF35_N) DUF35 OB-fold domain;putative nucleic-acid-binding protein containing a Zn-ribbon no LSPSIa yes
52 S02 479122 480333 - RefSeq MAPs_15880* CoA-transferase family III no LSPSIa no
53 S02 480334 481386 - RefSeq MAPs_15870 Phosphotransferase enzyme family no LSPSIa yes
54 S04 129726 169510 + BacProt Predicted_574 Type I modular polyketide synthase yes not annotated
55 S04 569778 569987 - RefSeq MAPs_20770 Excalibur calcium-binding domain yes LSPSIV MAV-14 yes
56 S04 571365 572126 - RefSeq MAPs_20740 short chain dehydrogenase yes LSPSIV MAV-14 no
57 S04 572123 574495 - RefSeq MAPs_20730 CoA-transferase family III yes LSPSIV MAV-14 no
58 S04 574515 575483 - RefSeq MAPs_20720 Enoyl-CoA hydratase/isomerase family yes LSPSIV MAV-14 yes
59 S04* 575520 576356 - RefSeq MAPs_20710 carveol dehydrogenase yes LSPSIV MAV-14 no
60 S04 577782 578456 - BacProt MAPs_20690 TetR family transcriptional regulator yes LSPSIV MAV-14 no
61 S04 578590 579198 + RefSeq MAPs_20680*** protein export protein SecD, putative yes LSPSIV MAV-14 yes
62 S04 579233 580243 - RefSeq MAPs_20670 NAD binding oxidoreductase yes LSPSIV MAV-14 yes
63 S04 580377 581042 + RefSeq MAPs_20660 Methyltransferase domain yes LSPSIV MAV-14 yes
64 S04 581063 582664 - RefSeq MAPs_20650 Sulfatase yes LSPSIV MAV-14 no
65 S04 582771 583790 + RefSeq MAPs_20640 2-nitropropane dioxygenase yes LSPSIV MAV-14 no
66 S04 583795 584616 - RefSeq MAPs_20630 Leucine carboxyl methyltransferase yes LSPSIV MAV-14 no
67 S04 584613 585140 - RefSeq MAPs_20620 MarR family yes LSPSIV yes
68 S04 586637 587410 - RefSeq MAPs_20590 Class II Aldolase and Adducin N-terminal domain yes LSPSIV yes
69 S04 587397 589023 - RefSeq MAPs_20580 AMP-binding enzyme;acyl-CoA synthetase (AMP-forming)/AMP-acid ligase II yes LSPSIV no
70 S04 589090 589608 - RefSeq MAPs_20570*** transmembrane protein yes LSPSIV yes
71 S04 589601 590878 - RefSeq MAPs_20560 transmembrane protein yes LSPSIV yes
72 S05 196001 196975 + RefSeq MAPs_44900* Luciferase-like monooxygenase;flavin-dependent oxidoreductase, methylene-tetrahydromethanopterin reductase yes no
73 S05 196987 197643 - RefSeq MAPs_44910 GntR family transcriptional regulator yes yes
74 S05 197705 199033 + RefSeq MAPs_44920* nitrilotriacetate monooxygenase component A yes no
75 S05 225619 227355 - RefSeq MAPs_00250 cholesterol oxidase yes no
76 S05 340103 341050 + RefSeq MAPs_02750 diguanylate cyclase yes LSPS8 partly in MAP-C
77 S05 341668 342150 + RefSeq MAPs_02730 glutathione peroxidase yes LSPS8 yes
78 S05 342175 342948 - RefSeq MAPs_02720 immunogenic protein MPT64 yes no
79 S05 539884 540773 - RefSeq MAPs_00670 Taurine catabolism dioxygenase TauD yes no
80 S05 540848 541423 + RefSeq MAPs_00680 transcriptional regulator yes partly in MAP-C



S18: Single nucleotide variants (SNVs) present in different strains

Numbers of SNVs observed in pairwise comparisons of the CDS (protein-coding genes with an assigned function) of all investigated genomes, based on the merged annotations. Compared to the other MAP-S strains, CLIJ361 shows fewer SNVs against the MAP-C strains; this is partly because fewer completely annotated genes are available in this assembly for SNV analysis.

#SNVs = #non-syn SNVs + #syn SNVs
K-10 K-10' MAP4 JII-1961 JIII-386 S397 CLIJ361 104
K-10 - 20=14+6 129=78+51 138=95+43 2576=1728+848 1974=1326+648 1416=952+464 23172=6868+16304
K-10' - - 153=96+57 152=102+50 2585=1740+845 2274=1536+738 1573=1052+521 25118=7419+17699
MAP4 - - - 179=114+65 2232=1504+728 2079=1387+692 1676=1123+553 25139=7435+17704
JII-1961 - - - - 2212=1495+717 2059=1385+674 1557=1057+500 25071=7409+17662
JIII-386 - - - - - 1044=685+359 972=624+348 25230=8146+17084
S397 - - - - - - 783=505+278 26082=7969+18113
CLIJ361 - - - - - - - 17577=4960+12617
104 - - - - - - - -
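
The split of each count into non-synonymous and synonymous SNVs depends on whether a substitution changes the encoded amino acid. The sketch below classifies SNVs between two aligned, gap-free, in-frame coding sequences of equal length. It is a simplified illustration, not the method used for the table: each differing position is evaluated on its own against the codon of the first sequence, the standard codon assignments are used, and the function name and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_snvs(cds_a, cds_b):
    """Return (non_synonymous, synonymous) SNV counts between two aligned CDS."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    non_syn = syn = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3], cds_b[i:i + 3]
        for pos in range(3):
            if codon_a[pos] == codon_b[pos]:
                continue
            # Introduce only this substitution into the codon of sequence A.
            mutated = codon_a[:pos] + codon_b[pos] + codon_a[pos + 1:]
            if CODON_TABLE[mutated] == CODON_TABLE[codon_a]:
                syn += 1
            else:
                non_syn += 1
    return non_syn, syn


# Toy example: GCT->GCC keeps alanine, AAA->AGA changes lysine to arginine.
non_syn, syn = count_snvs("ATGGCTAAA", "ATGGCCAGA")
print(f"{non_syn + syn}={non_syn}+{syn}")
```

The printed string follows the "#SNVs = #non-syn SNVs + #syn SNVs" notation of the table.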


S19: Non-coding RNAs

Download: ODS

ncRNA FASTA STK MAP K-10 MAP K-10' MAP MAP4 MAP JII-1961 MAP JIII-386 MAP S397 MAP CLIJ361 MAH 104
(per strain: FASTA and GFF downloads; values are copy numbers)
5S rRNA FASTA STK 1 1 1 1 1 1 1 1
SSU rRNA FASTA STK 1 1 1 1 1 1 1 1
LSU rRNA FASTA --- 1 1 1 1 1 1 1 1
RNase P FASTA STK 1 1 1 1 1 1 1 1
tmRNA FASTA STK 1 1 1 1 1 1 1 1
Bacteria small SRP FASTA STK 1 1 1 1 1 1 1 1
PyrR* FASTA STK 1 1 1 1 1 1 1 1
6C* FASTA STK 1 1 1 1 1 1 0 1
Actino-pnp* FASTA STK 1 1 1 1 1 1 1 1
mraW* FASTA STK 2 2 2 2 2 2 2 2
ASdes* FASTA STK 3 3 3 3 3 3 3 3
ASpks* FASTA STK 4 4 4 4 3 3 3 4
F6* FASTA STK 1 1 1 1 1 1 1 1
G2* FASTA STK 0 0 0 0 1? 1? 1? 1?
AS1890* FASTA STK 1? 1? 1? 1? 1? 1? 1? 1?
Riboswitches
TPP FASTA STK 2 2 2 2 2 2 1 2
Cobalamin FASTA STK 2 2 2 2 2 2 2 2
Glycine* FASTA STK 1 1 1 1 1 1 1 1
SAM-IV* FASTA STK 1 1 1 1 1 1 1 1
SAH* FASTA STK 1 1 1 1 1 1 1 2
ydaO-yuaA* FASTA STK 1 1 1 1 1 1 1 1
ykoK* FASTA STK 3 3 3 3 3 3 3 3
ykkC-yxkD* FASTA STK 1 1 1 1 1 1 1 1
ykkC-III* FASTA STK 1 1 1 1 2 2 2 2
pfl* FASTA STK 1 1 1 1 1 1 1 0
pan* FASTA STK 1 1 1 1 1 1 1 1

* not found in the draft analysis of E. coli or S. enterica, see Table S20.



S20: Non-coding RNAs in E. coli and S. enterica

back to top

Table: pdf



S21: tRNAs

back to top

Download: ODS

Organism FASTA GFF tRNA tRNA Ala CGC tRNA Ala GGC tRNA Ala TGC tRNA Arg ACG tRNA Arg CCG tRNA Arg CCT tRNA Arg TCT tRNA Asn GTT tRNA Asp GTC tRNA Cys GCA tRNA Gln CTG tRNA Gln TTG tRNA Glu CTC tRNA Glu TTC tRNA Gly CCC tRNA Gly GCC tRNA Gly TCC tRNA His GTG tRNA Ile GAT tRNA Leu CAA tRNA Leu CAG tRNA Leu GAG tRNA Leu TAA tRNA Leu TAG tRNA Lys CTT tRNA Lys TTT tRNA Met CAT tRNA Phe GAA tRNA Pro CGG tRNA Pro GGG tRNA Pro TGG tRNA SeC(p) TCA tRNA Ser CGA tRNA Ser GCT tRNA Ser GGA tRNA Ser TGA tRNA Thr CGT tRNA Thr GGT tRNA Thr TGT tRNA Trp CCA tRNA Tyr GTA tRNA Val CAC tRNA Val GAC tRNA Val TAC
COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY
MAP K‑10 FASTA GFF 46 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
MAP K‑10' FASTA GFF 46 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
MAP4 FASTA GFF 46 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
MAP JII‑1961 FASTA GFF 46 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
MAP JIII‑386 FASTA GFF 46 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
MAP S397 FASTA GFF 44 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
MAP CLIJ361 FASTA GFF 46 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
MAH 104 FASTA GFF 46 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


Phylogenetic trees

back to top

S22: 790 CDS (nucleotide)

Download: pdf tree | svg tree | newick tree | mafft alignment (fasta)







S23: 790 CDS (amino acid)

Download: pdf tree | svg tree | newick tree | mafft alignment (fasta)







S24: 70 ncRNAs

Download: pdf tree | svg tree | newick tree | mafft alignment (fasta)







S25: 790 CDS (nucleotide) + M. intracellulare strains MOTT-02/MOTT-64

The M. intracellulare genomes of strains MOTT-02 (NC_016947.1) and MOTT-64 (NC_016948.1) were downloaded from NCBI RefSeq.

Download: pdf tree | svg tree | newick tree | mafft alignment (fasta)







S26: 790 CDS (amino acid) + M. intracellulare strains MOTT-02/MOTT-64

Download: pdf tree | svg tree | newick tree | mafft alignment (fasta)






Scripts and used commands

back to top

The most frequently used commands are listed below.

Blast

makeblastdb -in $fasta -dbtype nucl -parse_seqids

blastn -num_threads 40 -query $query -db $db -evalue 1e-10 -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend qlen sstart send evalue bitscore slen" -out out.b6

The E-value cutoff was either 1e-10 or 1e-4.
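
The tabular output requested with -outfmt above has 14 tab-separated columns. As a rough sketch of how such a file could be post-filtered, the function below keeps hits above an identity and a query-coverage threshold. The function name filter_hits and both thresholds are assumptions for illustration and are not taken from the study.

```shell
#!/bin/sh
# filter_hits FILE MIN_IDENTITY MIN_QUERY_COVERAGE
# Column order follows the -outfmt string of the blastn call:
#   1 qseqid   2 sseqid   3 pident   4 length   5 mismatch  6 gapopen  7 qstart
#   8 qend     9 qlen    10 sstart  11 send    12 evalue   13 bitscore 14 slen
# Prints qseqid, sseqid, percent identity and query coverage (percent).
filter_hits() {
    awk -F '\t' -v min_id="$2" -v min_cov="$3" '
        {
            # Aligned span on the query as a percentage of the query length.
            span = $8 - $7
            if (span < 0) span = -span
            cov = 100 * (span + 1) / $9
            if ($3 >= min_id && cov >= min_cov)
                printf "%s\t%s\t%.1f\t%.1f\n", $1, $2, $3, cov
        }' "$1"
}

# Usage (thresholds are illustrative only):
#   filter_hits out.b6 95 80
```

Including qlen and slen in the output format is what makes a coverage filter of this kind possible without looking up the sequence lengths separately.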

Mafft

mafft --thread 40 $fasta > $aln 2>> $log

mafft-linsi --thread 40 $fasta > $aln 2>> $log

RAxML

raxmlHPC-PTHREADS-SSE3 -T 6 -f a -# 100 -x 1234 -p 1234 -o $outgroup -s $aln -n $praefix -m GTRGAMMA -N 1000

The model was GTRGAMMA for nucleotide alignments and PROTGAMMAWAG for amino acid alignments.
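
Since the RAxML call differs only in its -m argument between nucleotide and amino acid alignments, the choice could be wrapped as sketched here. The functions raxml_model and raxml_command and the type labels nucl and prot are hypothetical; all other options are copied unchanged from the command listed above, and the command is only printed, not executed.

```shell
#!/bin/sh
# raxml_model TYPE
# Prints the substitution model for an alignment type ("nucl" or "prot").
raxml_model() {
    case "$1" in
        nucl) echo GTRGAMMA ;;
        prot) echo PROTGAMMAWAG ;;
        *)    echo "unknown alignment type: $1" >&2; return 1 ;;
    esac
}

# raxml_command TYPE ALIGNMENT OUTGROUP PREFIX
# Prints the RAxML command line for the given alignment without running it.
raxml_command() {
    model=$(raxml_model "$1") || return 1
    echo "raxmlHPC-PTHREADS-SSE3 -T 6 -f a -# 100 -x 1234 -p 1234 -o $3 -s $2 -n $4 -m $model -N 1000"
}

# Usage (file and taxon names are placeholders):
#   raxml_command nucl cds.aln outgroup_taxon cds_nucl
```

Printing the command first makes it easy to inspect or log the exact call before running it.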

Newick Utilities

nw_display -v 30 -i 'font-size:11' -l 'font-size:12;font-family:helvetica;font-style:italic' -o ornament.map -c css.map -Il -w 500000 -s $newick_tree > $svg

Scripts: zip