The Evolution of COP9 signalosome in unicellular and multicellular organisms

Supplemental Material

Emanuel Barth, Ron Hübler, Aria Baniahmad, and Manja Marz



Data


COP9 subunit queries and genomes

Species CSN1 CSN2 CSN3 CSN4 CSN5 CSN6 CSN7 CSN8 CSNAP Genomes
Arabidopsis thaliana fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff ath
Aspergillus fumigatus fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - afu
Bos taurus fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff bta
Caenorhabditis elegans fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - - - cel
Danio rerio fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff dre
Dictyostelium discoideum fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - ddi
Drosophila melanogaster fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff dme
Gallus gallus fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff gga
Homo sapiens fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff hsa
Monodelphis domestica fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - mdo
Mus musculus fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff mmu
Neurospora crassa fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - - ncr
Schistosoma mansoni fasta gff fasta gff - fasta gff fasta gff fasta gff - - - sma
Schizosaccharomyces pombe fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - - spo
Xenopus tropicalis fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff xtr
Zea mays fasta gff - - fasta gff fasta gff fasta gff fasta gff fasta gff - zma
Supplement Table 1 - All csn gene sequences and positions from the 16 species used as initial search queries, together with their genome information.

RNA-Seq data

Probe Platform Layout SRA Link
Human cervical cancer (HeLa) Illumina HiSeq 2500 paired end http://www.ncbi.nlm.nih.gov/sra/SRX528202
Universal Reference RNA sample FG031 Illumina HiSeq 2000 paired end http://www.ncbi.nlm.nih.gov/sra/SRX523772
Universal Reference RNA sample FG031 Illumina HiSeq 2000 paired end http://www.ncbi.nlm.nih.gov/sra/SRX523771
Peripheral blood cell line with AML (Kasumi-1) Illumina HiSeq 2000 paired end http://www.ncbi.nlm.nih.gov/sra/DRX011550
H1-derived neuronal precursor cells Illumina HiSeq 2500 paired end http://www.ncbi.nlm.nih.gov/sra/SRX516741
Human neurons after cortical stroke Illumina HiSeq 2000 paired end http://www.ncbi.nlm.nih.gov/sra/SRX501850
Hepatocellular carcinoma cell line (HKCI-1) Illumina HiSeq 2000 paired end http://www.ncbi.nlm.nih.gov/sra/SRX290657
Neurons derived from reprogrammed pluripotent dental pulp stem cells Illumina HiSeq 2000 paired end http://www.ncbi.nlm.nih.gov/sra/SRX212594
Supplement Table 2 - General information on and sources of the RNA-Seq data sets used.

Results


Predicted COP9 sequences

Species CSN1 CSN2 CSN3 CSN4 CSN5 CSN6 CSN7 CSN8 CSNAP Genome
Acanthamoeba castellanii fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff aca
Acropora palmata - - - - fasta gff - - - fasta gff apa
Allomyces macrogynus fasta gff fasta gff fasta gff fasta gff fasta gff - fasta gff - fasta gff ama
Amphimedon queenslandica fasta gff fasta gff - fasta gff fasta gff fasta gff fasta gff - fasta gff aqu
Aspergillus fumigatus - - - - - - - - fasta gff afu
Aureococcus anophagefferens fasta gff - - fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff aan
Babesia bigemina - - - - - - - - - bbi
Batrachochytrium dendrobatidis fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - - bde
Branchiostoma floridae fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - bfl
Candida albicans - - - - fasta gff - - - - cal
Chlamydomonas reinhardtii fasta gff fasta gff fasta gff - fasta gff fasta gff fasta gff - - cre
Chondrus crispus fasta gff fasta gff - fasta gff fasta gff fasta gff - fasta gff - ccr
Cryptosporidium parvum - - - - - - - - - cpa
Cyanidioschyzon merolae - - - - - - - - - cme
Dictyostelium discoideum - - - - - - - - fasta gff ddi
Ectocarpus siliculosus fasta gff fasta gff - fasta gff - - fasta gff - - esi
Emiliania huxleyi fasta gff fasta gff - fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff ehu
Encephalitozoon cuniculi - - fasta gff - - - - - - ecu
Entamoeba histolytica - fasta gff - - fasta gff - - - fasta gff ehi
Giardia intestinalis - - - - - - - - - gin
Hyaloperonospora parasitica fasta gff fasta gff - fasta gff fasta gff fasta gff - fasta gff fasta gff hpa
Hydra magnipapillata - fasta gff fasta gff fasta gff fasta gff - fasta gff fasta gff - hma
Lichtheimia hyalospora fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - fasta gff lhy
Micromonas pusilla fasta gff fasta gff - fasta gff fasta gff fasta gff fasta gff fasta gff - mpu
Monodelphis domestica - - - - - - - - fasta gff mdo
Monosiga brevicollis - fasta gff - fasta gff fasta gff fasta gff fasta gff - fasta gff mbr
Naegleria gruberi fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff ngr
Nematostella vectensis fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - fasta gff fasta gff nve
Neurospora crassa - - - - - - - - fasta gff ncr
Ostreococcus lucimarinus fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - - olu
Phaeodactylum tricornutum - fasta gff - fasta gff fasta gff - - - fasta gff ptr
Phycomyces blakesleeanus fasta gff fasta gff - fasta gff fasta gff fasta gff fasta gff - - pbl
Physarum polycephalum - fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - fasta gff ppo
Physcomitrella patens fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - ppa
Phytophthora infestans fasta gff fasta gff - fasta gff fasta gff fasta gff fasta gff fasta gff - pin
Phytophthora ramorum fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff pra
Phytophthora sojae fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - fasta gff pso
Plasmodium falciparum - - - - - - - - - pfa
Rhizopus oryzae fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - - ror
Saccharomyces cerevisiae - - - - - - - - - sce
Schistosoma mansoni - - - - - - - - fasta gff sma
Schizosaccharomyces pombe - - - - - - - - fasta gff spo
Selaginella moellendorffii fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff smo
Spizellomyces punctatus fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff - spn
Strongylocentrotus purpuratus fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff fasta gff spu
Tetrahymena thermophila - fasta gff - fasta gff fasta gff - - - - tth
Toxoplasma gondii - - - - - - - - - tgo
Trichomonas vaginalis - - - - fasta gff - - - fasta gff tva
Trichoplax adhaerens fasta gff fasta gff - fasta gff fasta gff - fasta gff - fasta gff tad
Trypanosoma brucei - fasta gff - - - - - - - tbr
Volvox carteri - fasta gff - fasta gff fasta gff fasta gff fasta gff - - vca
Zea mays - fasta gff fasta gff - - - - - fasta gff zma
Supplement Table 3 - All csn gene sequences and positions predicted in 46 species, together with their genome information.

Alignments

Subunit Uncolored alignment Colored alignment Alignment file
CSN1 pdf pdf aln
CSN2 pdf pdf aln
CSN3 pdf pdf aln
CSN4 pdf pdf aln
CSN5 pdf pdf aln
CSN6 pdf pdf aln
CSN7 pdf pdf aln
CSN8 pdf pdf aln
CSNAP (restrictive candidates) pdf pdf aln
CSNAP (all putative candidates) pdf pdf aln
Supplement Table 4 - Alignments of each COP9 subunit. Colored boxes in the alignments mark the coding part of each exon relative to the human exon structure. Alignments were computed with ClustalW (version 2.1) [1], and the pdf files were generated with Geneious (version 6.1.8) [2].
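For readers who want to work with the aln files directly, the following is a minimal sketch of a reader for ClustalW-formatted text. It assumes the standard interleaved layout (a header line starting with "CLUSTAL", blocks of "name sequence" lines, and conservation lines that begin with whitespace); the function name parse_clustal and the two toy sequences are illustrative and not taken from the supplement.

```python
def parse_clustal(text):
    """Parse ClustalW-formatted alignment text into {name: aligned_sequence}.

    Assumes the standard interleaved layout; this is an illustrative
    sketch, not the tool chain used for the supplement.
    """
    body = [line for line in text.splitlines() if line.strip()]
    if not body or not body[0].startswith("CLUSTAL"):
        raise ValueError("not a ClustalW alignment")

    sequences = {}
    for line in body[1:]:
        # Conservation lines (*, :, .) are padded and start with whitespace.
        if line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # A third field, if present, is a running residue count and is ignored.
        name, chunk = fields[0], fields[1]
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


# Toy input with hypothetical sequence names and residues.
example = """CLUSTAL 2.1 multiple sequence alignment

human      MAAS-GK
mouse      MAASAGK
           **** **

human      LLE
mouse      LIE
           *:*
"""
alignment = parse_clustal(example)
```

Each value in the returned dictionary is the full aligned row, gaps included, so all rows have the same length.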

Predicted exon structures

Supplement Figure 1 - Predicted exon structures of CSN2 and CSN5 for different eukaryotic species.

Prediction validation

Species CSN1 CSN2 CSN3 CSN4 CSN5 CSN6 CSN7 CSN8 CSNAP
Aspergillus fumigatus 84.03% 70.44% 30.18% 96.16% 90.72% 35.64% 69.16% 0.00% -
Bos taurus 98.95% 99.33% 94.09% 99.01% 99.70% 98.47% 98.86% 100.0% 94.74%
Caenorhabditis elegans 78.87% 89.19% 0.00% 84.52% 88.04% 68.79% - - -
Danio rerio 100.0% 98.19% 78.37% 99.02% 99.70% 42.54% 87.59% 100.0% 89.47%
Dictyostelium discoideum 86.03% 100.0% 95.22% 91.35% 100.0% 92.63% 75.68% 86.22% -
Gallus gallus 96.64% 97.68% 98.48% 98.07% 99.63% - 99.25% 96.46% 94.74%
Monodelphis domestica 84.71% 95.49% 89.59% 99.47% 97.44% 98.17% 99.62% 98.74% -
Mus musculus 91.45% 96.41% 99.48% 99.16% 98.95% 98.62% 100.0% 86.54% 94.74%
Neurospora crassa 81.88% 89.59% 74.85% 89.09% 100.0% 46.68% 55.40% - -
Schistosoma mansoni 29.25% 95.95% - 23.80% 64.43% 78.46% - - -
Schizosaccharomyces pombe 79.62% 86.27% 51.91% 78.68% 71.57% 63.76% 54.90% - -
Xenopus tropicalis 91.79% 94.36% 91.25% 87.68% 91.57% 76.80% 91.67% 42.49% 92.98%
Zea mays 92.81% no ref* no ref* 94.49% 99.17% 90.94% 87.02% 98.47% -
Supplement Table 5 - Percent identity of the COP9 subunit sequences identified with the prediction strategy described in the methods section. The sequences of H. sapiens, A. thaliana and D. melanogaster were used as queries. Predicted sequences were compared with the known sequences from the Ensembl database.
no ref* - no reference for comparison was available in the Ensembl database
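As an illustration of how an identity percentage of this kind can be obtained from a pairwise alignment, here is a minimal sketch. The supplement does not state which denominator was used; this version assumes identity is counted over all alignment columns in which at least one sequence has a residue. The function name and the two toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Assumption (not stated in the supplement): columns where both rows
    carry a gap are skipped, every other column counts toward the
    denominator, and a residue aligned to a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example: 6 of 7 columns are identical.
identity = percent_identity("MAAS-GK", "MAASAGK")
print(f"{identity:.2f}%")  # prints 85.71%
```

Other conventions, such as dividing by the length of the shorter sequence or by the reference length only, give different values for the same alignment, so the figures in the table should not be expected to match this sketch exactly.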

CSN expression in unicellular species

Species Subunits SRA Link
Aureococcus anophagefferens CSN1 CSN4 CSN5 CSN6 CSN7 CSN8 SRR1300278 SRR1300279
Micromonas pusilla CSN1 CSN2 CSN4 CSN5 CSN6 CSN7 CSN8 SRR1300457
Ostreococcus lucimarinus CSN1 CSN2 CSN3 CSN4 CSN5 CSN6 CSN7 SRR1300254
Phaeodactylum tricornutum CSN2 CSN4 CSN5 SRR1138237 SRR1138238
Supplement Table 6 - For the four unicellular organisms A. anophagefferens, M. pusilla, O. lucimarinus and P. tricornutum, RNA-Seq data sets were downloaded from the NCBI Sequence Read Archive and mapped with TopHat (version 2.0.11) [3]. The Integrative Genomics Viewer (version 2.3.39) [4] was then used to examine the loci of the predicted CSN subunits in these four species.
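To inspect a predicted subunit in a genome viewer, its gff coordinates have to be turned into a locus string of the form "sequence:start-end", which is the form IGV accepts in its locus box. The following minimal sketch does this conversion. It assumes the gff files follow the usual nine-column, tab-separated layout; the scaffold name, coordinates and attribute values in the example are hypothetical, and the function name is illustrative.

```python
def gff_to_igv_loci(gff_text, feature_type=None):
    """Turn GFF feature lines into 'seqid:start-end' locus strings.

    Assumes nine tab-separated columns per feature line
    (seqid, source, type, start, end, score, strand, phase, attributes).
    If feature_type is given, only features of that type are returned.
    """
    loci = []
    for line in gff_text.splitlines():
        # Skip blank lines and comment or directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue
        seqid, _source, ftype, start, end = columns[:5]
        if feature_type is not None and ftype != feature_type:
            continue
        loci.append(f"{seqid}:{int(start)}-{int(end)}")
    return loci


# Hypothetical prediction on a hypothetical scaffold.
example_gff = (
    "##gff-version 3\n"
    "scaffold_12\tprediction\tgene\t1500\t4200\t.\t+\t.\tID=CSN5\n"
    "scaffold_12\tprediction\texon\t1500\t1800\t.\t+\t.\tParent=CSN5\n"
)
loci = gff_to_igv_loci(example_gff, feature_type="gene")
```

With the mapped reads loaded as a track, jumping to each returned locus shows whether reads cover the predicted gene.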

References


[1] Thompson, J. D., Gibson, T., and Higgins, D. G. (2002). Multiple sequence alignment using ClustalW and ClustalX. Current Protocols in Bioinformatics, 2-3.

[2] Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C. and others (2012). Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 28(12): 1647-1649.

[3] Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S. L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol, 14(4), R36.

[4] Thorvaldsdóttir, H., Robinson, J. T., and Mesirov, J. P. (2012). Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics, bbs017.