

Supplementary Information

Supplementary Figures

Supplementary Figure 1. Design of the control microarray. a, Genomic DNA from strain M8 of S. ruber and a fosmid containing the S. ruber M8 virus M8CR4 were used as probes and spotted on the array from solutions of 5 different concentrations (each group of 5 gray and black circles inside a box has the same DNA amount). b, Hybridization between the control microarray and genomic DNA from a non-infected S. ruber M8, used as the target. c, Hybridization between the control microarray and total DNA from an S. ruber M8 culture infected with virus M8CR4, used as the target.

Supplementary Figure 2. Phylogenetic analyses of partial 16S rRNA gene sequences from SAGs belonging to the Nanohaloarchaea group. The neighbor-joining consensus tree was built with a bootstrap value of [...]. Bootstrap values [...] 50 are displayed. The positively detected infected cell is indicated (red arrow). 16S rRNA gene sequence similarity with Nanosalinarum J07AB56 was 98%. PCR of the 16S rRNA gene from SAGs was carried out with primers 27F (AGRGTTYGATYMTGGCTCAG) and 907R (CCGTCAATTCMTTTRAGTTT).

Supplementary Figure 3. Genomic island signature in ORF 8 of the virus infecting the nanohaloarchaeon SAG D14. a, A total of 2685 reads recovered from the infected cell D14 were found to match the nucleotide position region [...] of the cloned virus with an average nucleotide identity of 77% and a total of 11 SNPs, indicating a region of high variability in the virus and a probable signature for differentiating variants of the virus. b, Protein alignment of ORF 8 for two viral variants, indicating the three non-synonymous substitutions at positions 71, 77 and 80.

Supplementary Figure 4. Sequencing coverage of the genome of SAG D14. An artificial concatenation of contigs was performed. An identity threshold of 95% and a minimum read overlap of 50 bp were applied.

Supplementary Figure 5. Genomic comparison of nanohaloarchaeon host D14 with its closest relative, the nanohaloarchaeon Candidatus Nanosalinarum spp. a, Average nucleotide identity (ANI) values were calculated with the JSpecies package as described in Methods (section Metagenome recruitment and genome analysis). ANI between both nanohaloarchaea was 80.5%. Large contigs of the nanohaloarchaeon SAG AB578-D14 genome were artificially sectioned into 1,020-nucleotide fragments, resulting in a total of 1552 fragments that were compared by BLAST against the reference genome of Candidatus Nanosalinarum spp. (J07AB56). b, Tetranucleotide frequency signature correlation for the pairwise genome comparison. The average tetranucleotide frequency between the two Nanohaloarchaea was [...]
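The sectioning step in panel a can be illustrated with a minimal Python sketch. It is not the JSpecies implementation: the function name `fragment_sequence`, the `keep_partial` switch and the toy contig are assumptions made for illustration, and the source does not say whether trailing pieces shorter than 1,020 nucleotides were kept or discarded.

```python
def fragment_sequence(seq, size=1020, keep_partial=False):
    """Cut a sequence into consecutive, non-overlapping fragments of `size` nt.

    Assumption: a trailing piece shorter than `size` is dropped unless
    keep_partial is True (the source does not state which was done).
    """
    fragments = [seq[i:i + size] for i in range(0, len(seq), size)]
    if not keep_partial and fragments and len(fragments[-1]) < size:
        fragments.pop()
    return fragments


# Toy 2,400-nt contig (hypothetical data, not from the D14 assembly).
contig = "ACGT" * 600
frags = fragment_sequence(contig)
print(len(frags), [len(f) for f in frags])  # prints: 2 [1020, 1020]
```

Each resulting fragment would then serve as a BLAST query against the reference genome, and the identities of the matching fragments would be averaged to obtain an ANI estimate.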

Supplementary Figure 6. Genome annotation of nanohaloarchaeon host D14. Annotation was performed in the SEED subsystem (RAST) and Integrated Microbial Genomes (IMG) of JGI. CDSs previously annotated as hypothetical proteins were assigned to conserved proteins (a total of 198) when identity threshold and coverage values were 70% with Candidatus Nanosalinarum J07AB56.

Supplementary Figure 7. Distribution of CDSs annotated as hypothetical proteins in the SEED subsystem according to contig size for the SAG D14 genome assembly. Hypothetical proteins were overrepresented in contigs >1000 bp, which might indicate that the number of hypothetical proteins in SAG D14 is overestimated as a result of genome fragmentation, likely due to MDA bias and viral infection.

Supplementary Figure 8. Metagenome analysis of annotated hypothetical proteins (HP) of host D14. Annotated HP of host D14 were searched against and compared with the cellular metagenome from the same site (crystallizer CR30). Over 60% of the CDSs annotated as HP genes in host D14 were found in the corresponding CR30 cellular metagenome with high sequence identities.

Supplementary Figure 9. Principal component analysis of the tetranucleotide frequency signatures of viruses and hosts. The frequency of tetranucleotide signatures was calculated and plotted in a PCA plot for reference halophilic prokaryote genomes, the pair NHV-1-nanohaloarchaeon host D14, and the viral contigs from environmental uncultured halophages ("ehp-number" and "Contig_number") from García-Heredia et al.¹ and Santos et al.², respectively. Viral and host genomes are represented by spheres and stars, respectively.
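As an illustration of what a tetranucleotide frequency signature is, the following Python sketch computes a 256-element frequency vector per sequence. It is a simplified assumption, not the authors' pipeline: it uses raw single-strand frequencies (tools such as JSpecies work with normalized scores), and the names `tetra_freq` and `pearson` are hypothetical. Vectors of this kind, one per genome or contig, would form the rows of the matrix passed to a PCA; the `pearson` helper shows the kind of pairwise correlation reported in Supplementary Figure 5b.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so vectors are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Return relative frequencies of the 256 tetranucleotides in seq.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy sequences (hypothetical, for illustration only).
virus = tetra_freq("ACGTACGTTTGACCA" * 20)
host = tetra_freq("ACGTTCGATTGACGA" * 20)
print(round(pearson(virus, host), 3))
```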

Supplementary Tables

Supplementary Table 1: Illumina MiSeq sequencing results for the SAG AB578-D14 and the corresponding viral fosmid C23

Sample | No. of total reads | Quality score (% of total reads): Q30 | Q20 | Mean read length (bp)
Host AB578-D14 | 8,330, [...]
Virus (fosmid insert) | 11,911, [...]

Supplementary Table 2: BLASTn hits of the putative viral asrr-like gene of virus NHV-1 against the assembly-driven community genomic dataset from Lake Tyrrell from Podell et al.³

Contig name | Pairwise identity | Bit-Score | E-value | Hit start | Hit end
gi gb KB E [...]
gi gb KB E [...]
gi gb KB E [...]
gi gb KB E [...]
gi gb KB E [...]
gi gb KB E [...]

Supplementary Table 3: Comparison of the assembly performance of the SPAdes⁴, CAMERA⁵ meta-assembler + Geneious R6.1⁶ and VELVET⁷ assemblers (36-38) for the nanohaloarchaeon host D14

Assembler | No. total contigs | No. contigs >1000 bp | Total nucleotide assembled | Max. length contig (bp) | NG50ᵃ value | N50ᵇ
SPAdes | [...]
CAMERA Meta-assembler + Geneious R6.1 assembler | [...] < [...]
VELVET | [...] < [...]

ᵃ Since assembly sizes from the three strategies were very uneven (0.2-1 Mbp), the NG50 statistic was used to compare the three resulting assemblies. The NG50 statistic is the same as the N50 except that the genome size is used rather than the assembly size. The genome size used here for normalization was that from SPAdes.
ᵇ Only contigs of 500 bp and longer were taken into consideration for the N50 estimation.
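The difference between N50 and NG50 described in the footnotes to Supplementary Table 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' script: the function name `nx50`, its parameters and the contig lengths are hypothetical. The statistic is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of a reference size; for N50 that reference is the assembly size, for NG50 it is a fixed genome size.

```python
def nx50(contig_lengths, reference_size=None, min_len=0):
    """Return N50 (reference_size=None) or NG50 (reference_size given).

    min_len mirrors the table footnote stating that only contigs of
    500 bp and longer were considered for the N50 estimation.
    """
    lengths = sorted((n for n in contig_lengths if n >= min_len), reverse=True)
    total = sum(lengths) if reference_size is None else reference_size
    target = total / 2
    running = 0
    for n in lengths:
        running += n
        if running >= target:
            return n
    return 0  # contigs never reach half of the reference size


# Hypothetical contig lengths in bp (not taken from the D14 assemblies).
contigs = [5000, 4000, 3000, 2000, 1000]
print(nx50(contigs))                        # N50  -> 4000
print(nx50(contigs, reference_size=24000))  # NG50 -> 3000
```

Because NG50 uses the same reference size for every assembler, a small assembly cannot obtain a favorable value merely by containing few contigs, which is why it suits comparisons between assemblies of very uneven size.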

Supplementary References

1. Garcia-Heredia, I. et al. Reconstructing Viral Genomes from the Environment Using Fosmid Clones: The Case of Haloviruses. PLoS One 7, e33802 (2012).
2. Santos, F., Yarza, P., Parro, V., Briones, C. & Antón, J. The metavirome of a hypersaline environment. Environ. Microbiol. 12, (2010).
3. Podell, S. et al. Assembly-driven community genomics of a hypersaline microbial ecosystem. PLoS One 8, e61692 (2013).
4. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, (2012).
5. Sun, S. et al. Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource. Nucleic Acids Res. 39, D (2011).
6. Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, (2012).
7. Zerbino, D. R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, (2008).