Complete Genome Structure of Mesorhizobium loti Strain MAFF303099

T. Kaneko, Y. Nakamura, S. Sato, E. Asamizu, T. Kato, S. Tabata

Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan

1. Introduction

Mesorhizobium loti is a member of the rhizobia that fixes nitrogen on several Lotus species in determinate-type globular nodules. Nodule formation and nitrogen fixation result from interactions between symbiotic bacteria and host plants through the sequential expression of a series of genes from both bacteria and hosts. To understand the genetic systems required for the entire process of symbiotic nitrogen fixation, we have initiated the genome sequencing of M. loti strain MAFF303099. Here, we report the complete structure of the M. loti genome, which consists of a single chromosome and two large plasmids, and the gene complements of both the chromosome and the plasmids. Characteristic features of the genes and the genome are also presented.

2. Materials and Methods

2.1. Genome sequencing. Mesorhizobium loti strain MAFF303099 was obtained from the Genetic Resource Center, National Institute of Agrobiological Resources, Ministry of Agriculture, Forestry and Fisheries. The modified whole-genome shotgun strategy was adopted to determine the genome structure as previously described (Kaneko et al. 2000). A total of 90,706 sequence files corresponding to about 8-times genome equivalent were accumulated and subjected to assembly using the Phrap program (Phil Green, Univ. of Washington, Seattle, USA).

2.2. Gene assignment and annotation. Protein-coding regions were assigned by a combination of computer prediction and similarity search as described previously (Kaneko et al. 2000). The Glimmer program was used to predict protein-coding regions (Delcher et al. 1999). Genes for structural RNAs were assigned by similarity search against an in-house structural RNA database. tRNA-coding regions were assigned by prediction with the tRNAscan-SE program in combination with the similarity search (Lowe et al. 1997). Functions of the predicted genes were assigned by similarity to genes of known function. For genes encoding proteins of 100 amino acid residues or longer, BLAST hits with an E-value of less than e-20 were taken into account. Higher E-values were accepted for genes encoding smaller proteins.
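The length-dependent similarity cutoff described above can be sketched as a small filter. This is only an illustration: the text gives e-20 as the cutoff for proteins of 100 residues or longer but does not state the relaxed value used for smaller proteins, so the SHORT_PROTEIN_EVALUE constant and the function name accept_hit are assumptions.

```python
# Cutoff stated in the text for proteins of 100 residues or longer.
LONG_PROTEIN_EVALUE = 1e-20
# Assumption: the text only says "higher" E-values were used for smaller
# proteins; 1e-5 is a placeholder chosen for illustration.
SHORT_PROTEIN_EVALUE = 1e-5


def accept_hit(evalue: float, protein_length: int,
               long_cutoff: float = LONG_PROTEIN_EVALUE,
               short_cutoff: float = SHORT_PROTEIN_EVALUE) -> bool:
    """Return True if a similarity hit passes the length-dependent cutoff."""
    cutoff = long_cutoff if protein_length >= 100 else short_cutoff
    return evalue < cutoff


if __name__ == "__main__":
    print(accept_hit(1e-30, 250))  # long protein, strong hit
    print(accept_hit(1e-10, 250))  # long protein, hit too weak
    print(accept_hit(1e-10, 80))   # short protein, relaxed cutoff
```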

3. Sequence Determination of the Entire Genome

The genome sequence of M. loti strain MAFF303099 was deduced by assembly of 90,706 sequence files corresponding to approximately 8-times genome equivalent. Integrity of the final sequence was confirmed by comparing, for each of 361 cosmid clones covering the entire genome, the distance between its end sequences on the assembled sequence with its insert length. The genome thus deduced is 7,596,297 bp long and consists of three circular molecules: a chromosome of 7,036,071 bp and two plasmids, designated pMLa and pMLb, of 351,911 bp and 208,315 bp, respectively. The overall GC contents of the chromosome, pMLa and pMLb were 62.7%, 59.3% and 59.9%, respectively.
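As a quick arithmetic check, the replicon lengths reported above can be summed and compared with the stated genome length. The sketch also derives a length-weighted GC content for the whole genome; that combined value is not reported in the text and is only an estimate computed from the three rounded percentages.

```python
# (length in bp, GC content in %) for each replicon, as reported in the text.
REPLICONS = {
    "chromosome": (7_036_071, 62.7),
    "pMLa": (351_911, 59.3),
    "pMLb": (208_315, 59.9),
}


def genome_length(replicons: dict) -> int:
    """Total length of all replicons in bp."""
    return sum(length for length, _ in replicons.values())


def weighted_gc(replicons: dict) -> float:
    """Length-weighted mean GC content in percent (derived, not reported)."""
    total = genome_length(replicons)
    return sum(length * gc for length, gc in replicons.values()) / total


if __name__ == "__main__":
    print(genome_length(REPLICONS))  # 7596297, matching the reported total
    print(f"{weighted_gc(REPLICONS):.2f}")
```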

4. Assignment of Protein- and RNA-coding Genes

By taking the results of computer prediction and sequence similarity to known genes into account, the total number of the potential protein-coding genes finally assigned to the chromosome was 6,752 (Table 1). Two plasmids, pMLa and pMLb, had the capacity of coding for 320 and 209 proteins.

The putative protein-coding genes thus assigned to the genome, starting with an ATG, GTG or TTG codon, were denoted by a serial number preceded by three letters representing the species name (m), an ORF of 100 codons or longer or of fewer than 100 codons (l or s), and the reading direction on the circular map (r or l).
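The naming scheme can be illustrated with a small parser applied to gene identifiers that appear in this report, such as mll6432 and msl5792. This is a sketch under two assumptions: the second letter is read as l (long) or s (short), and the third letter as r or l, taken here to mean rightward or leftward on the map. The names GeneId and parse_gene_id are hypothetical.

```python
import re
from typing import NamedTuple


class GeneId(NamedTuple):
    species: str       # "m" for M. loti
    length_class: str  # "long" (100 codons or more) or "short"
    direction: str     # "right" or "left" on the circular map (assumed reading)
    serial: int


# Three letters (species, length class, direction) followed by a serial number.
_GENE_ID = re.compile(r"^(m)([ls])([rl])(\d+)$")


def parse_gene_id(gene_id: str) -> GeneId:
    """Split an identifier such as 'mll6432' into its components."""
    match = _GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"not a recognised gene identifier: {gene_id!r}")
    species, length_code, direction_code, serial = match.groups()
    return GeneId(
        species=species,
        length_class="long" if length_code == "l" else "short",
        direction="right" if direction_code == "r" else "left",
        serial=int(serial),
    )


if __name__ == "__main__":
    for name in ("mll6432", "msl5792", "mlr6342"):
        print(name, parse_gene_id(name))
```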

Two copies of the rRNA gene cluster were found on the genome at coordinates 2,745,482-2,751,894 and 2,752,970-2,759,407. The sequences of the two clusters were identical except for two nucleotide residues downstream of trnfM. One gene for the RNA subunit of RNase P was identified. A total of 50 tRNA genes representing 47 tRNA species were assigned to the chromosome by sequence similarity to known bacterial tRNA genes and computer prediction using the tRNAscan-SE program. No RNA-coding genes were found on the plasmids. It should be remembered that the genes assigned merely represent the coding potential for proteins and RNAs under the defined assumptions.

Table 1. Features of the assigned protein-coding genes and the functional classification.

Functional category                                         Chromosome     %  pMLa     %  pMLb     %
Amino acid biosynthesis                                            177   2.6    10   3.1     2   1.0
Biosynthesis of cofactors, prosthetic groups and carriers          145   2.1    14   4.4     1   0.5
Cell envelope                                                      110   1.6     3   0.9     1   0.5
Cellular processes                                                 176   2.6    14   4.4    16   7.7
Central intermediary metabolism                                    120   1.8     1   0.3     0     0
Energy metabolism                                                  326   4.8     7   2.2     7   3.3
Fatty acid, phospholipid and sterol metabolism                     163   2.4     4   1.3     1   0.5
Purines, pyrimidines, nucleosides and nucleotides                   81   1.2     0     0     1   0.5
Regulatory functions                                               517   7.7    11   3.4    11   5.3
DNA replication, recombination and repair                           85   1.3     7   2.2    11   5.3
Transcription                                                       54   0.8     0     0     0     0
Translation                                                        190   2.8     5   1.6     1   0.5
Transport and binding proteins                                     717    11    41    13     6   2.9
Other categories                                                   814    12    43    13    17   8.1
Subtotal of genes similar to genes of known function              3675    54   160    50    75    36
Similar to hypothetical protein                                   1423    21    60    19    38    18
Subtotal of genes similar to registered genes                     5098    76   220    69   113    54
No similarity                                                     1654    25   100    31    96    46
Total                                                             6752   100   320   100   209   100

5. Functional Assignment of the Protein-coding Genes

Of the 6752 potential protein-coding genes in the chromosome, 3675 (54%) were homologs of genes of known function, 1423 (21%) showed similarity to hypothetical genes, and the remaining 1654 (25%) showed no significant similarity to any registered genes (Table 1). The two plasmids contained a larger proportion of genes of unknown function, 51% and 65% for pMLa and pMLb, respectively (Table 1). The number of genes in each functional category is summarized in Table 1.
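The summary rows of Table 1 can be checked for internal consistency: for each replicon, the genes of known function, the genes similar to hypothetical proteins and the genes with no similarity should add up to the total. The sketch below uses the counts from the summary rows of Table 1; the function names are hypothetical, and the shares are recomputed to one decimal place rather than copied from the table.

```python
# Counts taken from the summary rows of Table 1.
TABLE1 = {
    "chromosome": {"known function": 3675, "hypothetical": 1423,
                   "no similarity": 1654, "total": 6752},
    "pMLa": {"known function": 160, "hypothetical": 60,
             "no similarity": 100, "total": 320},
    "pMLb": {"known function": 75, "hypothetical": 38,
             "no similarity": 96, "total": 209},
}


def is_consistent(row: dict) -> bool:
    """True if the three similarity classes add up to the reported total."""
    parts = sum(count for name, count in row.items() if name != "total")
    return parts == row["total"]


def class_shares(row: dict) -> dict:
    """Share of each similarity class in percent, to one decimal place."""
    total = row["total"]
    return {name: round(100 * count / total, 1)
            for name, count in row.items() if name != "total"}


if __name__ == "__main__":
    for replicon, row in TABLE1.items():
        print(replicon, is_consistent(row), class_shares(row))
```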

6. Features of the Predicted Genes Characteristic of M. loti

6.1. Symbiotic island. A 610,975 bp DNA segment flanked on both sides by portions of the phe-tRNA gene sequence was identified as a probable "symbiotic island" (Sullivan et al. 1998). The P4 integrase gene, which is located near the end of the symbiotic island in M. loti strain ICMP3153, was also present in MAFF303099 (mll6432). A total of 580 protein-coding genes were assigned to the symbiotic island of MAFF303099 based on computer prediction and similarity search. The Glimmer program often failed to predict the genes of known function in the symbiotic island, suggesting an exogenous origin of this DNA segment. The assignment showed that the DNA segment contains 30 genes related to nitrogen fixation and 24 genes for nodulation (Figure 1).

Notable features of the genes in this segment are as follows:

(i) Twelve genes for the conjugal transfer proteins were identified.

(ii) Two gene clusters, each composed of four genes for biotin synthesis (mll5828-mll5831, mll6003 and mll5004-mll6007), were assigned. Another gene cluster with the same gene set was found in plasmid pMLa. A cluster of six genes for thiamine biosynthesis was also identified (mll5788-mll5795), though the thiG gene seems to be split into two ORFs (mll5790 and msl5792) by a frameshift mutation.

(iii) One hundred eleven out of 580 genes (19.6%) assigned in the symbiotic island were those for transposon-related function such as transposase, integrase, recombinase and resolvase.

(iv) A cluster of genes for the type III secretion system was found. Nine genes, hrcN-y4yI-hrcQ-hrcR-hrcS-hrcT-hrcU-y4yQ-hrcV, formed the cluster (mlr6342-mlr6348).

(v) Two hundred and fifty genes showed a high degree of sequence similarity to those in the symbiotic plasmid, pNGR234a, of Rhizobium sp. NGR234 (536 kb) (Freiberg et al. 1997).

[Figure 1: map of the symbiotic island, spanning chromosome coordinates 4,644,792 bp to 5,255,766 bp. Labeled features include the nod, nol, noe, nif, fix, fdx and dct gene clusters, the nine genes for the type III secretion system, the phe-tRNA gene and a 17 bp direct repeat.]

Figure 1. Genes for nodulation and nitrogen fixation in the symbiotic island.
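The reported island size can be recovered from the two end coordinates shown in Figure 1 by counting both end positions. The sketch below also handles a segment that wraps around the origin of a circular replicon, which is not the case for the island but is a common pitfall with circular maps. The function name segment_length is hypothetical.

```python
from typing import Optional

ISLAND_START = 4_644_792       # left end of the island (Figure 1)
ISLAND_END = 5_255_766         # right end of the island (Figure 1)
CHROMOSOME_LENGTH = 7_036_071  # chromosome length reported in Section 3


def segment_length(start: int, end: int,
                   replicon_length: Optional[int] = None) -> int:
    """Length in bp of a segment given 1-based inclusive coordinates.

    If end < start, the segment is assumed to wrap around the origin of a
    circular replicon, and replicon_length must be supplied.
    """
    if end >= start:
        return end - start + 1
    if replicon_length is None:
        raise ValueError("replicon_length is required for a wrapped segment")
    return (replicon_length - start + 1) + end


if __name__ == "__main__":
    island = segment_length(ISLAND_START, ISLAND_END)
    print(island)  # 610975, the size given for the symbiotic island
    print(f"{100 * island / CHROMOSOME_LENGTH:.1f}% of the chromosome")
```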

6.2. Genes related to nodulation and nitrogen fixation. A total of 39 genes for nodulation were identified on the chromosome, and 24 of them were located in the symbiotic island. Forty-six genes were assigned as those for nitrogen fixation, of which 30 were found in the symbiotic island. Only one nodulation gene, a homolog of noeC, was present on the plasmids (pMLa). Nine genes and gene clusters contained nod-box-like sequences in their upstream regions. These include nodZ-noeL-nolK (mlr5848-5849-5850), nodS (mlr6161), nolL (mlr6181) and the two-component response regulator y4xI (mlr6334).

The sequence and gene information are available in the Web database, RhizoBase, at http://www.kazusa.or.jp/rhizobase/.

7. References

Delcher A et al. (1999) Nucleic Acids Res. 27, 4636-4641

Freiberg C et al. (1997) Nature 387, 394-401

Lowe T et al. (1997) Nucleic Acids Res. 25, 955-964

Sullivan J et al. (1998) Proc. Natl. Acad. Sci. USA 95, 5145-5149

8. Acknowledgements

We thank Drs K. Minamizawa and T. Uchiumi for valuable discussions. This work was supported by the Kazusa DNA Research Institute Foundation.

THE SYMBIOTIC PLASMID OF RHIZOBIUM ETLI CFN42

P. Bustos1, M.A. Cevallos1, J. Collado-Vides2, V. González1, A. Medrano2, G. Moreno2,

1Programa de Evolución Molecular

2Programa de Biología Computacional

3Programa de Genética Molecular de Plásmidos Bacterianos

Centro de Investigación sobre Fijación de Nitrógeno, UNAM, Av. Universidad S/N, Col. Chamilpa, Cuernavaca, Mor., México

The replicon p42d is the symbiotic plasmid (pSym) of Rhizobium etli strain CFN42. This plasmid carries all the information required to promote nodulation on beans when it is present in an Agrobacterium tumefaciens strain cured of its endogenous plasmids (S. Brom et al. 1988).

Phaseolus vulgaris, the common bean, originated and diversified in Mesoamerica; hence the genetic pool of R. etli, its symbiont, is also believed to be very large in Mexican fields. Accordingly, the genetic diversity of the chromosome in a population of local isolates of Rhizobium etli is one of the largest found within a bacterial species (D. Piñero et al. 1988). Nevertheless, at the DNA level, the sequences of p42d are highly conserved among the symbiotic plasmids of other strains of the species, independently of their geographical origin. Therefore, we propose that this plasmid spread recently among soil bacteria where beans were cultivated. In further support of this hypothesis, most of the strains isolated from the bean rhizosphere belong to the species but lack the symbiotic plasmid (L. Segovia et al. 1991).

The genome of Rhizobium etli CFN42 is distributed among 7 replicons: one chromosome of approximately 5 Mb and six plasmids (p42a 0.2 Mb, p42b 0.15 Mb, p42c 0.27 Mb, p42d 0.37 Mb, p42e 0.5 Mb and p42f 0.7 Mb). The estimated number of reiterated sequence families for Rhizobium is 200, with an average of 2.5 elements per family (M. Flores et al. 1987). p42d contains 10 families of internally repeated sequences, with the number of elements per family varying from 2 to 6. The sequences taken into account for this analysis are those that span at least 300 bp and have 80% DNA identity (L. Girard et al. 1991). These reiterated sequences have been shown to participate in genome rearrangements mediated by homologous recombination (D. Romero et al. 1991).
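The figures in this paragraph lend themselves to a short back-of-the-envelope calculation: the approximate genome size, the share contributed by p42d, and the number of reiterated elements implied by 200 families with 2.5 elements each. The sketch uses the rounded sizes quoted above, converted to kb to keep the arithmetic in integers, so the results are estimates only.

```python
# Approximate replicon sizes in kb, from the rounded Mb values in the text.
REPLICONS_KB = {
    "chromosome": 5000,
    "p42a": 200,
    "p42b": 150,
    "p42c": 270,
    "p42d": 370,
    "p42e": 500,
    "p42f": 700,
}

REITERATED_FAMILIES = 200   # estimated families for Rhizobium
ELEMENTS_PER_FAMILY = 2.5   # average elements per family


def genome_size_kb(replicons: dict) -> int:
    """Approximate genome size in kb."""
    return sum(replicons.values())


def replicon_share(replicons: dict, name: str) -> float:
    """Share of the genome carried by one replicon, in percent."""
    return 100 * replicons[name] / genome_size_kb(replicons)


if __name__ == "__main__":
    print(genome_size_kb(REPLICONS_KB), "kb in", len(REPLICONS_KB), "replicons")
    print(f"p42d carries about {replicon_share(REPLICONS_KB, 'p42d'):.1f}% of the genome")
    print(REITERATED_FAMILIES * ELEMENTS_PER_FAMILY, "reiterated elements implied")
```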

The symbiotic plasmid shares extensive conservation of reiterated sequences only with p42a; many of these sequences are of the IS type and are not present in the rest of the genome. The presence of sequences shared exclusively between these two plasmids suggests that both arrived recently from a common genomic background. In further support of this, p42a, a self-transmissible plasmid, conducts p42d during conjugation experiments (S. Brom, personal communication).

The sequencing project for p42d was recently concluded. The plasmid is a double-stranded circular DNA of 371,256 bp. It has a RepABC replication system that confers a low copy number, 1 to 2 plasmids per cell, and very high stability (M.A. Ramirez et al. 1997). The plasmid was sequenced with an ABI 373 automated sequencer using fluorescent ddNTP terminators. Templates were purified PCR products prepared from M13 clones of shotgun libraries of each of the 13 cosmids that cover the whole p42d plasmid in order. More than 500 reactions per cosmid were run to obtain the raw sequence, and the sequences were initially assembled using the CONSED program. To fill the gaps between contigs, around 180 single-stranded DNA primers were designed at the contig borders and used with a p42d BamHI clone of the corresponding region to obtain the sequence. A further 1000 reactions were required for refinement.

The final sequence of p42d has less than one mismatch per 10,000 bp. The sequence was tested by performing 72 PCR reactions with primers designed to cover the whole pSym successively. Total DNA from a CFN42 strain recently isolated from a nitrogen-fixing nodule was used as the template. All PCR products were of the expected size.
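To put the stated accuracy and the PCR validation in concrete terms, the sketch below computes the upper bound on mismatches implied by "less than one per 10,000 bp" over 371,256 bp, and the mean span each of the 72 PCR products would need to cover. The span figure assumes the products tile the plasmid without overlap, which the text does not state, so it is a lower bound on the mean product size.

```python
PLASMID_LENGTH_BP = 371_256   # length of p42d reported in the text
MAX_ERROR_RATE = 1 / 10_000   # "less than one mismatch per 10,000 bp"
PCR_REACTIONS = 72            # validation PCRs covering the whole plasmid


def mismatch_upper_bound(length_bp: int, error_rate: float) -> float:
    """Upper bound on the number of mismatches at the given error rate."""
    return length_bp * error_rate


def mean_product_span(length_bp: int, n_reactions: int) -> float:
    """Mean span per PCR product, assuming non-overlapping tiling."""
    return length_bp / n_reactions


if __name__ == "__main__":
    bound = mismatch_upper_bound(PLASMID_LENGTH_BP, MAX_ERROR_RATE)
    span = mean_product_span(PLASMID_LENGTH_BP, PCR_REACTIONS)
    print(f"fewer than {bound:.1f} mismatches expected over the plasmid")
    print(f"each PCR product spans at least {span:.0f} bp on average")
```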

In order to establish the open reading frames (ORFs) of the plasmid, GLIMMER software was employed in an iterative way, feeding the results of the first search as a seed of the second one, each time changing some of the parameters for the stringency of the analysis.

The amino acid sequences of the polypeptides derived from the ORFs of the plasmid were then compared by BLASTX with those reported in the databases. Specific comparisons were carried out with the genomes of Mesorhizobium loti, Sinorhizobium meliloti and Methanobacterium thermoautotrophicum, the symbiotic island of Bradyrhizobium japonicum, and the plasmids pNGR234a from Rhizobium sp., pRi1724 from Agrobacterium rhizogenes and Ti from Agrobacterium tumefaciens. Codon adaptation index and G+C content were estimated for each ORF. IS elements were grouped according to reported families and compared with the reiterated sequences of the plasmid. Polypeptides derived from each ORF were organized in functional classes; 43% of the total are of unknown function.
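The per-ORF G+C estimate mentioned above is a simple calculation, sketched here for illustration. The function is a generic implementation, not the authors' code: it counts only unambiguous bases, so positions such as N are excluded from the denominator, which is a design choice rather than something stated in the text. The example sequence is made up.

```python
def gc_content(sequence: str) -> float:
    """G+C content of a nucleotide sequence in percent.

    Only A, C, G and T are counted; ambiguous symbols such as N are ignored.
    """
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in bases if base in "GC")
    return 100.0 * gc / len(bases)


if __name__ == "__main__":
    # Hypothetical short ORF used only to demonstrate the call.
    example_orf = "ATGGCCGGCAAGCTGTGA"
    print(f"{gc_content(example_orf):.1f}% G+C")
```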

References

Brom S et al. (1988) Appl. Environ. Microbiol. 54, 1280-1283

Flores M et al. (1987) J. Bacteriol. 169, 5782-5788

Girard ML et al. (1991) J. Bacteriol. 173, 2411-2419

Piñero D et al. (1988) Appl. Environ. Microbiol. 54, 2825-2832

Ramirez MA et al. (1997) Microbiol. 143, 2825-2831

Romero DR et al. (1991) J. Bacteriol. 173, 2435-2441

Segovia L et al. (1991) Appl. Environ. Microbiol. 57, 426-433

Acknowledgements

We are grateful to J.C. Hernández, V. Quintero, H. Salgado, R.E. Gómez, J.A. Gama, I. Hernández, J. Espíritu and E. Díaz for technical and computing assistance.
