THE WHOLE CHROMOSOME SEQUENCE OF SINORHIZOBIUM MELILOTI STRAIN 1021

D. Capela1,2, F. Barloy-Hubler2, J. Gouzy1, G. Bothe3, F. Ampe1, J. Batut1, P. Boistard1, A. Becker4, M. Boutry5, E. Cadieu2, S. Dréano2, S. Gloux2, T. Godrie6, A. Goffeau5, D. Kahn1, E. Kiss1, V. Lelaure2, D. Masuy5, T. Pohl3, D. Portetelle6, A. Pühler4, B. Purnelle5, U. Ramsperger3, C. Renard1, P. Thébault1, M. Vandenbol6, S. Weidner4, F. Galibert2

1Laboratoire de Biologie Moléculaire des Relations Plantes-Microorganismes, UMR215-CNRS-INRA, Chemin de Borde Rouge, BP27, F-31326 Castanet Tolosan Cedex, France

2Laboratoire de Génétique et Développement, UMR6061-CNRS, Faculté de Médecine, 2 avenue du Pr. Léon Bernard, F-35043 Rennes Cedex, France

3GATC GmbH, Fritz-Arnold-Str. 23, D-78467 Konstanz, Germany

4Universität Bielefeld, Biologie IV (Genetik), Universitätsstr. 25, D-33615 Bielefeld, Germany

5Unité de Biochimie physiologique, Université Catholique de Louvain, Place Croix du Sud 2, Bte 20, B-1348 Louvain-la-Neuve, Belgium

6Unité de Microbiologie, Faculté des Sciences Agronomiques de Gembloux, Avenue Maréchal Juin 6, B-5030 Gembloux, Belgium

1. Introduction

Sinorhizobium meliloti strain 1021 is a free-living, gram-negative soil bacterium and a symbiont of alfalfa (Medicago sativa). Its genome consists of three large replicons, a chromosome and two megaplasmids, which have been entirely sequenced by an international consortium (Galibert et al. 2001). The pair formed by the model legume Medicago truncatula and S. meliloti has been chosen by many international groups and has emerged worldwide as a model for symbiosis and nitrogen fixation studies (along with Lotus japonicus and Mesorhizobium loti).

2. Results and Discussion

The entire double-stranded nucleotide sequence of the S. meliloti strain 1021 chromosome has been determined by shotgun sequencing of 50 ordered recombinant BAC clones, with sufficient redundancy (over four-fold) and a sequence quality estimated to correspond to less than one error per 100,000 nucleotides. The average GC content is 62.7% (as in the M. loti chromosome), although six large regions with lower GC content have been found: three of these correspond to the rrn operons and the others to regions of putatively external origin. For instance, of the 2.2% of the chromosome sequence with transposon-related functions, a portion amounting to 0.5% of the chromosome is located in one of these regions. In addition, apart from the three rrn operons, only tuf and purl! are duplicated on this replicon, with more than 90% nucleotide sequence identity.
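Low-GC regions such as those described above are typically found by scanning the sequence with a sliding window. The sketch below is a minimal illustration, not the consortium's method: the function names, window size, step and threshold are hypothetical parameters, and the toy sequence is synthetic.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (N and other IUPAC codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def low_gc_regions(seq: str, window: int, step: int, threshold: float) -> list[tuple[int, int, float]]:
    """Slide a window along `seq` and report regions whose GC fraction is below `threshold`.

    Windows start every `step` bases (a short tail at the end may be skipped).
    Overlapping or touching low-GC windows are merged; regions are returned as
    (start, end, gc) with 0-based, half-open coordinates.
    """
    regions: list[tuple[int, int, float]] = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(seq))
        if gc_fraction(seq[start:end]) >= threshold:
            continue
        if regions and start <= regions[-1][1]:
            # Extend the previous low-GC region and recompute its GC fraction.
            region_start = regions[-1][0]
            regions[-1] = (region_start, end, gc_fraction(seq[region_start:end]))
        else:
            regions.append((start, end, gc_fraction(seq[start:end])))
    return regions


if __name__ == "__main__":
    # Toy sequence, not real S. meliloti data: GC-rich flanks around a 200 bp AT-rich insert.
    toy = "GCGCGGCC" * 50 + "ATATTAAT" * 25 + "GGCCGCGC" * 50
    print(low_gc_regions(toy, window=100, step=50, threshold=0.5))
```

In practice the threshold would be set relative to the genome-wide average (62.7% here), and candidate regions would be checked against other signals such as codon usage or transposon content.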

Coding regions, including protein-coding and RNA-coding genes, represent 86.4% of the total chromosome sequence, and their organization includes frequent changes in polarity (coding strand). A total of 3341 protein-encoding genes are predicted, with a mean length of 938 bp. The longest identified chromosomal gene, ndvB, is 8496 nucleotides long. Putative functions were assigned to 59% of the chromosomal genes on the basis of homology, and 5% of the protein-coding ORFs are orphans, with no detectable sequence similarity.
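As a rough consistency check using only the figures above: the predicted protein-coding genes span about 3.13 Mb, and because RNA genes also count toward the 86.4% coding fraction, dividing by that fraction gives a lower bound on the chromosome length (a back-of-the-envelope estimate, not a reported value):

```latex
\[
N_{\mathrm{prot}}\,\bar{\ell} = 3341 \times 938~\mathrm{bp} = 3\,133\,858~\mathrm{bp} \approx 3.13~\mathrm{Mb}
\]
\[
L_{\mathrm{chr}} \;\ge\; \frac{3\,133\,858~\mathrm{bp}}{0.864} \approx 3.63~\mathrm{Mb}
\]
```

This is of the same order as the approximate 3500 kb chromosome size quoted in the following contribution.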

The S. meliloti chromosome carries all 57 genes involved in DNA metabolism and replication, although several genes involved in primosome assembly in E. coli are missing. Six of the nine genes required for cytokinesis in E. coli (ftsA, ftsI, ftsK, ftsQ, ftsW and ftsZ, the latter present in two copies) are found on the S. meliloti chromosome, while ftsL, ftsN and zipA are apparently missing. We also identified genes for septum formation (maf) and chromosome partitioning (smc and parAB), as well as ctrA, a member of the two-component signal transduction family involved in controlling a number of cell cycle-regulated genes.

Fifty-one tRNA genes have also been detected. They are evenly distributed on the chromosome and correspond to 43 different anticodons, which, by wobble pairing, can decode all sense codons but one. The missing essential tRNA, which reads an arginine codon, is encoded by the pSymB megaplasmid.
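Because tRNA genes are usually named by their anticodon, it helps to make the wobble argument explicit. The sketch below implements only the simplified Crick wobble rules (inosine is included, but other modified nucleosides, which extend or restrict decoding in real cells, are ignored); the function and table names are illustrative. It shows, for example, that an arginine tRNA with the anticodon CCG reads the codon CGG, whereas CCG read as a codon would specify proline.

```python
BASES = "UCAG"
# Standard genetic code; codons in the order UUU, UUC, UUA, UUG, UCU, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Simplified Crick wobble rules: base at anticodon position 34 -> codon third bases it can read.
# "I" is inosine; other modified nucleosides are deliberately ignored.
WOBBLE = {"G": "CU", "C": "G", "A": "U", "U": "AG", "I": "ACU"}


def codons_read(anticodon: str) -> set[str]:
    """Codons (5'->3') read by an anticodon written 5'->3' (positions 34, 35, 36)."""
    n34, n35, n36 = anticodon.upper()
    # Codon position 1 pairs with anticodon 36, position 2 with 35, position 3 with 34.
    return {COMPLEMENT[n36] + COMPLEMENT[n35] + third for third in WOBBLE[n34]}


def undecoded_sense_codons(anticodons: list[str]) -> set[str]:
    """Sense codons not read by any anticodon in the given tRNA set."""
    sense = {codon for codon, aa in CODON_TABLE.items() if aa != "*"}
    read = set().union(*(codons_read(a) for a in anticodons))
    return sense - read


if __name__ == "__main__":
    print(sorted(codons_read("CCG")), CODON_TABLE["CGG"])
    # Arginine CGN codons left unread if the only CGN-decoding tRNA carries anticodon ICG:
    print(sorted(undecoded_sense_codons(["ICG"]) & {"CGU", "CGC", "CGA", "CGG"}))
```

Applied to a full list of annotated anticodons, `undecoded_sense_codons` returns the codons that depend on tRNAs encoded elsewhere in the genome; with only an ICG anticodon among the CGN-decoding tRNAs, CGG is the codon left over.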

With the notable exception of asparagine synthetase, whose two asn genes are on pSymB, complete pathways for amino acid biosynthesis have all been found. Interestingly, S. meliloti possesses two different pathways for methionine biosynthesis: the classical metABC pathway, as in E. coli, and metZ, which is essential for symbiosis in R. etli. All the essential genes for the de novo synthesis of purines and pyrimidines, as well as for glycolysis and gluconeogenesis, are chromosomal, with the exception of the gluconeogenic fructose-1,6-bisphosphatase gene (cbbF), which is found on pSymB. Regarding glycolysis, S. meliloti lacks the classical ATP-dependent phosphofructokinase (pfk) but possesses a complete Entner-Doudoroff pathway, which thus serves as the main route for glucose utilization. A total of 10.8% of the chromosomal proteins are involved in transport; 33% of these belong to the ABC (ATP-Binding Cassette) superfamily and 18% to the MF (Major Facilitator) superfamily.
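Assuming the 10.8% refers to the 3341 predicted protein-coding genes, these percentages correspond approximately to the following gene counts (rounded estimates, not reported values):

```latex
\[
0.108 \times 3341 \approx 361~\text{transport proteins}
\]
\[
0.33 \times 361 \approx 119~(\text{ABC}), \qquad 0.18 \times 361 \approx 65~(\text{MF})
\]
```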

S. meliloti seems well equipped to face a large variety of stress conditions, including osmotic shock, heat and cold shock, and oxidative stress. Protection against oxygen might be essential for efficient infection if rhizobia, like many pathogens, induce an oxidative burst upon plant cell infection. The eleven gst (glutathione S-transferase) genes and the three rpoE (σ24-type sigma factor) genes identified on the chromosome might contribute to protection against oxygen or other reactive species. Six additional gst and five additional rpoE genes have been recognized on the megaplasmids. In addition to cell surface components, the chromosome sequence reveals a number of genes putatively involved in virulence, including an ortholog of the acvB virulence gene of Agrobacterium tumefaciens. Finally, regulatory functions have been assigned to 7.2% of the chromosomal genes, including an unusually high number of nucleotide cyclases (26 genes). The role of these cyclases, and the signal transduction pathways in which they participate, are unknown in S. meliloti.

3. Conclusions

The S. meliloti chromosome sequence carries not only housekeeping genes but also genetic information for motility and chemotaxis, plant interaction, putative virulence and stress responses. However, since up to 41% of the S. meliloti chromosomal genes still have no assigned function, many other functions are likely to be encoded by this replicon. Transcriptomic and proteomic analyses have been initiated to elucidate them.

4. References

Galibert F et al. (2001) Science 293, 668-672

THE 1683 KB REPLICON OF SINORHIZOBIUM MELILOTI: PAST AND PRESENT INVESTIGATIONS INTO THE NATURE OF A VERY LARGE BACTERIAL PLASMID

S.R. MacLellan, C.D. Sibley, B. Golding, T.M. Finan

Department of Biology, McMaster University, 1280 Main St. West, Hamilton, ON, L8S 4K1 Canada

1. Introduction

Many of the bacteria that employ endosymbiotic nitrogen-fixing lifestyles in association with plant hosts possess large non-chromosomal DNA replicons. Sinorhizobium meliloti, for example, possesses a 1400 kb and a 1700 kb single- or low-copy-number plasmid in addition to a 3500 kb chromosome. The plasmids, or megaplasmids, of S. meliloti have attracted interest since their detection because numerous genes that influence N2 fixation and endosymbiosis map to these replicons. Their large size (comparable to that of some of the smaller bacterial chromosomes) has generated speculation about their genetic content and the biological role they play in the free-living or bacteroid forms of S. meliloti. Many laboratories have contributed to our increased understanding of the biology of the megaplasmids. Here, we consider some of the past and present research, in which our laboratories have played a part, into the nature of the largest S. meliloti megaplasmid, pSymB.

We begin with the first detection of pSymB (originally called pRmeSU47b and now also called pExo), its genetic mapping, and the results of a large-scale deletion analysis of the megaplasmid. pSymB is a member of a growing and well-conserved family of plasmids (defined by their replicator regions) called repABC-type replicons. We discuss research that has enhanced, and will further enhance, our knowledge of the replication and segregation mechanisms that ensure faithful maintenance of these replicons in cell populations. Finally, we consider some of the general features of the pSymB nucleotide sequence that have emerged from the completion of the S. meliloti genome sequencing effort.

2. The Discovery of pSymB in S. meliloti

The megaplasmid pSymB was originally detected (Finan et al. 1986) during attempts to map transposon insertions generating a class of mutants that induced the formation of atypical nodules that were Fix- and devoid of infection threads. These mutants were unable to synthesize an exopolysaccharide now known to be required for effective nodulation. Eckhardt gels used to visualize high molecular weight genomic DNA often displayed a doublet at the position of the band corresponding to the previously identified megaplasmid, the so-called nod-nif plasmid. Southern hybridization with a radiolabeled Tn5 probe showed hybridization to the upper band of the doublet in the exo mutant strains, whereas the labeled probe hybridized to the lower band in strains carrying a nifH::Tn5 insertion. These and other data demonstrated the existence of a second megaplasmid, of similar size to, though somewhat larger than, the nod-nif (pSymA) megaplasmid in S. meliloti. Today both plasmids are often referred to as symbiotic plasmids, since both contain exopolysaccharide synthesis loci (and other genes) that are required for effective nodulation.

3. Genetic Mapping and Deletion Analysis of pSymB

To facilitate the characterization of pSymB, a genetic map of the replicon was constructed using transposon insertions with alternating antibiotic resistance genes as genetic markers (Charles, Finan 1990). These markers, linked by transduction, allowed for the first time the unambiguous mapping of known phenotypic loci on the plasmid. This work led to the first genome-scale analysis of any replicon in S. meliloti, because it was realized that the same transposon markers could be used, via homologous recombination between the IS50 elements of adjacent or distant insertions, to generate large-scale deletions in pSymB. Accordingly, a series of mutant derivatives carrying 120-600 kb deletions was constructed (Charles, Finan 1991). The ensuing phenotypic analysis revealed previously unknown loci required for the utilization of dulcitol, melibiose, raffinose, β-hydroxybutyrate, acetoacetate, protocatechuate and quinate, as well as previously unidentified loci required for effective nodulation and exopolysaccharide synthesis. Collectively, nearly 90% of pSymB was shown to be non-essential for viability. Some regions of the plasmid were not represented amongst the deletion derivatives, raising the possibility that these sequences contain loci essential for viability (see below).
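The inference drawn from the deletion series (DNA removed in some viable strain is non-essential; DNA never removed is a candidate for essential loci) amounts to taking the union of deleted intervals. A minimal sketch, assuming deletions are given as hypothetical linear coordinates in kb and that no deletion spans the coordinate origin of the circular replicon; the coordinates below are illustrative, not the published deletion endpoints.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def dispensable_fraction(deletions: list[tuple[int, int]], replicon_length: int) -> float:
    """Fraction of the replicon removed in at least one viable deletion strain."""
    return sum(e - s for s, e in merge_intervals(deletions)) / replicon_length


def never_deleted(deletions: list[tuple[int, int]], replicon_length: int) -> list[tuple[int, int]]:
    """Regions absent from every deletion: candidates for essential loci."""
    gaps, pos = [], 0
    for s, e in merge_intervals(deletions):
        if s > pos:
            gaps.append((pos, s))
        pos = max(pos, e)
    if pos < replicon_length:
        gaps.append((pos, replicon_length))
    return gaps


if __name__ == "__main__":
    # Hypothetical deletion endpoints (kb) on a 1683 kb replicon.
    deletions_kb = [(0, 300), (250, 700), (760, 1300), (1300, 1620)]
    print(round(dispensable_fraction(deletions_kb, 1683), 3), never_deleted(deletions_kb, 1683))
```

A region that appears in `never_deleted` is only a candidate: it may lack a deletion simply because no suitable pair of flanking markers was available.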

Indirectly, the construction of the linkage map for pSymB played a significant role in the development of a technique to clone out very large regions of plasmids that could then be used as a source of DNA for genome sequencing and also led to the discovery of the pSymB origin of replication (Chain et al. 2000).

4. Controlling the Replication and Segregation of a Megaplasmid

Early attempts to isolate active origins of replication from the S. meliloti genome were largely unsuccessful. Our initial involvement in the pSymB nucleotide sequencing project concentrated on a 60 kb region of DNA that was not represented amongst the deletion derivative strains described by Charles and Finan (1991). This region appeared to replicate autonomously in A. tumefaciens. Within this region, we identified three ORFs (Chain et al. 2000) with a high degree of sequence similarity to the repA, repB, and repC genes that had previously been isolated and genetically characterized in plasmids from Rhizobium leguminosarum (Turner, Young 1995), Rhizobium etli (Ramirez-Romero et al. 2000), several strains of Agrobacterium tumefaciens (Tabata et al. 1989; Suzuki et al. 1998; Li, Farrand 2000), A. rhizogenes (Nishiguchi et al. 1987) and the soil bacterium Paracoccus versutus (Bartosik et al. 1998). pSymB is therefore a member of a large and growing family of plasmids called repABC-type replicons, so called because their replication and segregation depend on the repABC gene products. A 780-bp sequence previously isolated from pSymB that appears to be capable of autonomous replication (Margolin, Long 1993) is located approximately 650 kb from the repABC locus. The biological significance of this apparently non-essential sequence is not known; however, recent experiments in our laboratory suggest that its replication does not depend on repABC. The S. meliloti genome sequencing project has revealed a second set of repAB genes on pSymB (131 kb from repABC) that are more closely related to genes in other rhizobia than to those in S. meliloti (Finan et al. 2001). The sequencing project has also revealed that the 1400 kb pSymA replicon in S. meliloti is a repABC-type replicon.

Since the first description of a repABC replicator region (Nishiguchi et al. 1987), several genetic analyses have demonstrated the following general features of the region. Deletions in the repA or repB genes do not prevent replication but render the replicon (either the native plasmid or a mini-derivative of it) unstable. Such a plasmid is rapidly lost from a growing culture under non-selective conditions, in contrast to the non-mutated replicon, which is reportedly quite stable under the same conditions (Ramirez-Romero et al. 2000). Thus the RepA and RepB proteins appear to play a role in plasmid segregation during cell division. Their predicted amino acid sequences show relatedness to a rather large family of proteins from many different bacterial species, and where studied, these proteins have been shown to promote plasmid segregation. This sequence similarity extends to the SopAB and ParAB proteins of the Escherichia coli F plasmid and phage P1, respectively. Several apparent motifs, including a putative ATPase domain in RepA, are strongly conserved amongst the RepAB proteins and the polypeptides from these replicons.

Deletions, insertions, or frameshift mutations in repC eliminate replication (Tabata et al. 1989; Bartosik et al. 1998; Ramirez-Romero et al. 2000). RepC is often presumed to initiate replication by promoting the melting of duplex DNA at or adjacent to the origin of replication, much as RepE does at the F plasmid origin. Unlike RepA and RepB, RepC shows no significant similarity to proteins in sequence databases, except to those encoded by other repABC-type replicons.

A non-coding region of nucleotide sequence between the repB and repC ORFs exerts an incompatibility effect against a replicon possessing the parental repABC replicator region. This effect has been consistently recognized in every replicator region examined, although a mechanistic explanation for the observation has not been forthcoming. Interestingly, Ramirez-Romero et al. (2000) reported that a deletion in the repA gene of the Rhizobium etli p42d plasmid relieves this incompatibility effect, an observation that our lab has also made with a pSymB replicator region derivative carrying a frameshift mutation in repA. Finally, the repB-C intergenic regions from rhizobial plasmids display a rather high level of nucleotide sequence conservation that is not maintained throughout the entire repABC region. This may ultimately reflect a conservation of cis-acting biochemical function amongst these related replicator regions.

Discussed above are the most obvious features of a typical repABC-type replicator region. Previous genetic analyses, however, have detected other features that are too numerous to mention here. One striking aspect of these regions is their overt similarity in overall organization and sequence. That said, we also note some differences in the behavior of these replicator regions. For example, other investigators have reported a second incompatibility region, downstream of the repC gene, that is essential for replication (Ramirez-Romero et al. 2000). In the case of pSymB, we detect no obvious incompatibility effect exerted by this region. Furthermore, Bartosik et al. (1998) have reported that the repC gene itself (including its ribosome binding site, but without additional upstream and downstream nucleotide sequence) can confer autonomous replication upon an otherwise non-replicating vector. Our experiments using pSymB derivatives indicate that replication depends on nucleotide sequences both upstream and downstream of the repC gene. These differences are significant insofar as they constrain the location of the actual origin of replication and possibly of other cis-acting sites within the region. Further research should determine whether such differences are largely artifacts of slightly differing experimental systems or whether they represent some level of functional divergence within an otherwise well-conserved biochemical and genetic framework.

The study of repABC replicator regions is essentially in its infancy, and if other plasmid systems are any indication, such regions are likely to be very complex from a mechanistic point of view. To date, even in the best-studied plasmid replicator regions, processes such as incompatibility and segregation are poorly understood. For repABC systems, several avenues of investigation should improve our understanding of the region. Where is the actual origin of replication located, and what is the precise role of RepC? How is the expression of the Rep proteins regulated, and what other proteins play a role in the initiation of replication and in plasmid segregation? Are there cross-interactions between the cis- and trans-acting components of different repABC systems in the same cell? Finally, will these replicator regions yield insight into the evolutionary history of very large plasmids, like pSymB, that are typical of the Rhizobiaceae? Some plasmids contain more than one replicator region, and these may represent vestigial replication and segregation control regions of what were once smaller, independent plasmids.

5. The Complete Nucleotide Sequence of pSymB

S. meliloti is perhaps the best characterized of the N2-fixing endosymbiotic bacteria, and given its agricultural and ecological importance and its rather unusual physiology amongst the bacteria, it was a natural choice as a target for complete genome sequencing. It is perhaps fitting that the publication of these Proceedings of the 13th International Congress on Nitrogen Fixation nearly exactly coincides with the publication of the complete genome sequence of Sinorhizobium meliloti (Galibert et al. 2001).

With regard to the pSymB replicon, its determined length is 1,683,333 bp (Finan et al. 2001), which is in good agreement with previous physical and genetic estimates. Its overall GC content of 62.4% is similar to that of the S. meliloti chromosome. It has a gene density similar to that of other bacterial chromosomes (about 90% protein coding) and 1,570 predicted ORFs.
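Taking the "about 90% protein coding" figure at face value, the length and ORF count imply roughly one ORF every 1.07 kb and a mean ORF length near 965 bp, similar to the 938 bp mean reported for chromosomal genes in the preceding contribution (a rough estimate, not a reported value):

```latex
\[
\frac{1\,683\,333~\mathrm{bp}}{1570~\text{ORFs}} \approx 1072~\mathrm{bp~per~ORF}
\]
\[
\bar{\ell}_{\mathrm{ORF}} \approx \frac{0.90 \times 1\,683\,333~\mathrm{bp}}{1570} \approx 965~\mathrm{bp}
\]
```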

Eleven gene clusters (totaling 223 kb) that encode cell surface (lipo-, capsular and extracellular) polysaccharide synthetic machinery are found on pSymB, and nine of these were previously unknown. Seventeen percent of the coding capacity of pSymB is devoted to ABC transporter systems, half of which are predicted to be sugar-specific. pSymB encodes just over half (235) of all the ABC transporter genes predicted in the entire genome. A predicted 134 ORFs on pSymB encode transcriptional regulators, including four ECF (extracytoplasmic function) sigma factors. In addition, a number of genes predicted to be involved in amino acid catabolism, nucleotide scavenging, aromatic compound degradation, and degradation or utilization of plant-derived metabolites were identified.

A number of potentially essential genes were discovered on pSymB. These include the only copies of the minCDE genes and the only copy of an arginine tRNA gene (anticodon CCG) whose product decodes CGG, the second most frequently used arginine codon. Both of these loci lie within regions that are not represented in the previously discussed pSymB deletion mutant library. Another region not represented in the library contains only two loci that are obvious candidates for essential functions. One is fusA2, encoding elongation factor G, but experiments in our laboratory suggest that this gene is not essential (P. Aneja, unpublished data). The other is, as discussed above, the repABC replicator gene cluster. We suspect that these genes, or the cis-acting origin of replication within the region, are essential insofar as they are required for pSymB replication and pSymB is essential for cellular viability.

6. Conclusions

Since its initial detection, research on the largest megaplasmid of S. meliloti has advanced considerably. We now know the sequence of the entire pSymB plasmid and the unambiguous location of all known and predicted genes. Oddly, the genome sequence, which might seem the ultimate compendium of biological information, really represents a new starting point: the point from which research moves ahead into the so-called post-genomic era, but also one at which techniques that pre-date the genomic era must be brought to bear. Understanding the mechanisms of replication and segregation of the S. meliloti megaplasmids, resolving the vast transcriptional regulatory networks of the free-living and endosymbiotic cell, and determining why pSymB possesses so many varied solute uptake systems are but three of the countless issues that remain to be addressed.

7. References

Bartosik D et al. (1998) Microbiol. 144, 3149-3157

Chain PSG et al. (2000) J. Bacteriol. 182, 5486-5494

Charles TC, Finan TM (1990) J. Bacteriol. 172, 2469-2476

Charles TC, Finan TM (1991) Genetics 127, 5-20

Finan TM et al. (2001) Proc. Natl. Acad. Sci. USA 98, 9889-9894

Galibert F et al. (2001) Science 293, 668-672

Li P, Farrand SK (2000) J. Bacteriol. 182, 179-188

Margolin W, Long SR (1993) J. Bacteriol. 175, 6553-6561

Nishiguchi R et al. (1987) Mol. Gen. Genet. 206, 1-8

Ramirez-Romero MA et al. (2000) J. Bacteriol. 182, 3117-3124

Suzuki K et al. (1998) Biochim. Biophys. Acta 1396, 1-7

Tabata S et al. (1989) J. Bacteriol. 171, 1665-1672

Turner SL, Young JPW (1995) FEMS Microbiol. Lett. 133, 53-58

SEQUENCING AND ANNOTATION OF THE MEGAPLASMID pSYMB OF THE NITROGEN-FIXING ENDOSYMBIONT SINORHIZOBIUM MELILOTI

S. Weidner, J. Buhrmester, L. Sharypova, F.-J. Vorhölter, A. Becker, A. Pühler

Universität Bielefeld, Fakultät für Biologie, Lehrstuhl für Genetik, Universitätsstr. 25, D-33615 Bielefeld, Germany

1. Introduction

Bacteria collectively known as rhizobia induce N2-fixing root nodules on leguminous plants. In particular, Sinorhizobium meliloti nodulates species of the genera Medicago, Melilotus and Trigonella. The interaction of S. meliloti and Medicago truncatula has been studied for a long time by numerous international groups because this symbiosis represents a model system in the field of plant-microbe interaction.

Like many other alpha-proteobacteria, S. meliloti possesses a complex genome composed of a chromosome and two so-called megaplasmids, pSymA and pSymB. An international consortium sequenced the complete genome of S. meliloti strain 1021 (Galibert et al. 2001; Capela et al. 2001; Barnett et al. 2001; Finan et al. 2001). The sequence of the megaplasmid pSymB was determined in a collaborative project between the Bielefeld group and the group of T.M. Finan at McMaster University (Hamilton, Ontario, Canada). In this article, the contribution of the Bielefeld group is presented.

2. Procedure

The sequencing of the pSymB megaplasmid was based on a contig of 24 BAC clones constructed in the vector pBeloBAC11 (Barloy-Hubler et al. 2000). Later, during the sequencing phase of the project, two additional BAC clones covering minor contig gaps were added to the original contig. Restriction analyses and Southern hybridization to genomic DNA of S. meliloti verified that all BAC clone inserts were unrearranged relative to the genome. To determine a suitable sequencing strategy, the sizes of the overlaps between individual BAC clone inserts were estimated by restriction analyses and Southern hybridization. Shotgun libraries with insert sizes of 1.5 kb and 3.0 kb were generated for all BAC clones. Every second BAC clone of the minimal set was sequenced by the shotgun strategy using M13 forward and reverse primers. The exact sizes of the overlaps were determined by sequencing both ends of the remaining BAC clone inserts. For BAC clones with small overlaps to their neighbors, the shotgun strategy was applied again. For BAC clones with larger overlaps, shotgun clones were first sequenced from one end and, if they mapped to a non-overlapping region, also from the other end.
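The clone-by-clone decision described above can be summarized as a small planning routine. This is a hypothetical sketch: the clone names, coordinates and the `small_overlap_bp` cut-off are illustrative assumptions (the text does not state what counted as a "small" overlap), and clone positions are assumed to be already known from restriction mapping and end sequencing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BacClone:
    name: str
    start: int  # insert start on the replicon (bp); hypothetical coordinates
    end: int    # insert end (bp), half-open


def overlap_bp(a: BacClone, b: Optional[BacClone]) -> int:
    """Length of the overlap between two clone inserts (0 if b is None or they do not overlap)."""
    if b is None:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def plan_sequencing(tiling: list[BacClone], small_overlap_bp: int) -> dict[str, str]:
    """Assign a sequencing mode to each clone of an ordered minimal tiling path."""
    plan: dict[str, str] = {}
    for i, clone in enumerate(tiling):
        if i % 2 == 0:
            # Every second clone: full shotgun with forward and reverse primers.
            plan[clone.name] = "shotgun, both ends"
            continue
        left = tiling[i - 1]
        right = tiling[i + 1] if i + 1 < len(tiling) else None
        shared = overlap_bp(clone, left) + overlap_bp(clone, right)
        if shared < small_overlap_bp:
            plan[clone.name] = "shotgun, both ends"
        else:
            plan[clone.name] = "shotgun, one end; other end only in unique region"
    return plan


def needs_second_read(pos: int, clone: BacClone,
                      left: Optional[BacClone], right: Optional[BacClone]) -> bool:
    """True if a one-end read at `pos` falls in the part of `clone` not covered by its neighbours."""
    lo = max(clone.start, left.end) if left else clone.start
    hi = min(clone.end, right.start) if right else clone.end
    return lo <= pos < hi
```

For example, with a 40 kb cut-off, a clone sharing 50 kb with its neighbours would get one-end reads first, while a clone sharing only 20 kb would be shotgun-sequenced from both ends.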

Sequencing was continued until 7.5-fold coverage was obtained. Gaps were closed and the sequence was polished by primer walking on BAC insert DNA using custom primers. Sequence assembly was performed using the phred/phrap and Staden (gap4) packages (Ewing, Green 1998; Ewing et al. 1998; Staden 1996). Custom primers were designed with PRIDE (Haas et al. 1998). The annotation of the pSymB sequence was carried out jointly by the teams at McMaster University and the University of Bielefeld. In Bielefeld, the recently developed annotation database environment GEN-db was used.
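Why finishing steps such as primer walking are still needed at 7.5-fold coverage can be seen from the idealized Lander-Waterman (Poisson) model, which assumes reads are placed uniformly at random; real shotgun libraries only approximate this. The sketch below is an illustration under that assumption, and the 500 bp read length is a hypothetical value, not one reported here.

```python
import math


def fold_coverage(n_reads: int, read_length: int, target_length: int) -> float:
    """Average number of reads covering each base: c = N * L / G."""
    return n_reads * read_length / target_length


def reads_needed(coverage: float, read_length: int, target_length: int) -> int:
    """Smallest number of reads of a given length giving at least `coverage`-fold redundancy."""
    return math.ceil(coverage * target_length / read_length)


def expected_unread_bp(coverage: float, target_length: int) -> float:
    """Lander-Waterman/Poisson estimate of bases covered by no read: G * exp(-c)."""
    return target_length * math.exp(-coverage)


if __name__ == "__main__":
    PSYMB_BP = 1_683_333
    READ_LEN = 500  # hypothetical average usable read length
    n = reads_needed(7.5, READ_LEN, PSYMB_BP)
    print(n, round(fold_coverage(n, READ_LEN, PSYMB_BP), 2), round(expected_unread_bp(7.5, PSYMB_BP)))
```

Under these assumptions about e^(-7.5), or roughly 0.055%, of the bases (on the order of 900 bp of pSymB) would remain unread, scattered over small gaps; cloning bias typically makes the real figure larger.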

3. Results

As already mentioned, the complete megaplasmid pSymB is covered by a minimal set of 26 BAC clones. The arrangement of the BAC clone inserts along the pSymB megaplasmid is illustrated in Figure 1. The total length of the S. meliloti pSymB megaplasmid is 1,683,333 bp. Its G+C content is 62.4%, almost identical to that of the chromosome (62.7%; Capela et al. 2001). In total, 1570 open reading frames were predicted, and about 90% of the pSymB sequence can be considered protein-coding.

4. Unexpected Genes Located on pSymB

Several genes annotated on pSymB were not expected on this replicon, e.g. minCDE and ftsK2 (Figure 1). In E. coli, the MinCDE proteins have been shown to play a role in cell division. Surprisingly, the only copy of the S. meliloti minCDE genes is located on the pSymB megaplasmid. Another protein involved in septum formation in E. coli is FtsK. In S. meliloti, one of the two copies of ftsK is encoded on pSymB, while the other is located on the chromosome. A further example of an unexpected gene on megaplasmid pSymB is the only copy of an arginine tRNA gene. Its product, a tRNA with the anticodon CCG, decodes CGG, the second most frequently used arginine codon (Figure 1).

Besides these individual genes, two further classes of genes dominate the pSymB megaplasmid: (i) genes proposed to be involved in the biosynthesis of polysaccharides; and (ii) genes coding for transport systems of the ABC type.

5. Polysaccharide-biosynthesis Gene Clusters Located on pSymB

Surface polysaccharides of S. meliloti are essential for successful nodule invasion. Previous studies had shown that the exo/exs cluster (Becker et al. 1993; Glucksmann et al. 1993), directing the biosynthesis of succinoglycan (EPS I), and the exp cluster (Glazebrook, Walker 1989; Becker et al. 1997), involved in the biosynthesis of galactoglucan (EPS II), are located on the pSymB replicon. Many more genes whose products are proposed to be involved in polysaccharide biosynthesis were identified during sequence annotation. Including the exo/exs and exp clusters, 11 gene clusters containing 188 predicted genes, with a total size of 223 kb, were identified. They comprise about 12% of the genes located on pSymB (Figure 1).
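The share of pSymB devoted to these clusters can be expressed either as a fraction of genes or as a fraction of sequence length; using the figures given in this article, the two measures differ slightly:

```latex
\[
\frac{188~\text{genes}}{1570~\text{ORFs}} \approx 11.97\%, \qquad
\frac{223~\mathrm{kb}}{1683~\mathrm{kb}} \approx 13\%
\]
```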

Like the exo/exs and exp clusters, the cryptic polysaccharide clusters encode diverse biochemical functions required for polysaccharide production, including the biosynthesis of sugar precursors (BSP), glycosyltransferase activities (GT) and the export/polymerization (Exp/Pol) machinery. For example, the genes identified in the two largest cryptic clusters can be grouped into the following functional classes: (i) 7 BSP, 6 GT and 3 Exp/Pol; (ii) 10 BSP, 7 GT and 3 Exp/Pol. The gene products of the other clusters seem to cover only a few steps of a hypothetical polysaccharide biosynthetic pathway. Even the most complete cryptic clusters do not show as compact and comprehensive an organization as the exo/exs and exp clusters. Genes of the cryptic clusters do not form large transcriptional units and are often separated by genes unlikely to be involved in polysaccharide biosynthesis. This scattered and interrupted organization probably resulted from genomic rearrangements and horizontal gene transfer; consequently, some cryptic clusters may be silent.

Since S. meliloti 1021 has never been reported to produce any extracellular polysaccharide other than EPS I and EPS II, the cryptic clusters are most probably involved in the synthesis of cell surface antigens, such as K antigens (capsular polysaccharides, KPS) and the O-antigens of lipopolysaccharides (LPS). In contrast to enteric bacteria, Rhizobium leguminosarum and R. etli, the antigenic specificity of S. meliloti strains is determined by K antigens rather than by the O-antigens of LPS (Reuhs et al. 1998), and nearly every S. meliloti strain produces a different K antigen. The chemical structures of the K antigen and LPS produced by Rm1021 are unknown. Therefore, the pathways leading to the biosynthesis of these polymers cannot be predicted and linked to the functional annotation of the polysaccharide gene clusters. To elucidate the functions of the cryptic clusters, knock-out mutants were generated by a plasmid integration approach; the phenotypes of the resulting mutants are under investigation.

[Figure 1: circular map of S. meliloti 1021 pSymB (1.68 Mb). Labelled loci include nodP2Q2, traA2, manBCA, hutU, phoT-C, expE8-expA10, kpsF2-rkpZ1, mocDEF, tfxG, asnO, ugpB-C, hyuAB, thuR-B, cbbX-R, groEL5, nfeD-bacA, Arg-tRNA, dctABD, repC1B1A1 (oriV), lacE-K1 and thiC-E.]

Figure 1. Genomic map of the pSymB megaplasmid of Sinorhizobium meliloti strain 1021. The positions of some specific genes are indicated. The circle displays genes and gene clusters involved in polysaccharide biosynthesis (dark gray) and ABC transporter systems (light gray). The 26 BAC clone inserts forming the BAC clone contig used for sequencing are displayed inside the circle.

6. ABC Transporter Gene Clusters Located on pSymB

Transport of specific molecules across the cytoplasmic membrane is mediated by membrane-associated proteins belonging to different families. The largest and most diverse of these families is the ABC superfamily. ABC transporters consist of one or two integral membrane proteins (permeases), an ATP-binding protein (ATPase) and, in the case of uptake systems, a periplasmic solute-binding protein. In the whole genome of S. meliloti, 430 genes coding for ABC transporter components have been predicted; 235 of them, coding for 65 ABC transport systems, are located on the pSymB megaplasmid (Figure 1). Most of these are import systems, which are proposed to transport sugars (58%), amino acids and peptides (11%), iron (8%), spermidine/putrescine (4%) and other solutes (19%). Consistent with these solute-import systems, numerous genes proposed to encode catabolic activities are located on the pSymB megaplasmid.
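Because ABC systems are assembled from several gene products, gene counts and system counts differ: 235 genes for 65 systems is about 3.6 genes per system, in line with the ATPase/permease(s)/binding-protein composition just described. The following is a hypothetical sketch of grouping annotated components into putative systems by genomic proximity; the role labels, locus tags, coordinates and the `max_gap_bp` cut-off are assumptions, and real annotation pipelines rely on additional evidence such as protein-family assignments.

```python
from dataclasses import dataclass

ABC_ROLES = {"ATPase", "permease", "SBP"}  # SBP = periplasmic solute-binding protein


@dataclass(frozen=True)
class Gene:
    locus: str  # hypothetical locus tag
    start: int  # bp
    end: int    # bp
    role: str   # "ATPase", "permease", "SBP" or anything else


def group_abc_systems(genes: list[Gene], max_gap_bp: int = 500) -> list[list[Gene]]:
    """Group ABC component genes into putative systems by genomic proximity.

    Non-ABC genes are skipped; a new system starts whenever the distance from the
    end of the previous ABC gene to the start of the next one exceeds `max_gap_bp`.
    """
    systems: list[list[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if gene.role not in ABC_ROLES:
            continue
        if systems and gene.start - systems[-1][-1].end <= max_gap_bp:
            systems[-1].append(gene)
        else:
            systems.append([gene])
    return systems


def classify(system: list[Gene]) -> str:
    """Label a putative system by which component roles it contains."""
    roles = {g.role for g in system}
    if ABC_ROLES <= roles:
        return "complete uptake system"
    if {"ATPase", "permease"} <= roles:
        return "ATPase + permease only"
    return "incomplete"
```

Systems labelled "ATPase + permease only" could be exporters or uptake systems whose binding-protein gene lies elsewhere, so the classification is a starting point for manual curation rather than a functional assignment.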

In the symbiotic interaction there is a great demand for iron, which is required, for example, by the nitrogenase complex, ferredoxin and other iron proteins. In the genome of S. meliloti, eight iron ABC transporters have been annotated; three are encoded by the chromosome, two by megaplasmid pSymA and four by megaplasmid pSymB. The four iron ABC transporters encoded by pSymB were analyzed in detail. All four ATPases contain the two conserved motifs Walker A and Walker B, which are involved in ATP binding. A characteristic signature sequence of unknown function was also identified in all four ATPases. Phylogenetic analyses showed that all four ATPases cluster within the group of iron importers. One of the ABC systems belongs to the siderophore/heme/vitamin B12 type, and two cluster with the ferric-iron-type ABC transporters; the affiliation of the fourth system is unclear. To clarify their specific roles and regulation, knock-out mutants in several genes encoding iron ABC transporters and putative iron regulators on the three replicons of S. meliloti are under investigation.
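The Walker A and Walker B motifs mentioned above can be located in a predicted ATPase sequence by simple pattern matching. The sketch below assumes the commonly cited strict consensus forms: Walker A (P-loop) as G-x(4)-G-K-[ST], and Walker B as four hydrophobic residues followed by D. It also searches for the general ABC signature motif LSGGQ; the text does not give the sequence of the signature it found in the iron-importer ATPases, so LSGGQ is an assumption and not necessarily that signature. Function names and the toy sequence are hypothetical, and real annotation would use profile-based searches because natural motifs deviate from strict consensus.

```python
import re

# Assumed strict consensus patterns; natural motifs are more variable.
MOTIFS = {
    "walker_a": re.compile(r"G.{4}GK[ST]"),      # P-loop: G-x(4)-G-K-[ST]
    "walker_b": re.compile(r"[AILMFVW]{4}D"),    # four hydrophobic residues, then D
    "abc_signature": re.compile(r"LSGGQ"),       # generic ABC signature motif
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return 0-based start positions of non-overlapping matches for each motif."""
    seq = protein.upper()
    return {name: [m.start() for m in pat.finditer(seq)] for name, pat in MOTIFS.items()}


def has_walker_motifs(protein: str) -> bool:
    """True if both the Walker A and Walker B patterns occur at least once."""
    hits = find_motifs(protein)
    return bool(hits["walker_a"]) and bool(hits["walker_b"])


if __name__ == "__main__":
    # Hypothetical toy fragment, not an S. meliloti sequence.
    toy = "MSLKGPSGSGKSTLLRLIAGLEPLSGGQRQRVAIALALLLDEPT"
    print(find_motifs(toy))
    print(has_walker_motifs(toy))
```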

7. Conclusions

Several observations support the view that megaplasmid pSymB is a second chromosome in S. meliloti. First, pSymB is comparable in size to bacterial chromosomes. Second, the pSymB regions encoding the arginine tRNA and MinCDE are essential for growth of S. meliloti. Third, the G+C contents of pSymB and of the S. meliloti chromosome are almost identical.
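The third argument rests on comparing G+C content between replicons. A minimal sketch of that comparison follows, assuming sequences are available as plain strings (FASTA parsing is omitted) and that ambiguous bases such as N are excluded from the count. The function names are hypothetical, and since the text does not define what counts as "almost identical", no threshold is imposed.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def gc_difference_points(seq_a: str, seq_b: str) -> float:
    """Absolute difference in G+C content, in percentage points."""
    return abs(gc_content(seq_a) - gc_content(seq_b)) * 100


if __name__ == "__main__":
    # Toy strings; real replicon sequences would be read from FASTA files (not shown).
    chromosome_like = "GGCGCATGCC"
    psymb_like = "GCCGGATCGC"
    print(f"{gc_content(chromosome_like):.1%} vs {gc_content(psymb_like):.1%}")
    print(f"difference: {gc_difference_points(chromosome_like, psymb_like):.1f} points")
```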

The large number of solute transport systems and of genes proposed to encode catabolic activities suggests that pSymB provides S. meliloti with the ability to utilize many different compounds from the soil and rhizosphere environment and enhances its metabolic flexibility. Analogous to this metabolic adaptation to different habitats, the many polysaccharide gene clusters located on megaplasmid pSymB may extend the surface variability of S. meliloti and thereby enable the bacterium to cope with the different conditions and environments it encounters in the soil, the rhizosphere and the legume nodule.

8. References

Barloy-Hubler F et al. (2000) Curr. Microbiol. 41, 109-113

Barnett MJ et al. (2001) Proc. Natl. Acad. Sci. USA 98, 9883-9888

Becker A et al. (1993) Mol. Gen. Genet. 241, 367-379

Becker A et al. (1997) J. Bacteriol. 179, 1375-1384

Capela D et al. (2001) Proc. Natl. Acad. Sci. USA 98, 9877-9882

Finan TM et al. (2001) Proc. Natl. Acad. Sci. USA 98, 9889-9894

Galibert F et al. (2001) Science 293, 668-672

Glazebrook J, Walker GC (1989) Cell 56, 661-672

Glucksmann MA et al. (1993) J. Bacteriol. 175, 7045-7055

Haas S et al. (1998) Nucleic Acids Res. 26, 3006-3012

Reuhs BL et al. (1998) Appl. Environ. Microbiol. 64, 4930-4938

Staden R (1996) Mol. Biotechnol. 5, 233-241

9. Acknowledgements

We thank F. Barloy-Hubler, D. Capela and F. Galibert for providing the pSymB BAC clones. This work was supported by a grant from the Bundesministerium für Forschung und Technologie (0311752) to A.P.

PROTEOME ANALYSIS OF SINORHIZOBIUM MELILOTI

M.A. Djordjevic, S.H. Natera, H.-C. Chen, G. Weiller, C. Menzel, G. VanNoorden, J. Weinman, S. Taylor, K. Guo, B.G. Rolfe

Genomic Interactions Group, Research School of Biological Sciences, Australian National University, GPO Box 475, Canberra, ACT Australia, 2601

1. Introduction

With the completion of the Sinorhizobium meliloti strain 1021 (Galibert et al. 2001) and Mesorhizobium loti (Kaneko et al. 2000) genomes, and with knowledge of the NGR234 pSym (Freiberg et al. 1997) and the Bradyrhizobium symbiosis regions (Göttfert et al. 2001), we have now entered the post-genome era for the analysis of these microsymbionts. The assembly of the strain 1021 genome has put the Sinorhizobium research community in the most advantageous position yet to understand the complexities of this model organism. Until recently, it was often implied that knowledge of genome sequences alone would be sufficient to understand biological systems. It is now well recognized that, although the genome is a huge resource that will assist in understanding gene function, the genome sequence in isolation cannot predict the following:

(i) if and when mRNA species are translated;

(ii) the relative concentrations of the proteins in vivo;

(iii) the extent and types of post-translational modifications of proteins;

(iv) the cellular or sub-cellular locations of proteins;

(v) the unexpected pleiotropic effects of mutation or overexpression upon protein levels;

(vi) the occurrence of small ORFs that are often overlooked by sequence annotation programs; and

(vii) whether start sites for ORFs have been assigned correctly in all cases.

Nevertheless, armed with the genome sequence, researchers will increasingly use this extensive resource to undertake a functional genomic analysis of S. meliloti. Gaps remain in our knowledge of, for example, how the microsymbiont (a) escapes the full attention of the host defence system (Djordjevic et al. 1987), (b) establishes the symbiosis with the legume host and (c) survives in nutrient-depleted environments. In the near future, high-throughput methodologies that establish patterns of transcription (transcriptomics), translation (proteomics) and metabolite profiles (metabolomics) will be combined with systematic mutagenesis initiatives to define new genes of interest and to address points (i) through (vii). Together, these approaches will provide a foundation for identifying or elucidating gene function using traditional and newly evolving biochemical strategies.

In this paper we give an overview of the methodological considerations in proteomic studies and of how proteomics has already contributed new knowledge. Here, the proteome is defined as the total protein output encoded by a genome, including the distinct protein species that arise from a single gene through post-translational modification or cleavage. In eukaryotes, the proteome would also include the protein products of differential splicing.

2. Procedure

Two-dimensional gel electrophoresis (2-DGE) is the most powerful technique available for separating complex mixtures of proteins, and it remains the method of choice underpinning proteome analysis. 2-DGE has at least 20- to 50-fold greater resolving power than reversed-phase HPLC. Although approximately 3000 S. meliloti proteins with pIs in the range pH 4-10 can be resolved and visualized (Guerreiro et al. 1999; unpublished results), the technique is not without its drawbacks. Proteins that are highly hydrophobic (integral membrane proteins with more than seven transmembrane domains), have an extreme pI (greater than 10 or less than 3.5), have a high molecular mass (greater than 100 kDa) and/or are of low relative abundance (such as transcriptional regulators present at a few molecules per cell) remain refractory to this technique. Alternative approaches are needed to address these problematic proteins. Figure 1 summarizes the window of proteins that can be isolated using proteome analysis, as predicted from the S. meliloti chromosome.
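The sequence-predictable part of this window (pI and molecular mass) can be estimated directly from annotated ORFs. The sketch below assumes a Lehninger-style set of pKa values and standard average residue masses; published pI tools use slightly different pKa sets, so estimates close to the 3.5 and 10 cut-offs are uncertain. The number of transmembrane domains is taken as an input (assumed to come from a separate topology predictor), and abundance, the fourth limitation, cannot be predicted from sequence at all. All names are hypothetical, and this is not the procedure used to produce Figure 1.

```python
from dataclasses import dataclass

# Approximate pKa values (assumed Lehninger-style set).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}

# Average residue masses in daltons (free amino acid minus water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02


def molecular_mass_kda(seq: str) -> float:
    """Average mass of the unmodified polypeptide in kDa (KeyError on unknown residues)."""
    return (sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER) / 1000


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a linear polypeptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection over pH 0-14; net charge decreases monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


@dataclass
class PredictedProtein:
    name: str
    sequence: str
    tm_domains: int  # supplied by an external topology predictor (assumption)


def in_2dge_window(p: PredictedProtein) -> bool:
    """Apply the pI (3.5-10), mass (<= 100 kDa) and TM-domain (<= 7) limits from the text."""
    pi = isoelectric_point(p.sequence)
    return 3.5 <= pi <= 10.0 and molecular_mass_kda(p.sequence) <= 100.0 and p.tm_domains <= 7
```

Applied to every predicted ORF, a filter of this kind gives a rough estimate of the fraction of the proteome that falls inside the 2-DGE window, while low-abundance proteins remain invisible to any sequence-based estimate.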
