Large-Scale Genome Analysis of Lotus japonicus MG-20

S. Sato, T. Kaneko, Y. Nakamura, T. Kato, E. Asamizu, S. Tabata

Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan

1. Introduction

Progress in DNA sequencing technology has allowed us to perform systematic and comprehensive analysis of genetic information in a variety of organisms. The entire genomes of several higher eukaryotes, including a flowering plant, Arabidopsis thaliana, have already been sequenced, and a complete list of the potential gene complement of the A. thaliana genome has been compiled by computer-assisted analysis (The Arabidopsis Genome Initiative 2000). Furthermore, the information and material resources obtained during the course of genome sequencing can be used to study the functional aspects of genes in the genome.

Lotus japonicus has been proposed as a model system for molecular genetics in legume species, especially for studying the interaction between legumes and rhizobia. The genome was estimated to be 471 Mb long (Ito et al. 2000), which is comparable with that of a monocot model plant, Oryza sativa. Intensive genomic studies, including generation of DNA markers and a genetic linkage map, construction of genomic and cDNA libraries, and collection of expressed sequence tags (ESTs), are in progress. Taking advantage of these resources, we initiated a large-scale analysis of the L. japonicus genome that includes EST analysis, genome sequencing and mapping.

2. Materials and Methods

2.1. EST analysis. Poly(A)+ RNA was extracted from various organs of Lotus japonicus Miyakojima MG-20 and Gifu B-129 accessions, and cDNA libraries were prepared according to the standard method (Asamizu et al. 1999). ESTs from 5' and 3' ends of each cDNA insert were collected and analyzed as previously described (Asamizu et al. 2000).

2.2. Genome sequencing. Genomic DNA libraries of L. japonicus MG-20 for DNA sequencing and mapping were generated using the TAC (transformation-competent artificial chromosome) vector pYLTAC7 (Liu et al. 1999). For DNA sequencing, clones were selected by screening the genomic libraries by PCR using primers designed on the basis of the EST sequences. The respective clones were sequenced by a bridging shotgun strategy. The average redundancy of the random data was fivefold. The raw sequence data were assembled using the Phred-Phrap programs (Phil Green, Univ. Washington, Seattle). Additional sequencing was performed to ensure a phred score of 20 or higher over the entire region.
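The two quality criteria in this paragraph can be made concrete with a short calculation. The Python sketch below is an illustration only, not part of the original pipeline: it uses the standard phred definition Q = -10 log10(P) to show that a score of 20 corresponds to an estimated error rate of 1 in 100 bases, and it computes shotgun redundancy as total read bases divided by insert length. The function names and the read count, read length and insert size in the example are hypothetical values chosen for illustration.

```python
import math


def phred_to_error_probability(q):
    """Estimated base-call error probability for phred quality score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score for an estimated error probability p."""
    return -10 * math.log10(p)


def shotgun_redundancy(num_reads, mean_read_length, insert_length):
    """Average redundancy (coverage): total read bases per base of insert."""
    return num_reads * mean_read_length / insert_length


# A phred score of 20 corresponds to roughly one error per 100 bases.
print(phred_to_error_probability(20))

# Hypothetical example: 1,000 reads of 500 bases over a 100-kb insert
# give fivefold redundancy.
print(shotgun_redundancy(1000, 500, 100_000))
```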

Assignment of the protein-coding regions and gene modeling were performed as described previously (Sato et al. 2000). Briefly, a similarity search against the non-redundant protein sequence database nr (compiled by NCBI) was carried out using the BLASTX program. In parallel, potential protein-coding regions were predicted with the programs Grail, GENSCAN and NetGene2. The transcribed regions were assigned by comparing the nucleotide sequences with Lotus ESTs (Asamizu et al. 2000) in the public databases using the BLASTN program.

2.3. Linkage analysis. Simple sequence repeat (SSR) markers and dCAPS markers were generated using the sequence information of the clones. Linkage analysis was performed using an F2 population of Gifu B-129 and Miyakojima MG-20 according to the standard method.
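To illustrate what SSR marker candidates look like in sequence data, the following Python sketch scans a sequence for perfect tandem repeats with a regular expression. It is a minimal example under stated assumptions and not the procedure used in this study: the function name `find_ssrs`, the unit lengths (2 to 4 bases) and the minimum of five repeats are hypothetical choices, and the test sequence is invented.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=4, min_repeats=5):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    seq = sequence.upper()
    # Lazy unit match so the shortest repeating unit is tried first,
    # followed by at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Invented sequence containing (AT)6 and (CTT)5.
example = "GGC" + "AT" * 6 + "CCGTA" + "CTT" * 5 + "G"
for start, unit, count in find_ssrs(example):
    print(start, unit, count)
```

Primers flanking each reported repeat would then be designed and tested for length polymorphism between the two parental accessions; that step is outside the scope of this sketch.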

3. Results and Discussion

3.1. EST analysis. As the first step of EST analysis, 22,983 cDNA clones originating from whole plants were sequenced from their 5' ends. These ESTs were clustered into 7,137 non-redundant groups with an overall GC content of approximately 49%. Because the cDNA clones were not always full-length, the possibility remains that different regions of a single gene are represented by non-overlapping ESTs. To exclude this possibility, 3'-ESTs were analyzed. As of June 2001, a total of 38,685 3'-ESTs have been accumulated from libraries of pods, roots, nodules and nodule primordia of the Gifu and Miyakojima accessions. These represented 16,896 non-redundant species. The latest status of the EST analysis is summarized in Table 1.
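The summary statistics quoted in this paragraph can be reproduced with simple arithmetic. The Python sketch below is illustrative only and is not the clustering software used in the study: it shows one way to compute a pooled GC content over a set of sequences (ignoring ambiguous bases such as N, which is an assumption) and derives the mean number of ESTs per non-redundant group from the counts given in the text. The function names are hypothetical.

```python
def gc_content(sequences):
    """Pooled GC fraction over an iterable of nucleotide sequences.

    Only unambiguous bases (A, C, G, T) are counted in the denominator.
    """
    gc = total = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += sum(s.count(base) for base in "ACGT")
    return gc / total if total else 0.0


def mean_group_size(n_ests, n_groups):
    """Average number of ESTs per non-redundant group."""
    return n_ests / n_groups


# Counts taken from the text: 5' ESTs and 3' ESTs.
print(round(mean_group_size(22983, 7137), 2))
print(round(mean_group_size(38685, 16896), 2))
```

On these counts the 5' ESTs average a little over three reads per group and the 3' ESTs a little over two.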

Table 1. The status of EST analysis

Organ             Accession  Library type   5' ESTs   3' ESTs
Whole plant       MG-20      Normalized     1828      392
Whole plant       MG-20      Size-selected  470       146
Pod               MG-20      Normalized     -         6532
Pod               MG-20      Size-selected  -         4811
Root              MG-20      Normalized     -         7670
Root              MG-20      Size-selected  -         2340
Nodule primordia  B-129      Normalized     -         7307
Nodule primordia  B-129      Size-selected  -         1158
Nodule            B-129      Normalized     -         3478
Total                                       22983 *   38685
Non-redundant                               7137      16896

* Data have been released
