Supported by the "Genomics for Agricultural Innovation" Project, Ministry of Agriculture, Forestry and Fisheries of Japan.
We aim to annotate the wheat chromosome 6B sequence, which is being determined by next-generation sequencing technologies and assembled on a BAC-by-BAC basis. For this purpose, we are developing a pipeline for automated annotation. To expedite wheat genomic research, we are also constructing a web-based database that provides the annotation data.
Given a long stretch of genomic sequence, structural and functional annotation is a crucial first step toward further experimental research. Wheat chromosome 6B is ~1 Gbp in length and is highly repetitive. Automated annotation of this sequence is therefore a difficult task, even with large-scale computing resources. The main goal of this study is to develop an annotation pipeline that can be applied to the wheat chromosome 6B sequence. We also provide the annotation data through a web-based database so that researchers can freely use the genome information for molecular genomics, breeding, and other applications.
Because we have experience with genome-wide annotation of the rice genome, we decided to adapt our rice annotation pipeline for wheat. The pipeline is composed of three steps: repeat detection, protein-coding gene prediction and RNA gene prediction. The first step detects repetitive elements in the genomic sequences using appropriate repeat libraries and CENSOR. All repeats are masked before the second step because the wheat genome is highly repetitive (>80%) and the repeats hamper the detection of genic regions. In the second step, gene prediction is performed on the basis of transcript mapping to the genome. Here we employed two types of full-length cDNA mapping. One is within-species mapping, which compares wheat cDNAs with the wheat genome and identifies transcribed regions. The other is cross-species mapping, which aligns cDNAs obtained from other cereals to the wheat genome under relaxed criteria so that homologous regions can be found. For protein-coding gene finding, we also used an ab initio gene prediction program and confirmed expression of the predicted genes by mapping ESTs to the genome. In the third step, we predict non-coding RNA gene candidates such as miRNAs, tRNAs and rRNAs. miRNA prediction combines a homology-based procedure with evaluation of secondary structures. All the annotation data are visualized on a genome browser.
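The masking that precedes gene prediction can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: it assumes that repeat hits (for example, parsed from CENSOR output) are already available as 0-based, half-open (start, end) intervals, and the function names merge_intervals, mask_repeats and masked_fraction are hypothetical. Overlapping hits are merged first, and the covered bases are then replaced with "N" so that downstream mapping and ab initio prediction ignore them.

```python
from typing import Iterable, List, Tuple

# Assumption: repeat hits are 0-based, half-open intervals [start, end).
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_repeats(sequence: str, repeats: Iterable[Interval],
                 mask_char: str = "N") -> str:
    """Return the sequence with every base inside a repeat interval masked."""
    parts: List[str] = []
    pos = 0
    for start, end in merge_intervals(repeats):
        # Clamp to the sequence bounds and skip intervals that fall outside.
        start = max(start, 0)
        end = min(end, len(sequence))
        if start >= end:
            continue
        parts.append(sequence[pos:start])          # unmasked stretch
        parts.append(mask_char * (end - start))    # masked repeat
        pos = end
    parts.append(sequence[pos:])
    return "".join(parts)


def masked_fraction(masked: str, mask_char: str = "N") -> float:
    """Fraction of positions equal to mask_char (includes any pre-existing Ns)."""
    return masked.count(mask_char) / len(masked) if masked else 0.0


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    hits = [(2, 5), (4, 7)]  # two overlapping hypothetical repeat hits
    masked = mask_repeats(seq, hits)
    print(masked, masked_fraction(masked))
```

Hard masking with "N" is only one option. Soft masking (lower-casing the repeat bases) preserves the underlying sequence, which may be preferable when genes overlap or contain repeat-derived segments.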