Does RNA-seq-based Eukaryotic GeneFinding of Blast2GO require repeat-masking the whole genome shotgun (WGS) sequence? A case study in jute (Corchorus olitorius L., Malvaceae s. l.)
Debabrata Sarkar¹, Carlos Menor² and Nagendra Kumar Singh³
ORCID iD: 0000-0003-3943-9646
Before gene prediction in eukaryotes, it is important to mask repetitive sequences, including low-complexity regions and transposable elements (Yandell and Ence 2012; Ekblom and Wolf 2014). In the present study, we investigated the effectiveness of RNA-seq-based gene prediction (WebAUGUSTUS; Hoff and Stanke 2013) using the Eukaryotic GeneFinding (EGF) module of the software Blast2GO (Conesa et al. 2005), both with and without repeat-masking of the whole genome shotgun (WGS) sequence of Corchorus olitorius L. (Sarkar et al. 2017). Because genome masking is not a prerequisite for EGF in Blast2GO, our overarching objective was to assess whether repeat-masking improves the precision of protein-coding gene prediction and annotation when evidence from RNA-seq alignments is used, as implemented in Blast2GO.
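To make concrete what "masking" does to a genome sequence, the toy sketch below applies a list of repeat intervals to a short sequence. It is an illustration only and not part of the pipeline used in this study. It assumes the repeat coordinates are already known (for example, parsed from a repeat annotation) and given as 0-based, half-open (start, end) pairs. The function name `mask_sequence` and the example sequence are hypothetical. Soft-masking lowercases the repeat bases, keeping them readable so that tools configured to honor it can treat those regions differently. Hard-masking replaces them with N.

```python
# Toy illustration of repeat-masking; not the study's pipeline.
# Assumption: repeat intervals are 0-based, half-open (start, end) pairs that
# were obtained elsewhere. The name mask_sequence is hypothetical.


def mask_sequence(seq: str, intervals: list[tuple[int, int]], hard: bool = False) -> str:
    """Return seq with each interval soft-masked (lowercase) or hard-masked (N)."""
    chars = list(seq)
    for start, end in intervals:
        # Reject coordinates that fall outside the sequence.
        if not 0 <= start <= end <= len(seq):
            raise ValueError(f"interval ({start}, {end}) out of range")
        for i in range(start, end):
            # Hard-masking discards the base; soft-masking only lowercases it.
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    genome = "ATGCATATATATATGGCTAA"
    repeats = [(4, 14)]  # hypothetical low-complexity (AT)n repeat
    print(mask_sequence(genome, repeats))             # ATGCatatatatatGGCTAA
    print(mask_sequence(genome, repeats, hard=True))  # ATGCNNNNNNNNNNGGCTAA
```

Soft-masking is often preferred when a downstream gene finder can read lowercase regions, because the bases remain available for alignment while still being flagged as repeats.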