Roy, Soumyadeep; Wallat, Jonas; Sundaram, Sowmya S.; Nejdl, Wolfgang; Ganguly, Niloy
(Amsterdam ; Berlin ; Washington, DC : IOS Press, 2023)
Large-scale language models such as DNABert and LOGO aim to learn optimal gene representations and are trained on the entire Human Reference Genome. However, standard tokenization schemes involve a simple sliding window ...
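To make the "simple sliding window" scheme concrete, here is a minimal sketch of overlapping k-mer tokenization of a DNA sequence. The function name `kmer_tokenize` and the default values of `k` and `stride` are illustrative assumptions, not taken from the record above, and this is not presented as the tokenizer used by DNABert or LOGO.

```python
from typing import List


def kmer_tokenize(sequence: str, k: int = 6, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers with a sliding window.

    Hypothetical helper for illustration: `k` and `stride` are assumed
    defaults, not parameters reported in the source.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive integers")
    seq = sequence.upper()
    # The window starts at position 0 and advances by `stride`; the last
    # window starts at len(seq) - k. A sequence shorter than k gives [].
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


if __name__ == "__main__":
    # With stride 1, adjacent tokens overlap by k - 1 nucleotides.
    print(kmer_tokenize("ATGCGTAC", k=3))
```

With a stride of 1 every token shares k - 1 characters with its neighbour, and token boundaries fall at fixed offsets regardless of where genes or other biological units begin and end.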