Thesis Defence: Dylan Fossl (Master of Science in Computer Science)

Location
Senate Chambers and/or Zoom
Campus
Prince George
Online

You are encouraged to attend the defence. The details of the defence and attendance information are included below:

Date:  November 19, 2024
Time: 11:30 AM to 1:30 PM (PT)

Defence mode: Hybrid 
In-Person Attendance: Senate Chambers, UNBC Prince George Campus  
Virtual Attendance: via Zoom 

LINK TO JOIN: Please contact the Office of Graduate Administration regarding remote attendance for online defences.

To ensure the defence proceeds with no interruptions, please mute your audio and video on entry and do not inadvertently share your screen. The meeting will be locked to entry 5 minutes after it begins: please ensure you are on time.  

Thesis entitled:   TRANSFORMER MODELS FOR PROTEIN-GUIDED DRUG COMPOUND GENERATION: A COMPARISON OF AMINO ACID SEQUENCES, PRE-TRAINED PROTEIN EMBEDDINGS, SMILES, AND SELFIES

Abstract: Drug discovery is a time-consuming and costly process that notoriously suffers from low success rates. Increased availability of chemically relevant data and advances in machine learning techniques offer potential solutions for aiding in the drug development pipeline. This thesis explores the use of Transformers in the conditional generation of potential drug compounds from protein context. 

Building on previous research, this work implements four transformer models that take protein information as input and generate potential binding compounds. Each model uses either SMILES or SELFIES string representations of compounds and either amino acid sequences or pretrained ESM-2 protein embeddings as contextual input. These models are trained and compared in their ability to generate chemically feasible compounds that approximate the physicochemical properties of the training set and show binding potential specific to the contextual proteins.

Models using SELFIES produced compounds with higher validity and diversity but performed worse overall than their SMILES counterparts. Pretrained protein embeddings were shown to decrease validity but improved model performance despite no change to model structure or size. These results highlight the potential of transformer models paired with pretrained protein embeddings to enhance the drug discovery process with the generation of lead compounds from novel proteins without any fine-tuning or retraining.

Examining Committee:  
Chair: Dr. Jianhui Zhou, University of Northern British Columbia  
Supervisor: Dr. Fan Jiang, University of Northern British Columbia  
Committee Member: Dr. Sean Maurice, University of Northern British Columbia  
Committee Member: Dr. Alia Hamieh, University of Northern British Columbia  
External Examiner: Dr. Chow Lee, University of Northern British Columbia 

Contact Information

Graduate Administration in the Office of the Registrar, University of Northern British Columbia

Email: grad-office@unbc.ca
Web: https://www2.unbc.ca/graduate-programs