Thesis Defence: Dylan Fossl (Master of Science in Computer Science)
You are encouraged to attend the defence. The details of the defence and attendance information are included below:
Date: November 19, 2024
Time: 11:30 AM to 1:30 PM (PT)
Defence mode: Hybrid
In-Person Attendance: Senate Chambers, UNBC Prince George Campus
Virtual Attendance: via Zoom
LINK TO JOIN: Please contact the Office of Graduate Administration regarding remote attendance for online defences.
To ensure the defence proceeds with no interruptions, please mute your audio and video on entry and do not inadvertently share your screen. The meeting will be locked to entry 5 minutes after it begins: please ensure you are on time.
Thesis entitled: TRANSFORMER MODELS FOR PROTEIN-GUIDED DRUG COMPOUND GENERATION: A COMPARISON OF AMINO ACID SEQUENCES, PRE-TRAINED PROTEIN EMBEDDINGS, SMILES, AND SELFIES
Abstract: Drug discovery is a time-consuming and costly process that notoriously suffers from low success rates. Increased availability of chemically relevant data and advances in machine learning techniques offer potential ways to support the drug development pipeline. This thesis explores the use of transformers in the conditional generation of potential drug compounds from protein context.
Building on previous research, this work implements four transformer models that take protein information as input and generate potential binding compounds. Each model uses either SMILES or SELFIES string representations of compounds, and either amino acid sequences or pretrained ESM-2 protein embeddings as contextual input. These models are trained and compared on their ability to generate chemically feasible compounds that approximate the physicochemical properties of the training set and show binding potential specific to the contextual proteins.
The use of SELFIES increased compound validity and diversity, but the SELFIES models performed worse overall than their SMILES counterparts. Pretrained protein embeddings decreased validity but improved model performance despite no change to model structure or size. These results highlight the potential of transformer models paired with pretrained protein embeddings to enhance the drug discovery process with the generation of lead compounds from novel proteins without any fine-tuning or retraining.
Examining Committee:
Chair: Dr. Jianhui Zhou, University of Northern British Columbia
Supervisor: Dr. Fan Jiang, University of Northern British Columbia
Committee Member: Dr. Sean Maurice, University of Northern British Columbia
Committee Member: Dr. Alia Hamieh, University of Northern British Columbia
External Examiner: Dr. Chow Lee, University of Northern British Columbia
Contact Information
Graduate Administration in the Office of the Registrar, University of Northern British Columbia
Email: grad-office@unbc.ca
Web: https://www2.unbc.ca/graduate-programs