Co-authors: Yvonne M. Geurts, Marcel Verheij, Henk van der Poel, Lonneke van de Poll-Franse, Iris Walraven
As part of the SPACE project, we are trying to forecast the prostate-specific antigen (PSA) trajectory of prostate cancer (PCa) patients after prostatectomy. SPACE is part of the PersOn consortium.
Background:
Current follow-up (FU) guidelines for prostate cancer patients after prostatectomy include frequent prostate-specific antigen (PSA) checks, which place a high burden on patients and healthcare systems. Existing tools to optimize FU frequency cannot dynamically integrate longitudinal PSA data, limiting their clinical use. We created a methodology to map sparse longitudinal PSA data to functions for dynamically forecasting PSA trajectories during FU.
Methods:
We used data from the prospective Dutch Prostate Cancer Network, including 832 patients treated with prostatectomy from 2005 to 2021. We first performed a functional Principal Component Analysis (fPCA) to capture PSA data variance in multiple Eigenfunctions and fPCA scores. Next, an ensemble regression model was trained to predict individual fPCA scores using clinical variables (e.g., age and Gleason score). Finally, PSA trajectories of patients were forecasted by combining the population Eigenfunctions and predicted fPCA scores at the first FU check.
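The last step combines population-level curves with patient-level scores: the forecast is the mean curve plus each Eigenfunction weighted by its predicted score, X(t) = mu(t) + sum_k xi_k * phi_k(t). The Python sketch below shows only this reconstruction step and the variance-explained calculation; it does not fit the fPCA itself. It assumes the mean curve and Eigenfunctions are already evaluated on a common time grid, and all function names and numbers are hypothetical.

```python
# Minimal sketch of the forecasting step, assuming the fPCA has already been fitted.
# Hypothetical inputs: a mean PSA curve and Eigenfunctions evaluated on a shared time
# grid, plus one predicted fPCA score per Eigenfunction from the regression model.


def explained_variance_ratio(eigenvalues):
    """Fraction of total variance captured by each Eigenfunction."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


def forecast_trajectory(mean_curve, eigenfunctions, scores):
    """Return mu(t) + sum_k score_k * phi_k(t) at every grid point."""
    if len(eigenfunctions) != len(scores):
        raise ValueError("expected one predicted score per Eigenfunction")
    forecast = list(mean_curve)
    for phi, score in zip(eigenfunctions, scores):
        for i, value in enumerate(phi):
            forecast[i] += score * value
    return forecast


# Hypothetical example on a three-point grid (illustrative values, not study data).
mean_curve = [0.10, 0.12, 0.15]
eigenfunctions = [[0.5, 0.6, 0.7], [0.2, -0.1, 0.0]]
predicted_scores = [0.4, -0.5]
print(explained_variance_ratio([97.0, 2.0, 1.0]))
print(forecast_trajectory(mean_curve, eigenfunctions, predicted_scores))
```

Keeping the population Eigenfunctions fixed means a new patient only needs a handful of predicted scores to obtain a full trajectory.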
Results:
Median age and PSA level at the time of prostatectomy were 64 years (IQR 59-68) and 8.0 ng/mL (IQR 5.6-11.7), respectively. Median follow-up was 3.2 years (IQR 1.7-5.0), with a median of 6 PSA tests (IQR 4-10). Preliminary results showed that fPCA could describe the PSA trajectory of patients, with the first Eigenfunction capturing 97% of the variance. However, performance of the ensemble model was low, with the best regressor in the ensemble explaining only 10% of the variance.
Conclusion:
Preliminary results showed that fPCA could capture data variance for dynamic forecasting of PSA trajectories. Next steps include improving model performance and assessing to what extent the trajectories can reduce PSA test frequency in the clinic.
During my time at Boehringer Ingelheim, I worked in the Bioprocess Development Biologicals department. My responsibilities were designing and executing lab experiments, after which I wrote scripts to analyze and model the output data. The abstract of this project is shown below.
Biopharmaceuticals such as antibodies and viral vectors are highly potent medicines because of their high specificity and efficacy. While antibodies are already produced on a large scale, the production of novel viral therapeutics is more complicated. These viral vectors and vaccines require multiple post-translational modifications that are not feasible on the current internal production platforms of the human pharma business unit of Boehringer Ingelheim. For these products, HEK293 cells will form the basis of a suitable platform, as this human-derived cell line makes it possible to produce such medicines. Building this platform requires a thorough understanding of the process and of cell metabolism. In this project, a model is created to describe and predict the behavior of HEK293 cells using Metabolic Flux Analysis (MFA). The model can also be used to optimize media formulations by identifying bottlenecks in the metabolism of the cells, thereby substantially increasing development speed and success rate. The model is tested and verified in a Design of Experiments (DoE) setup using the ambr®15 bioreactor. The investigated factors are the seeding cell density, glucose limit, glutamine source, glutamine source concentration and tyrosine concentration of the growth medium. The seeding cell density was found to be highly important for controlling the metabolism of the cells and shifting it into a stationary phase. The tyrosine and glutamine source concentrations were also relevant for more specific reactions, such as the alanine transaminase reaction. This model provides a basis for more in-depth research on the differences between growth phases and, eventually, for implementing virus production.
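In its textbook form, Metabolic Flux Analysis assumes that intracellular metabolites are at pseudo-steady state, so the stoichiometric matrix S (metabolites by reactions) multiplied by the flux vector v is zero. Splitting v into measured exchange fluxes v_m (for example glucose and glutamine uptake or lactate secretion) and unknown intracellular fluxes v_u gives the least-squares estimate below, where S_u^+ is the Moore-Penrose pseudoinverse and S_u is assumed to have full column rank. This is a general sketch of the method, not necessarily the exact formulation used in this model.

```latex
\frac{d\mathbf{c}}{dt} = S\,\mathbf{v} \approx \mathbf{0},
\qquad
S_m \mathbf{v}_m + S_u \mathbf{v}_u = \mathbf{0}
\quad\Longrightarrow\quad
\hat{\mathbf{v}}_u = -\,S_u^{+} S_m \mathbf{v}_m
```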
For my Master's thesis, I worked on uncovering the underlying characteristics of thermophilic bacteriophages. The focus of this project was on the surfaces of structural proteins, for which AlphaFold structures were used. The abstract of the project as well as the full thesis are shown below.
Phages, viruses that infect bacteria, are found in a wide range of environments and have developed advanced strategies to survive in these surroundings. To better understand their stability under thermal stress, this project analyzed the strategies phages use to withstand heat. Structural phage protein surfaces were classified and compared with each other. This gave insight into the wide diversity of these proteins and was used to predict thermostability with machine learning models. Using two ways of assessing thermostability, random forest models were created for proteins separated by structural class. Proteins were characterized using structural features, such as the compactness or surface charge density of the protein, and using sequence features derived from the UniRep deep learning embedding. A total of 75,567 protein structures were retrieved from the novel AlphaFold database and checked for inaccuracies using a custom filtering pipeline, which filtered out 22,843 low-confidence entries and 23,454 loose structures. Model performance was found to correlate inversely with protein class diversity, indicating that different strategies are used within protein classes to withstand thermal stress. Combining different classes in one model led to lower predictive performance, confirming the high diversity between phage protein classes. The best-performing model, with an F1 score of 0.52, used structural features and temperatures estimated from 16S rRNA GC% for the shaft class. This is far better than forced positive classification (F1 = 0.08) and showed the importance of charged and turn surface residues in shaft proteins for thermostability. The use of phages in phage therapy to battle antibiotic-resistant bacteria and in medicine delivery through phage design is very promising. However, problems regarding preparation and stabilization currently complicate the implementation of these phage applications.
The novel characterizations of phage proteins in this project can be used to more accurately depict phages for phage therapy & design.
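The forced positive baseline follows directly from the definition of the F1 score, assuming it is computed for the thermostable (positive) class: labelling every protein as positive gives a recall of 1 and a precision equal to the fraction p of positive proteins, so F1 = 2p / (1 + p). The Python sketch below computes this baseline and inverts it; the prevalence of roughly 4% implied by F1 = 0.08 is derived from the formula, not a number reported in the thesis.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def forced_positive_f1(prevalence):
    """F1 of a classifier that labels every sample positive.

    Precision equals the prevalence of the positive class; recall is 1.
    """
    return f1_score(precision=prevalence, recall=1.0)


def prevalence_from_baseline_f1(f1):
    """Invert F1 = 2p / (1 + p) to recover the positive-class prevalence p."""
    return f1 / (2 - f1)


p = prevalence_from_baseline_f1(0.08)
print(round(p, 4))                       # 0.0417
print(round(forced_positive_f1(p), 2))   # 0.08
```

Because this baseline depends only on class balance, it is a natural floor against which the 0.52 of the shaft model can be judged.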
Click here for the full thesis.
This Bachelor's thesis from 2019 focused on the 3D structure of Complex Coacervate Core Micelles (C3Ms). The project used Molecular Dynamics simulations, supported by a small amount of lab work to check the modelling findings. A brief abstract and the full thesis can be found below.
Complex Coacervate Core Micelles (C3Ms) are molecular structures with a promising future in many applications. They can play a role in water purification, have antifouling properties and can be used in encapsulation processes. They consist of multiple polyelectrolytes, which form a complex coacervate core and a neutral corona. The combination of these two domains allows them to bind polar (bio)molecules while remaining uncharged for transport. In this project, the effect of the stoichiometry, the ratio between positively and negatively charged polyelectrolytes, on the 3D structure is analyzed using Langevin Dynamics simulations. These simulations showed a clear correlation between the stoichiometry and the size of the particles, especially at long polyelectrolyte lengths. These long polyelectrolytes result in large micelles at a low f+ ratio, i.e. a surplus of negatively charged polyelectrolytes. When this ratio was more even, these large micelles did not form.
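The f+ ratio can be made concrete with the definition commonly used for C3Ms, the fraction of positive charges f+ = n+ / (n+ + n-), where n+ and n- count the charged monomers: f+ = 0.5 is charge-neutral and f+ < 0.5 means an excess of negative charge. The Python sketch below assumes one charge per monomer and uses hypothetical chain counts, lengths and a hypothetical tolerance band; it illustrates the stoichiometry only and does not reproduce the simulation setup.

```python
def charge_fraction(n_pos_chains, pos_length, n_neg_chains, neg_length):
    """f+ = positive charges / (positive + negative charges).

    Assumes every monomer of a polyelectrolyte carries exactly one charge.
    """
    n_plus = n_pos_chains * pos_length
    n_minus = n_neg_chains * neg_length
    return n_plus / (n_plus + n_minus)


def charge_regime(f_plus, tolerance=0.05):
    """Label a mixture; the tolerance band around 0.5 is a hypothetical choice."""
    if f_plus < 0.5 - tolerance:
        return "excess negative"
    if f_plus > 0.5 + tolerance:
        return "excess positive"
    return "near charge neutral"


f_plus = charge_fraction(n_pos_chains=10, pos_length=20, n_neg_chains=30, neg_length=20)
print(f_plus, charge_regime(f_plus))  # 0.25 excess negative
```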
Click here for the full thesis.
In this project, I worked in an interdisciplinary team for SensUR Health. Together with Nutrition & Health students, we designed a concept for a lab-on-a-chip to predict individual recovery patterns for athletes using biomarkers. As the project was confidential, I'm not allowed to share more information here.
For the course Advanced Bioinformatics, I reran the analysis of a paper to check whether we could reproduce its results. In addition, we used its data to gain insight into the specific pathways expressed in response to bacterial inoculation. The abstract and results are shown below.
Bacteria can pose a serious threat to plants. To protect itself from these hazardous pathogens, a plant can use many different molecular responses. However, why specific molecular pathways are expressed is currently not well understood. In this project, we look at a paper by Maier et al. (2021) and use its RNA sequencing data of Arabidopsis thaliana plants to rerun the data analysis. We investigate whether different bacteria that elicit a strong response lead to the same pathways being expressed. We also look into the so-called GNSR genes from the paper and try to determine whether these are actually consistently expressed across the different samples. Our results show that similar pathways are expressed in all samples. One of these pathways is the phenylpropanoid pathway, which is involved in lignification of the cell wall. However, some differences between samples can be found. It is still uncertain why this is the case, as these differences are not caused by the phylum or pathogenicity of the bacteria.
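Whether a pathway is over-represented among differentially expressed genes is often assessed with a one-sided hypergeometric test. The report does not state which enrichment method was used, so the Python sketch below only illustrates that standard test; the function name and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(k, pathway_size, n_selected, n_genes):
    """One-sided hypergeometric p-value, P(X >= k).

    X is the number of pathway genes among n_selected differentially expressed
    genes drawn from n_genes genes, of which pathway_size belong to the pathway.
    """
    upper = min(pathway_size, n_selected)
    hits = sum(
        comb(pathway_size, i) * comb(n_genes - pathway_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_genes, n_selected)


# Hypothetical counts: 20,000 genes, a 150-gene pathway, 500 DE genes, 15 in the pathway.
print(enrichment_p_value(15, 150, 500, 20_000))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division.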
Click here for the full report.
In this project, we used atomistic modelling to simulate the behavior of focal adhesion complexes that connect mammalian cells to the extracellular matrix. We found that the breakage pattern of these complexes is non-random, following a domino effect from an initial point of failure. At the end of the project, we made a poster, which won the Advanced Soft Matter poster prize. The poster can be found here.
In this project, Molecular Dynamics simulations were used to find a stable ice-nucleating protein. PyRosetta software was used to iterate over different backbone configurations and specific motifs. The abstract and report can be found below.
Ice-binding proteins have many applications, ranging from snow cannons to microvalves in microtubing. However, these proteins are currently impossible to synthesize. Specific bacteria can transport these proteins on their membrane, but creating a pure ice-binding protein is not possible. This is because the proteins contain a high proportion of β-helices, which are very unstable in synthesis. In this project, ice-binding proteins are designed on a much more stable backbone of α-helices. An ice-binding motif, (TxxxAxxxAxx)n, is attached to the α-helix, which makes the protein far easier to stabilize. Such proteins could then be synthesized and used in this wide range of applications, without the hassle of dealing with living bacteria. A stable structure containing the ice-binding motif was found during the design phase, and multiple configurations were found with low score values. The coil radius was found to be the most impactful parameter, while the twist and the phase of the protein seem to have less effect. Three local minima were found for the radius: 5.2-6.0 Å, 6.3-6.9 Å and 6.9-7.5 Å. Larger radii decreased the attraction between the three chains; at the same time, larger and more hydrophobic amino acids could be sampled in the core, resulting in a lower score. However, the fold and dock step showed that the likelihood of this structure forming from scratch is very low. Three different sequences were examined, each resulting in a slightly different structure with RMSD = 1.5 Å. For all three, a completely different structure was also present at RMSD = 8 or 10 Å. This indicates that the probability of the desired structure forming in the lab is very low. In the future, a broader range of input parameters must be taken into account, such as a higher number of chains, fixed hydrogen bonds in the core or other parameters.
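The RMSD values above measure how far a predicted structure deviates from the designed one. As a sketch of the metric only, the Python function below computes the RMSD for matched atoms (for example Cα atoms) of two structures that are assumed to be already superimposed; a real comparison would first apply an optimal superposition, which is omitted here. The coordinates are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between matched atom coordinates.

    Assumes both structures list the same atoms in the same order and are
    already superimposed; no optimal alignment (e.g. Kabsch) is performed here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Hypothetical C-alpha coordinates: every atom shifted by 1.5 Å along x.
design = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (7.0, 0.0, 0.0)]
predicted = [(x + 1.5, y, z) for x, y, z in design]
print(rmsd(design, predicted))  # 1.5
```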
Click here for the full report.
In this personal project, I created an algorithm that uses data from procyclingstats.com to find the optimal fantasy cycling team for Scorito. The results for the Tour de France 2023 can be found here.
First, the points for each rider are retrieved using custom dictionaries and the results from ProCyclingStats.com. Next, an algorithm iterates over all possible teams that fit within the budget, trying every selection for each stage with that team. The highest-scoring team is stored. In the output file, each stage is printed with the nine selected riders, including the captain, and the points they scored. Finally, the total number of points each rider scored in the ideal team is printed.
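The search described above can be sketched as an exhaustive loop in Python. Points are additive, so for a fixed team the best daily selection is simply its highest-scoring riders, which is equivalent to trying every selection. The scoring rules here are simplified assumptions rather than Scorito's actual rules: the captain is taken to be the stage's top scorer, and `captain_multiplier`, the prices, the budget and the team sizes are hypothetical parameters.

```python
from itertools import combinations


def best_team(riders, budget, team_size, lineup_size, captain_multiplier=2):
    """Exhaustively search all affordable teams for the highest total score.

    riders maps a rider name to (price, [points per stage]). For each stage the
    lineup_size highest-scoring riders of a team are selected and the top
    scorer is captain; captain_multiplier is a hypothetical scoring rule.
    """
    best_total, best_selection = float("-inf"), None
    for team in combinations(riders, team_size):
        if sum(riders[name][0] for name in team) > budget:
            continue  # team does not fit within the budget
        n_stages = len(riders[team[0]][1])
        total = 0
        for stage in range(n_stages):
            stage_points = sorted((riders[name][1][stage] for name in team), reverse=True)
            lineup = stage_points[:lineup_size]
            # The captain's points count captain_multiplier times in total.
            total += sum(lineup) + (captain_multiplier - 1) * lineup[0]
        if total > best_total:
            best_total, best_selection = total, team
    return best_selection, best_total


# Hypothetical riders: name -> (price, [points per stage]).
riders = {
    "A": (3, [10, 0]),
    "B": (2, [5, 5]),
    "C": (2, [0, 8]),
    "D": (1, [1, 1]),
}
print(best_team(riders, budget=5, team_size=2, lineup_size=1))  # (('A', 'C'), 36)
```

Because the number of candidate teams grows combinatorially with the size of the rider pool, a search like this is only practical on a limited pool of riders.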
In this project, we designed a data management system in Microsoft Access for the startup Urban Funghi to store and analyze the data of their oyster mushroom production process. SQL queries were used to retrieve their data and to quickly gain insight into the underlying relations. As the project is confidential, I cannot share in-depth details on their production setup and the database.
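The kind of query used to relate production data can be sketched with Python's built-in sqlite3 module. Because the real schema is confidential, every table name, column name and value below is hypothetical, and SQLite stands in for Microsoft Access, whose SQL dialect differs in places.

```python
import sqlite3

# Hypothetical schema standing in for the confidential Access database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE batch (
        batch_id INTEGER PRIMARY KEY,
        substrate TEXT NOT NULL,
        incubation_temp_c REAL
    );
    CREATE TABLE harvest (
        harvest_id INTEGER PRIMARY KEY,
        batch_id INTEGER NOT NULL REFERENCES batch (batch_id),
        yield_kg REAL NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO batch (batch_id, substrate, incubation_temp_c) VALUES (?, ?, ?)",
    [(1, "substrate A", 22.0), (2, "substrate B", 24.0), (3, "substrate A", 24.0)],
)
conn.executemany(
    "INSERT INTO harvest (batch_id, yield_kg) VALUES (?, ?)",
    [(1, 1.2), (1, 0.8), (2, 1.5), (3, 1.0)],
)

# Relate a process variable (substrate) to output: total yield and harvest count.
query = """
    SELECT b.substrate, SUM(h.yield_kg) AS total_yield_kg, COUNT(h.harvest_id) AS n_harvests
    FROM batch AS b
    INNER JOIN harvest AS h ON h.batch_id = b.batch_id
    GROUP BY b.substrate
    ORDER BY total_yield_kg DESC
"""
rows = conn.execute(query).fetchall()
for substrate, total_yield_kg, n_harvests in rows:
    print(f"{substrate}: {total_yield_kg:.1f} kg over {n_harvests} harvests")
```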