Prediction of protein structure by AlphaFold.
Protein expression & solubility with AI
For decades, bioengineers have relied on rational design and directed evolution to obtain enzymes with the properties they want. However, rational design requires thorough knowledge of enzyme structure and mechanism, and directed evolution can be very time-consuming, depending on the screening technology available for identifying better candidates.
Artificial intelligence (AI) is now used in many fields, including bioinformatics. Drawing on large databases accumulated from previous research, predictors have been developed for many different enzyme properties. The most mature application is structure prediction: tools such as AlphaFold and trRosetta produce models that closely resemble experimentally determined structures, making protein structures easier to analyze.
Protein solubility is often assessed by SDS-PAGE: the whole-cell and supernatant (soluble) fractions are run side by side, and the ratio of the target band in the soluble fraction to that in the whole-cell fraction is defined as the solubility of the target. Previous research in our lab found an interesting relationship between protein solubility and function. For this reason, solubility predictors are another noteworthy AI application in the protein field.
SDS-PAGE analysis for whole-cell (W) and soluble (S) proteins.
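The solubility ratio described above can be illustrated with a minimal sketch. It assumes that the target band has already been quantified by densitometry in both lanes and that the W and S lanes were loaded with equivalent sample amounts; the function name and the intensity values are hypothetical and are not taken from our experiments.

```python
def solubility_ratio(soluble_intensity: float, whole_cell_intensity: float) -> float:
    """Return the soluble fraction of the target protein, between 0 and 1.

    Both arguments are band intensities of the target protein in arbitrary
    densitometry units: S lane (supernatant) and W lane (whole cell).
    """
    if whole_cell_intensity <= 0:
        raise ValueError("whole-cell band intensity must be positive")
    if soluble_intensity < 0:
        raise ValueError("soluble band intensity must be non-negative")
    # Densitometry noise can push S slightly above W, so cap the ratio at 1.
    return min(soluble_intensity / whole_cell_intensity, 1.0)


# Hypothetical readings for one target band (arbitrary units).
ratio = solubility_ratio(soluble_intensity=4200.0, whole_cell_intensity=12000.0)
print(f"solubility = {ratio:.2f}")  # prints "solubility = 0.35"
```

A value near 1 would indicate that most of the expressed target is soluble, while a value near 0 would suggest that it mostly ends up in the insoluble fraction.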
DeepSol uses a deep learning model, a convolutional neural network (CNN), combined with sequence and structural information, and reaches an accuracy of 0.77. Because that structural information is only one-dimensional, GraphSol introduced a structure-aware method that predicts solubility with a graph convolutional network (GCN). Although innovative deep learning predictors have appeared recently, the approach has clear disadvantages. Besides the limited size of the available databases, some research groups think that deep learning can obscure the information carried by the sequence. SoluProt, for example, combines different databases and cleans them using domain knowledge to make the training and test sets more balanced. It also uses a gradient boosting machine (GBM) rather than a deep learning model.
AI predictors still have limitations in the protein field. As databases continue to grow, predictor accuracy should become sufficient for routine analysis. The wet-lab experiments in our lab are already mature; bringing in dry-lab methods will make experiments easier and less time-consuming.
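To make the idea of a balanced training set concrete, the sketch below undersamples the larger class so that soluble and insoluble examples are equally represented. This is a generic technique shown for illustration only; it is not SoluProt's actual cleaning procedure, and the function name, labels, and toy sequences are all hypothetical.

```python
import random
from collections import Counter


def balance_by_undersampling(records, seed=0):
    """Downsample the larger class so both classes have the same size.

    records: list of (sequence, label) pairs, label 1 = soluble, 0 = insoluble.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    soluble = [r for r in records if r[1] == 1]
    insoluble = [r for r in records if r[1] == 0]
    n = min(len(soluble), len(insoluble))
    balanced = rng.sample(soluble, n) + rng.sample(insoluble, n)
    rng.shuffle(balanced)
    return balanced


# Toy data (made-up sequences): four soluble and two insoluble entries.
toy = [
    ("MKTAYIAKQR", 1),
    ("MSLLTEVETY", 1),
    ("MGSSHHHHHH", 1),
    ("MADEEKLPPG", 1),
    ("MVLLWWFFII", 0),
    ("MIIVVLLFFW", 0),
]
balanced = balance_by_undersampling(toy)
counts = Counter(label for _, label in balanced)
print(counts[1], counts[0])  # both classes now have the same count
```

Undersampling discards data, so with a small database other strategies (for example, weighting classes during training) may be preferable; the choice here is only meant to show what "balanced" means in practice.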