Within three weeks of the first meeting, Fusemachines onboarded one PhD data scientist and three engineers to work with the client. During integration, the data scientist and engineers were in direct, real-time communication with the client’s internal team and project leaders. Together they worked on protein prediction and used natural language processing (NLP) to extract information from research documents that could corroborate the predictions.
As a second step, the client wanted to use NLP to comb through scientific journals and refine the protein search based on the specific disease being analyzed and the various predicted proteins. Checking that the findings align with contemporary scientific literature would help speed up the process.
To predict proteins successfully, Fuse engineers first pre-processed the data to make sure there weren’t any inherent biases. To deal with unlabeled data, they used techniques that produce pseudo-labels, which help the AI model make better predictions. In short, the model makes predictions on the “unknown” proteins, and the proteins predicted positive are then fed back into the model for training, this time as labeled examples.
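The pseudo-labeling loop described above can be sketched roughly as follows. This is an illustration only, not the model or data used in the project: the nearest-centroid classifier, the two-dimensional feature vectors, the 0.8 confidence threshold, and all function names are assumptions chosen to keep the example small and dependency-free.

```python
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def fit(features, labels):
    """Toy stand-in model (assumption): one centroid per class, 1 = positive."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    return centroid(pos), centroid(neg)


def positive_score(model, x):
    """Score in [0, 1]; above 0.5 means x is closer to the positive centroid."""
    pos_c, neg_c = model
    d_pos, d_neg = dist(x, pos_c), dist(x, neg_c)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def pseudo_label(features, labels, unlabeled, threshold=0.8, max_rounds=5):
    """Repeatedly move confidently positive unlabeled items into the training set."""
    features, labels, pool = list(features), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        model = fit(features, labels)
        # Predict on the "unknown" items and keep the confident positives.
        confident = [x for x in pool if positive_score(model, x) >= threshold]
        if not confident:
            break
        # Feed them back in, this time as labeled (positive) examples.
        features += confident
        labels += [1] * len(confident)
        pool = [x for x in pool if x not in confident]
    return fit(features, labels), pool


# Hypothetical toy data: two known positives, two known negatives, four unknowns.
known = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]]
known_labels = [1, 1, 0, 0]
unknown = [[1.1, 1.2], [4.9, 5.2], [3.0, 3.0], [1.5, 1.4]]

model, remaining = pseudo_label(known, known_labels, unknown)
print("still unlabeled:", remaining)
```

The threshold matters in practice: feeding low-confidence predictions back into training can reinforce the model's own mistakes, so a sketch like this only promotes items the current model scores well above the decision boundary.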