Small proteins (≤100 amino acids) are challenging to identify and annotate computationally, yet they play important cellular roles across organisms, including prokaryotes. To address this, researchers at NII have developed SProtFP, which uses machine learning to analyse the physicochemical properties of small proteins and assign them to specific functional categories such as type 1 toxins, type 2 antitoxins, DNA-binding proteins and antimicrobial peptides. When applied to genomes from the human gut microbiome, the tool was able to uncover remote homologues of known small proteins and assign probable functions to uncharacterized proteins. This highlights its utility for annotating small proteins in large microbiome datasets, even when they share little similarity with known small protein sequences.
Reference:
Khanduja, A., & Mohanty, D. (2025). SProtFP: a machine learning-based method for functional classification of small ORFs in prokaryotes. NAR Genomics and Bioinformatics, 7(1), lqae186. https://doi.org/10.1093/nargab/lqae186