Abstract
In this study, we explore the alignment of multimodal representations between large language models (LLMs) and geometric deep models (GDMs) in the protein domain. We comprehensively evaluate three LLMs paired with four protein-specialized GDMs. Our work examines the factors that affect alignment from both the model and the protein perspective, identifies challenges in current alignment methodologies, and proposes strategies to improve the alignment process. Experimental results reveal that GDMs incorporating both graph and 3D structural information align better with LLMs, that larger LLMs demonstrate improved alignment capabilities, and that protein rarity significantly impacts alignment performance. We also find that increasing GDM embedding dimensions, using two-layer projection heads, and fine-tuning LLMs on protein-specific data substantially enhance alignment quality. Finally, we demonstrate that improved alignment correlates with better downstream performance and reduced hallucination in protein-focused multimodal LLMs.
| Original language | English (US) |
|---|---|
| Article number | 101227 |
| Journal | Patterns |
| Volume | 6 |
| Issue number | 5 |
| DOIs | |
| State | Published - May 9, 2025 |
| Externally published | Yes |
All Science Journal Classification (ASJC) codes
- General Decision Sciences
Keywords
- multimodal AI
- protein
- representation alignment