Nicholas Moratelli

I am a final-year Ph.D. candidate in Artificial Intelligence Engineering at the AImageLab research group under the supervision of Professor Rita Cucchiara.

My research focuses on advancing the field of Artificial Intelligence through the development of Large Vision Language Models, Vision-and-Language Foundation Models, and Retrieval-Augmented Generation. I've published at top conferences like CVPR, ICLR, ACL, and BMVC, and recently joined Amazon Science in Cambridge, UK, as a Research Intern.

When I'm not coding or writing papers, you can probably find me at a conference somewhere in the world, coffee in hand.

Latest News

Award
May, 2025

Outstanding Reviewer

Honored to be recognized as an Outstanding Reviewer for CVPR 2025.

Amazon
May, 2025

Joining Amazon as a Research Intern

I'm thrilled to share that I will be joining Amazon Science in Cambridge, UK, for a 6-month internship as an Applied Research Intern.

Singapore
April, 2025

Attended ICLR 2025

I attended The Thirteenth International Conference on Learning Representations 2025 in Singapore.

LLaVA-MORE
April, 2025

New paper alert

We introduce LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.

CVPR Conference
February, 2025

Paper Accepted for publication @ CVPR 2025

My work on "Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering" has been accepted at The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025!

ICLR Conference
January, 2025

Paper Accepted for publication @ ICLR 2025

My work on "Causal Graphical Models for Vision-Language Compositional Understanding" has been accepted at The Thirteenth International Conference on Learning Representations 2025!

Kolkata
December, 2024

Attended ICPR 2024

I attended The International Conference on Pattern Recognition 2024 in Kolkata, India.

Glasgow
November, 2024

Attended BMVC 2024

I attended The British Machine Vision Conference 2024 in Glasgow, United Kingdom.

Milan
October, 2024

Attended ECCV 2024

I attended The European Conference on Computer Vision 2024 in Milan, Italy.

Summer School on Signal Processing
September, 2024

Attended Summer School on Signal Processing

I participated in The 2024 IEEE-EURASIP Summer School on Signal Processing in Capri, Italy.

ECCV Workshop
August, 2024

Paper Accepted for publication @ ECCV Workshops 2024

My work on "Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis" has been accepted at The European Conference on Computer Vision Workshops 2024!

Publications

CVPR 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

F. Cocchi*, N. Moratelli*, M. Cornia, L. Baraldi, R. Cucchiara

In Conference on Computer Vision and Pattern Recognition, 2025

ICLR 2025

Causal Graphical Models for Vision-Language Compositional Understanding

F. Parascandolo, N. Moratelli, E. Sangineto, L. Baraldi, R. Cucchiara

In International Conference on Learning Representations, 2025

BMVC 2024
Oral Presentation

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

N. Moratelli*, D. Caffagni*, M. Cornia, L. Baraldi, R. Cucchiara

In British Machine Vision Conference, 2024

CVPR Workshops 2024

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

D. Caffagni*, F. Cocchi*, N. Moratelli*, S. Sarto*, M. Cornia, L. Baraldi, R. Cucchiara

In Conference on Computer Vision and Pattern Recognition Workshops, 2024

ACL 2024

The Revolution of Multimodal Large Language Models: A Survey

D. Caffagni*, F. Cocchi*, L. Barsellotti, N. Moratelli*, S. Sarto*, L. Baraldi*, L. Baraldi, M. Cornia, R. Cucchiara

In Findings of the Association for Computational Linguistics, 2024

ICPR 2024
Oral Presentation

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

N. Moratelli, M. Cornia, L. Baraldi, R. Cucchiara

In International Conference on Pattern Recognition, 2024

Under review

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

S. Sarto*, N. Moratelli*, M. Cornia, L. Baraldi, R. Cucchiara

Under review at a top-tier journal

ECCV Workshops 2024

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis

D. Bucciarelli, N. Moratelli, M. Cornia, L. Baraldi, R. Cucchiara

In European Conference on Computer Vision Workshops, 2024

Sensors 2023

Fashion-oriented image captioning with external knowledge retrieval and fully attentive gates

N. Moratelli*, M. Barraco*, D. Morelli, M. Cornia, L. Baraldi, R. Cucchiara

In Sensors MDPI, 2023

Intelligent Systems 2024

Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization

N. Moratelli, M. Barraco, M. Cornia, L. Baraldi, R. Cucchiara

In IEEE Intelligent Systems, 2024

Projects

Multimodal LLM

LLaVA-MORE

A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.

Get In Touch

Interested in collaboration?

I'm always open to discussing research ideas, potential collaborations, or opportunities to apply AI in innovative ways.

Email Me