Nicholas Moratelli

Latest News

May, 2026

Joining Tether as a Senior AI Research Engineer

I joined Tether as a Senior AI Research Engineer, where I work on multimodal and video foundation models across post-training, alignment, and scalable model adaptation.

April, 2026

Ph.D. Completed

I completed my Ph.D. in Artificial Intelligence at AImageLab, focusing on scalable multimodal foundation models, knowledge-grounded reasoning, and vision-language learning.

April, 2026

Paper Accepted for publication @ ACL 2026 (Main Conference)

My work on "Benchmarking Deflection and Hallucination in Large Vision-Language Models" has been accepted at The 64th Annual Meeting of the Association for Computational Linguistics 2026!

July, 2025

Paper Accepted for publication @ BMVC 2025

My work on "Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization" has been accepted at The British Machine Vision Conference 2025!

July, 2025

Paper Accepted for publication in International Journal of Computer Vision (IJCV) 2025

My work on "Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training" has been accepted for publication in the International Journal of Computer Vision 2025!

July, 2025

Paper Accepted for publication @ ICCV Workshops 2025

My work on "LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning" has been accepted at The International Conference on Computer Vision Workshops 2025!

May, 2025

Outstanding Reviewer

Honored to be recognized as an Outstanding Reviewer for CVPR 2025.

May, 2025

Joining Amazon as an Applied Scientist Intern

I joined Amazon Science in Cambridge, UK, as an Applied Scientist Intern, working on multimodal post-training and retrieval-augmented generation for Amazon’s Nova foundation models.

April, 2025

Attended ICLR 2025

I attended The Thirteenth International Conference on Learning Representations 2025 in Singapore.

February, 2025

Paper Accepted for publication @ CVPR 2025

My work on "Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering" has been accepted at The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025!

January, 2025

Paper Accepted for publication @ ICLR 2025

My work on "Causal Graphical Models for Vision-Language Compositional Understanding" has been accepted at The Thirteenth International Conference on Learning Representations 2025!

December, 2024

Attended ICPR 2024

I attended The International Conference on Pattern Recognition 2024 in Kolkata, India.

November, 2024

Attended BMVC 2024

I attended The British Machine Vision Conference 2024 in Glasgow, United Kingdom.

October, 2024

Attended ECCV 2024

I attended The European Conference on Computer Vision 2024 in Milan, Italy.

September, 2024

Attended Summer School on Signal Processing

I participated in The 2024 IEEE-EURASIP Summer School on Signal Processing in Capri, Italy.

August, 2024

Paper Accepted for publication @ ECCV Workshops 2024

My work on "Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis" has been accepted at The European Conference on Computer Vision Workshops 2024!

August, 2024

Paper Accepted for publication @ ICPR 2024 for Oral Presentation

My work on "Fluent and Accurate Image Captioning with a Self-Trained Reward Model" has been accepted at The International Conference on Pattern Recognition 2024!

July, 2024

Paper Accepted for publication @ BMVC 2024 for Oral Presentation

My work on "Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization" has been accepted at The British Machine Vision Conference 2024 for Oral Presentation!

July, 2024

Paper Accepted for publication @ CVPR Workshops 2024

My work on "Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs" has been accepted at The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops 2024!

May, 2024

Paper Accepted for publication @ ACL Findings 2024

My work on "The Revolution of Multimodal Large Language Models: A Survey" has been accepted to the Findings of The 62nd Annual Meeting of the Association for Computational Linguistics 2024!

September, 2023

Attended ELLIS Summer School

I participated in The ELLIS Summer School on Large-Scale AI for Research and Industry in Modena, Italy.

September, 2023

Attended VISMAC Summer School

I participated in The International Summer School on Machine Vision 2023 in Padova, Italy.

May, 2024

Paper Accepted for publication in Sensors Journal

My work on "Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates" has been published in Sensors!

March, 2024

Paper Accepted for publication in IEEE Intelligent Systems Journal

My work on "Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization" has been accepted for publication in IEEE Intelligent Systems!

November, 2022

Joined AImageLab as Ph.D. Student

I started my Ph.D. in Information and Communication Technologies (ICT) at AImageLab.

Publications

ACL 2026

Benchmarking Deflection and Hallucination in Large Vision-Language Models

N. Moratelli, C. Davis, L. F. R. Ribeiro, B. Byrne, G. Iglesias

In Annual Meeting of the Association for Computational Linguistics, 2026

Read Paper

CVPR 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

F. Cocchi*, N. Moratelli*, M. Cornia, L. Baraldi, R. Cucchiara

In Conference on Computer Vision and Pattern Recognition, 2025

Read Paper Project Page Code

ICLR 2025

Causal Graphical Models for Vision-Language Compositional Understanding

F. Parascandolo, N. Moratelli, E. Sangineto, L. Baraldi, R. Cucchiara

In International Conference on Learning Representations, 2025

Read Paper Project Page Code

BMVC 2024

Oral Presentation

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

N. Moratelli*, D. Caffagni*, M. Cornia, L. Baraldi, R. Cucchiara

In British Machine Vision Conference, 2024

Read Paper Code

CVPR Workshops 2024

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

D. Caffagni*, F. Cocchi*, N. Moratelli*, S. Sarto*, M. Cornia, L. Baraldi, R. Cucchiara

In Conference on Computer Vision and Pattern Recognition Workshops, 2024

Read Paper

ACL 2024

The Revolution of Multimodal Large Language Models: A Survey

D. Caffagni*, F. Cocchi*, L. Barsellotti, N. Moratelli*, S. Sarto*, L. Baraldi*, L. Baraldi, M. Cornia, R. Cucchiara

In Findings of the Association for Computational Linguistics, 2024

Read Paper

ICPR 2024

Oral Presentation

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

N. Moratelli, M. Cornia, L. Baraldi, R. Cucchiara

In International Conference on Pattern Recognition, 2024

Read Paper

IJCV 2025

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

S. Sarto*, N. Moratelli*, M. Cornia, L. Baraldi, R. Cucchiara

In International Journal of Computer Vision, 2025

Read Paper Code

BMVC 2025

Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization

A. Compagnoni, D. Caffagni, N. Moratelli, M. Cornia, L. Baraldi, R. Cucchiara

In British Machine Vision Conference, 2025

Read Paper

ICCV Workshops 2025

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning

F. Cocchi*, N. Moratelli*, D. Caffagni*, S. Sarto*, M. Cornia, L. Baraldi, R. Cucchiara

In International Conference on Computer Vision Workshops, 2025

Read Paper Code

ECCV Workshops 2024

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis

D. Bucciarelli, N. Moratelli, M. Cornia, L. Baraldi, R. Cucchiara

In European Conference on Computer Vision Workshops, 2024

Read Paper

Sensors 2023

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates

N. Moratelli*, M. Barraco*, D. Morelli, M. Cornia, L. Baraldi, R. Cucchiara

In Sensors MDPI, 2023

Read Paper

Intelligent Systems 2024

Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization

N. Moratelli, M. Barraco, M. Cornia, L. Baraldi, R. Cucchiara

In IEEE Intelligent Systems, 2024

Read Paper

Projects

LLAVA-MORE

Get In Touch

Interested in collaboration?