Z. Liu

How Far Are Video Models from True Multimodal Reasoning?

Evaluating video models for true multimodal reasoning.

x.-zhang