Hanoona Rasheed

Hanoona

https://www.hanoonarasheed.com/

hanoonaR

AI & ML interests

I’m currently working on multi-modal transformers for computer vision applications.

Recent Activity

authored 9 papers 17 days ago

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

Paper • 2303.15446 • Published Mar 27, 2023 • 1

MaPLe: Multi-modal Prompt Learning

Paper • 2210.03117 • Published Oct 6, 2022

VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Paper • 2406.09418 • Published Jun 13, 2024 • 1

Fine-tuned CLIP Models are Efficient Video Learners

Paper • 2212.03640 • Published Dec 6, 2022 • 1

Perception Encoder: The best visual embeddings are not at the output of the network

Paper • 2504.13181 • Published Apr 17 • 34

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17 • 19

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Paper • 2506.05349 • Published Jun 5 • 24

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Paper • 2506.07032 • Published Jun 8

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

Paper • 2511.23477 • Published 27 days ago • 2

upvoted a paper 17 days ago

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

Paper • 2511.23477 • Published 27 days ago • 2

updated 3 datasets about 1 month ago

Hanoona/think_video

Updated Nov 21 • 416
