vision-language models

Tracked in 14 AFBytes stories. First seen May 28, 2026. Last seen Jun 02, 2026.

Recent coverage

[2606.01847] The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

arxiv.org · Jun 2, 2026 04:00 UTC

science tech

[2606.02273] Vision-language Models for Driver Monitoring Systems: A Driver Activity Description Dataset

arxiv.org · Jun 2, 2026 04:00 UTC

science tech

[2606.01612] Self-Improving Small Object Grounding in LVLMs

arxiv.org · Jun 2, 2026 04:00 UTC

science tech

[2605.30713] Diversity Matters: Revisiting Test-Time Compute in Vision-Language Models

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.30716] Simple Token-Efficient Vision-Language Model for Case-level Pathology Synoptic Report Generation

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.31349] FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.31196] Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.31556] Vision-Language Models Suppress Female Representations Under Ambiguous Input

arxiv.org · Jun 1, 2026 04:00 UTC

science tech

[2605.29438] ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models

arxiv.org · May 29, 2026 04:00 UTC

science tech

[2605.29585] World Models in Words: Auditing Physical State-Transition Commitments in Vision-Language Models

arxiv.org · May 29, 2026 04:00 UTC

science tech

[2605.29881] Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering

arxiv.org · May 29, 2026 04:00 UTC

science tech

[2605.27894] Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs

arxiv.org · May 28, 2026 04:00 UTC

science tech

[2605.28051] Beyond Surrogate Gradients: Fully Differentiable Token Pruning for Vision-Language Models

arxiv.org · May 28, 2026 04:00 UTC

science tech

[2605.28346] When Discourse Pressures Conflict: Information Structure in Vision-Language Model Outputs

arxiv.org · May 28, 2026 04:00 UTC

science tech