COCO
PulseAugur coverage of COCO — every cluster mentioning COCO across labs, papers, and developer communities, ranked by signal.
-
Researchers unveil new stealthy backdoor attacks on AI models using diffusion and style features
Researchers have developed new methods for backdoor attacks on advanced AI models, specifically targeting Vision-Language Models (VLMs) and Diffusion Models (DMs). One approach, CBV, uses diffusion models to create natu…
-
Colinearity Decay trains vision Transformers for better low-bit quantization
Researchers have developed a new training technique called Colinearity Decay (CD) to make Vision Transformers (ViTs) more amenable to low-bit quantization. This method acts as a structural regularizer, penalizing alignm…
-
FractalMamba++ scales vision models across resolutions using Hilbert curves
Researchers have introduced FractalMamba++, an enhanced vision backbone designed to improve the performance of Mamba-based models, particularly with high-resolution inputs. This new architecture leverages the geometric …
-
New methods improve open-vocabulary object detection robustness and adaptation
Researchers have introduced several new methods to improve open-vocabulary object detection, a task that aims to detect arbitrary objects specified by human prompts. One approach, EBOD, integrates a prompt-based detector…
-
Hyp2Former uses hyperbolic embeddings for open-set panoptic segmentation
Researchers have developed Hyp2Former, a novel framework for open-set panoptic segmentation that leverages hierarchical semantic similarities in hyperbolic space. This approach allows the model to better distinguish unk…
-
Object detection models show mixed robustness to quantization and input degradations
A new study investigates how post-training quantization (PTQ) affects the robustness of YOLO object detection models when faced with real-world input degradations like noise and blur. Researchers evaluated various preci…
-
GPT-4o and other multimodal models evaluated on computer vision tasks
A new paper evaluates how well multimodal foundation models, including GPT-4o and Gemini 1.5 Pro, perform on standard computer vision tasks. Researchers developed a prompt-chaining method to translate vision tasks into …
-
New DBAC metric measures and identifies bias amplification in image captions
Researchers have introduced a new metric called Directional Bias Amplification in Captioning (DBAC) to measure and identify how image captioning models worsen biases present in their training data. Unlike previous metri…
-
New dataset aids computer vision identification of parasitoid wasps
Researchers have introduced the Parasitoid Wasps and Associated Hymenoptera Dataset (DAPWH), a new image collection aimed at improving automated identification of crucial insect groups. The dataset comprises…
-
Researchers propose fuzzy logic for robust image recognition via knowledge discovery
Researchers have developed a novel method for enhancing image recognition robustness by integrating domain knowledge into deep neural networks. This approach introduces a Differentiable Knowledge Unit (DKU) that modulat…
-
Researchers find single hub text exploits vulnerabilities in CLIP cross-modal encoders
Researchers have identified a vulnerability in cross-modal encoders like CLIP, which map text and images into a shared embedding space. They discovered that a single "hub text" can generate high similarity scores with n…
-
ViCrop-Det improves small-object detection with adaptive spatial routing
Researchers have introduced ViCrop-Det, a novel framework designed to improve small-object detection in images without requiring additional training. This method utilizes Spatial Attention Entropy (SAE) derived from a m…
-
New metric T3S evaluates semantic similarity in low-level image processing
Researchers have introduced a new evaluation metric called Semantic Similarity Score (T3S) for low-level image processing tasks. This metric aims to assess whether the semantic content of an image is preserved after pro…
-
Diffusion models boost AI's vision for segmentation and anomaly detection
Researchers have developed DiCLIP, a new framework for weakly supervised semantic segmentation that enhances the capabilities of CLIP by integrating diffusion models. This approach addresses CLIP's limitations in dense …
-
HalalBench benchmark tackles OCR challenges for multilingual food packaging ingredient extraction
Researchers have introduced HalalBench, a new multilingual benchmark designed to evaluate Optical Character Recognition (OCR) performance specifically on food packaging ingredient labels. The benchmark addresses the uni…
-
New framework enhances federated cross-modal retrieval with missing modalities
Researchers have developed RCSR, a new framework designed to improve federated cross-modal retrieval, particularly when dealing with data heterogeneity and missing modalities across clients. The system utilizes a frozen…
-
New OVD method improves object detection with hierarchical consistency and unbiased objectness
Researchers have developed a new framework to improve open-vocabulary object detection (OVD), a task in which models identify objects from categories not seen during training. The proposed method addresses inaccuracies …
-
BMD-45 dataset improves CCTV vehicle detection in developing cities
Researchers have introduced BMD-45, a new large-scale dataset designed to improve vehicle detection in the urban traffic environments of developing cities. This dataset contains over 45,000 images with 480,000 boundin…