vision transformer
PulseAugur coverage of vision transformer — every cluster mentioning vision transformer across labs, papers, and developer communities, ranked by signal.
-
New anomaly detection method uses vision transformers for autonomous driving
Researchers have developed a new anomaly detection method for autonomous driving that uses pre-trained vision transformer embeddings. This approach models normality from a single reference image, avoiding the need for e…
-
CutMix training protocol induces spatial locality in Vision Transformers
Researchers have found that specific training techniques can encourage spatial locality in Vision Transformers. By using a 'Modern' protocol involving data augmentation like CutMix and ColorJitter, along with label smoo…
-
LESSViT architecture improves hyperspectral model generalization across sensors
Researchers have developed LESSViT, a novel architecture for hyperspectral imagery that addresses the challenge of generalizing models across different sensors. This Low-rank Efficient Spatial-Spectral ViT uses a struct…
-
TokenMask improves vision transformer segmentation efficiency
Researchers have developed TokenMask, a novel approach for vision transformer segmentation that bypasses the need for explicit image-space reconstruction. This method computes mask logits directly from query-token affin…
-
New GLIA framework enhances Vision Transformer use in image quality assessment
Researchers have developed a new framework called the Global-Local Interaction Adapter (GLIA) to improve Blind Image Quality Assessment (BIQA). This method leverages pre-trained Vision Transformers by using a dual-strea…
-
VoxCor method enables training-free volumetric features for medical imaging
Researchers have developed VoxCor, a novel method for creating reusable volumetric feature representations from pre-trained 2D Vision Transformer models. This training-free approach combines triplanar inference with a w…
-
What-Where Transformer separates object appearance from location
Researchers have introduced the What-Where Transformer (WWT), a novel visual backbone designed to better separate object appearance from spatial location. This new architecture uses a slot-based design where tokens repr…
-
Diffusion augmentation boosts Bangla character recognition accuracy
Researchers have developed a confidence-guided diffusion augmentation method to improve the recognition of handwritten Bangla compound characters. This approach uses diffusion models to generate high-quality synthetic c…
-
Foundation model learns from Dutch satellite data for global benchmarks
Researchers have developed a new foundation model for high-resolution remote sensing data, specifically trained on satellite images of the Netherlands. This model combines Convolutional Neural Networks and Vision Transf…
-
LC4-DViT uses generative AI and transformers for accurate land-cover mapping
Researchers have developed LC4-DViT, a novel framework for land-cover classification using a deformable Vision Transformer. This approach combines generative data creation with a deformation-aware backbone to improve ac…
-
New framework fuses facial and physiological signals for better emotion recognition
Researchers have developed a new framework for video-based emotion recognition that combines facial expressions with physiological signals from remote photoplethysmography (rPPG). Their method uses prompt tuning to inte…
-
Researchers develop robust foundation model for conservation laws using recurrent Vision Transformers
Researchers have developed a new architecture that enhances Flux Neural Operators (Flux NO) by incorporating context through Recurrent Vision Transformers. This hypernetwork model extracts solution dynamics over time, e…
-
DART vision-language model offers comprehensive rope condition monitoring
Researchers have developed DART, a vision-language foundation model designed for comprehensive rope condition monitoring. This model integrates a Vision Transformer with Llama-3.2-3B-Instruct to handle the entire inspec…
-
Hebbian Fast Weights enhance Vision Transformers for few-shot character recognition
Researchers have developed a new approach to few-shot character recognition by integrating Hebbian Fast-Weight (HFW) modules into Vision Transformer architectures. This method aims to mimic biological neural systems' ab…
-
RD-ViT cuts data needs for segmentation, outperforming standard ViT with fewer parameters
Researchers have developed RD-ViT, a novel Recurrent-Depth Vision Transformer designed for semantic segmentation tasks. This architecture significantly reduces data dependence by using a single, shared transformer block…
-
OneTrackerV2 unifies multimodal visual tracking with Dual Mixture-of-Experts
Researchers have developed a new event-based visual object tracking framework that addresses limitations of existing methods by explicitly modeling event density variations across multiple temporal scales. This approach…
-
Researchers develop AI framework for fluid-structure interaction prediction
Researchers have developed a new machine learning framework for predicting fluid-structure interactions (FSI) over long periods on deforming meshes. The system integrates a graph neural operator with a Vision Transforme…
-
New framework enhances 3D ocean temperature reconstruction using AI
Researchers have developed an adaptive framework using spatiotemporal clustering to reconstruct 3D ocean subsurface temperature from surface observations. This method integrates with deep learning models like DP-CNN, At…
-
Researchers adapt Vision Transformers for fMRI analysis using flat maps
Researchers have developed a new family of models called CortexMAE, which adapt Vision Transformers for analyzing functional MRI data by projecting 3D volumes into 2D flat maps. This approach, tested on over 2,000 hours…
-
AI models advance plant disease detection with new datasets and efficient distillation
Researchers have developed new methods for plant leaf disease classification to aid in early detection and treatment. One approach involves training a new base model using the DenseNet201 architecture on a custom datase…