Soroush Mehraban

Vision Transformer (ViT) Paper Explained Soroush Mehraban 3,068 1 year ago
HD-GCN (ICCV2023): Skeleton-Based Action Recognition Soroush Mehraban 1,733 1 year ago
Faster R-CNN: Faster than Fast R-CNN! Soroush Mehraban 7,884 2 years ago
Receptive Fields: Why 3x3 conv layer is the best? Soroush Mehraban 7,622 2 years ago
FastV: An Image is Worth 1/2 Tokens After Layer 2 Soroush Mehraban 464 8 months ago
Swin Transformer - Paper Explained Soroush Mehraban 12,828 1 year ago
Relative Position Bias (+ PyTorch Implementation) Soroush Mehraban 4,159 1 year ago
PoseGPT (ChatPose): Chatting about 3D Human Pose Soroush Mehraban 798 10 months ago
GLIGEN (CVPR2023): Open-Set Grounded Text-to-Image Generation Soroush Mehraban 411 6 months ago
R-CNN: Clearly EXPLAINED! Soroush Mehraban 36,352 2 years ago
Autoregressive Image Generation without Vector Quantization Soroush Mehraban 386 1 month ago
Convolutional Block Attention Module (CBAM) Paper Explained Soroush Mehraban 7,558 1 year ago
Denoising Diffusion Null-Space Model (DDNM) - Method Explained Soroush Mehraban 320 54 years ago
Prompt-to-Prompt (P2P) image Editing - Method Explained Soroush Mehraban 275 54 years ago
Tent: Fully Test-time Adaptation by Entropy Minimization Soroush Mehraban 314 6 months ago
DINO: Self-Supervised Vision Transformers Soroush Mehraban 3,331 1 year ago
MetaFormer is Actually What You Need for Vision Soroush Mehraban 1,061 1 year ago
ViTPose: 2D Human Pose Estimation Soroush Mehraban 3,731 1 year ago