2020 |
University of Bristol (UK) |
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition |
|
Domain adaptation (classification) |
2020 |
Hefei University of Technology |
Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing |
|
Cross-modal retrieval (hashing-based) |
2020 |
University of Freiburg (Germany) |
Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior |
|
Multimodal prediction |
2020 |
Inria (France) |
Cross-Modal Deep Face Normals With Deactivable Skip Connections |
|
Cross-modal generation (both modalities are images) |
2020 |
Tsinghua University |
Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data |
|
Pose estimation |
2020 |
Huazhong University of Science and Technology & Peking University |
Semantically Multi-Modal Image Synthesis |
|
Semantic image synthesis (controlling the synthesized result with different semantics) |
2020 |
Rutgers University (USA) |
Knowledge As Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge |
|
Cross-modal knowledge distillation |
2020 |
Nanjing University of Science and Technology |
Cross-Modal Pattern-Propagation for RGB-T Tracking |
|
Tracking |
2020 |
Facebook & UC Berkeley |
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA |
|
Text-based visual question answering (TextVQA) |
2020 |
KAIST (Korea Advanced Institute of Science and Technology) & Samsung |
Modality Shifting Attention Network for Multi-Modal Video Question Answering |
|
Visual question answering (VQA) |
2020 |
CUHK-SenseTime Joint Lab |
A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation |
|
Segmentation |
2020 |
KAIST (Korea Advanced Institute of Science and Technology) |
Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification |
|
Person re-identification (ReID) |
2020 |
University of Oxford & Google & DeepMind |
Speech2Action: Cross-Modal Supervision for Action Recognition |
|
Action recognition |
2020 |
University of Surrey & Queen Mary University of London (UK) |
Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval |
|
Jigsaw puzzle (image retrieval) |
2020 |
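Institute of Information Engineering, Chinese Academy of Sciences & University of Science and Technology of China |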
Referring Image Segmentation via Cross-Modal Progressive Comprehension |
|
Segmentation |
2020 |
Institute of Automation, Chinese Academy of Sciences |
Cross-Modal Cross-Domain Moment Alignment Network for Person Search |
|
Cross-modal retrieval |
2020 |
University of Science and Technology of China |
Vision-Dialog Navigation by Exploring Cross-Modal Memory |
|
Vision-dialog navigation |
2020 |
Beihang University |
A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension |
|
Visual understanding |
2020 |
University of Science and Technology of China |
Multi-Modality Cross Attention Network for Image and Sentence Matching |
|
Image-text matching |
2020 |
Aptiv (automotive company, USA) |
nuScenes: A Multimodal Dataset for Autonomous Driving |
|
Autonomous driving |
2020 |
Mercedes-Benz |
Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather |
|
Multi-sensor fusion |
2020 |
Inria (France) |
xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation |
|
Image segmentation (3D point clouds and 2D images) |
2020 |
Tsinghua University |
IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval |
|
Cross-modal retrieval |
2020 |
Facebook |
What Makes Training Multi-Modal Classification Networks Hard? |
|
Multimodal classification |
2020 |
Institute of Computing Technology, Chinese Academy of Sciences |
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text |
|
Visual question answering |
2020 |
University of Electronic Science and Technology of China |
Universal Weighting Metric Learning for Cross-Modal Matching |
|
Cross-modal matching |
2020 |
Microsoft & Georgia Tech (USA) |
MMTM: Multimodal Transfer Module for CNN Fusion |
|
Multimodal fusion |
2020 |
University of Science and Technology of China |
Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer |
|
Person re-identification (ReID) |
2020 |
Tel Aviv University (Israel) |
Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation |
|
Multimodal registration (image registration) |
2020 |
Shanghai Jiao Tong University |
Where, What, Whether: Multi-Modal Learning Meets Pedestrian Detection |
|
Pedestrian detection |
2020 |
Caltech (USA) & Aptiv (automotive company) |
CoverNet: Multimodal Behavior Prediction Using Trajectory Sets |
|
Behavior prediction |
2020 |
University of Maryland (USA) |
EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege’s Principle |
|
Emotion classification |
2020 |
Xpeng Motors (Chinese electric-vehicle startup) |
Discriminative Multi-Modality Speech Recognition |
|
Speech recognition (video and audio) |
2020 |
Zhejiang University |
MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model |
|
Cross-modal retrieval (food retrieval) |
2020 |
Kakao Brain (Korean messaging-app company) |
Hypergraph Attention Networks for Multimodal Learning |
|
Multimodal question answering |
2020 |
Institute of Software, Chinese Academy of Sciences |
End-to-End Adversarial-Attention Network for Multi-Modal Clustering |
|
Multimodal clustering |
2020 |
|
Multimodal Categorization of Crisis Events in Social Media |
|
Multimodal classification |
|
|
|
|
|
2019 |
King Abdullah University of Science and Technology (KAUST, Saudi Arabia) |
Latent Filter Scaling for Multimodal Unsupervised Image-To-Image Translation |
|
Unsupervised image-to-image translation |
2019 |
University of Southern California & Alibaba (US) |
Unsupervised Multi-Modal Neural Machine Translation |
|
Unsupervised learning of image-grounded translation between languages |
2019 |
JD.com |
A Dataset and Benchmark for Large-Scale Multi-Modal Face Anti-Spoofing |
|
Face anti-spoofing |
2019 |
The Chinese University of Hong Kong & SenseTime |
Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition With Multimodal Training |
|
Dynamic hand-gesture recognition (3D video) |
2019 |
The Chinese University of Hong Kong & SenseTime |
Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing |
|
Visual understanding |
2019 |
Microsoft Cloud & AI |
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval |
|
Cross-modal retrieval |
2019 |
Sorbonne University (France) |
MUREL: Multimodal Relational Reasoning for Visual Question Answering |
|
Visual question answering |
2019 |
JD.com |
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering |
|
Visual question answering |
2019 |
Hong Kong University of Science and Technology |
ContextDesc: Local Descriptor Augmentation With Cross-Modality Context |
|
Cross-modal feature matching |
2019 |
The University of Hong Kong |
Cross-Modal Relationship Inference for Grounding Referring Expressions |
|
Visual understanding |
2019 |
University of Pittsburgh (USA) |
Cross-Modality Personalization for Retrieval |
|
Cross-modal retrieval |
2019 |
University of California, Santa Barbara & Microsoft |
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation |
|
Vision-language navigation (reinforcement learning) |
2019 |
CUHK-SenseTime Joint Lab |
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering |
|
Visual question answering |
2019 |
University of Caen Normandy (France) |
MFAS: Multimodal Fusion Architecture Search |
|
Multimodal fusion architecture search (NAS) |
2019 |
University of Freiburg (Germany) |
Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction |
|
Future prediction |
2019 |
University of Rochester (USA) |
Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss |
|
Cross-modal generation (speech to video) |
2019 |
Preferred Networks (Toyota-backed Japanese AI unicorn) & The University of Tokyo |
Multimodal Explanations by Predicting Counterfactuality in Videos |
|
Multimodal explanation (explaining in text why a classification result was produced) |
2019 |
Northwestern Polytechnical University (Feiping Nie's group) |
Deep Multimodal Clustering for Unsupervised Audiovisual Learning |
|
Multimodal clustering (images and audio) |
2019 |
Sichuan University |
Deep Supervised Cross-Modal Retrieval |
|
Cross-modal retrieval (images and text) |
2019 |
University of Manitoba (Canada) & Shanghai University |
Cross-Modal Self-Attention Network for Referring Image Segmentation |
|
Image segmentation from text descriptions |
2019 |
MIT CSAIL |
Connecting Touch and Vision via Cross-Modal Prediction |
|
Cross-modal generation (touch and vision) |
2019 |
City University of Hong Kong & National University of Singapore |
R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network |
|
Cross-modal retrieval (images and text) |
2019 |
Singapore Management University |
Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images |
|
Cross-modal generation (images and text) |
2019 |
Columbia University (New York) |
Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding |
|
Visual understanding (images and text) |
|
|
|
|
|
2018 |
ETH Zurich (often described as Europe's top university; Einstein's alma mater) |
Cross-Modal Deep Variational Hand Pose Estimation |
|
Cross-modal generation (both modalities are images) |
2018 |
Facebook |
Stacked Latent Attention for Multimodal Reasoning |
|
Visual question answering |
2018 |
Villanova University (USA) |
Deep Sparse Coding for Invariant Multimodal Halle Berry Neurons |
|
Improved neuron models |
2018 |
Xidian University & Tencent |
Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval |
|
Cross-modal retrieval |
2018 |
National Technical University of Athens (Greece) |
Multimodal Visual Concept Learning With Weakly Supervised Techniques |
|
Video understanding (video and text) |
2018 |
Nanyang Technological University (Singapore) & Alibaba (Hangzhou) |
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models |
|
Cross-modal retrieval (images and text) |
2018 |
Institute of Automation, Chinese Academy of Sciences |
M3: Multimodal Memory Modelling for Video Captioning |
|
Cross-modal generation (video captioning) |
2018 |
University of Oxford |
Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching |
|
Cross-modal retrieval (voice and face images) |
2018 |
UC Berkeley |
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence |
|
Visual understanding (visual question answering) |
2018 |
Siemens Healthineers |
Translating and Segmenting Multimodal Medical Volumes With Cycle- and Shape-Consistency Generative Adversarial Network |
|
Cross-modal generation and segmentation with missing data |
|
|
|
|
|
2017 |
|
Dual Attention Networks for Multimodal Reasoning and Matching |
|
|
2017 |
|
Discriminative Bimodal Networks for Visual Localization and Detection With Natural Language Queries |
|
|
2017 |
|
Missing Modalities Imputation via Cascaded Residual Autoencoder |
|
|
2017 |
|
Multi-Modal Mean-Fields via Cardinality-Based Clamping |
|
|
2017 |
|
Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals |
|
|
2017 |
|
Instance-Aware Image and Sentence Matching With Selective Multimodal LSTM |
|
|
2017 |
|
AMC: Attention guided Multi-modal Correlation Learning for Image Search |
|
|
2017 |
|
Learning Cross-Modal Embeddings for Cooking Recipes and Food Images |
|
|
2017 |
|
Hierarchical Multimodal Metric Learning for Multimodal Classification |
|
|
2017 |
|
Deep Cross-Modal Hashing |
|
|
2017 |
|
Generalized Semantic Preserving Hashing for N-Label Cross-Modal Retrieval |
|
|
2017 |
|
Online Asymmetric Similarity Learning for Cross-Modal Retrieval |
|
|
2017 |
|
Human Shape From Silhouettes Using Generative HKS Descriptors and Cross-Modal Neural Networks |
|
|
2017 |
|
Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension |
|
|
2017 |
|
Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer |
|
|
2017 |
|
Learning to Extract Semantic Structure From Documents Using Multimodal Fully Convolutional Neural Networks |
|
|
2017 |
|
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection |
|
|
2017 |
|
Deep Multimodal Representation Learning From Temporal Data |
|
|
2017 |
|
GuessWhat?! Visual Object Discovery Through Multi-Modal Dialogue |
|
|
2017 |
|
Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes From 2D Ones in RGB-Depth Images |
|
|
2017 |
|
Simultaneous Super-Resolution and Cross-Modality Synthesis of 3D Medical Images Using Weakly-Supervised Joint Convolutional Sparse Coding |
|
|
2017 |
|
Joint Sequence Learning and Cross-Modality Convolution for 3D Biomedical Segmentation |
|
|
2017 |
|
Fast Boosting Based Detection Using Scale Invariant Multimodal Multiresolution Filtered Features |
|
|
2017 |
|
Cross-Modality Binary Code Learning via Fusion Similarity Hashing |
|
|
|
|
|
|
|
2016 |
|
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images |
|
|
2016 |
|
Collaborative Quantization for Cross-Modal Similarity Search |
|
|
2016 |
|
MDL-CW: A Multimodal Deep Learning Framework With Cross Weights |
|
|
2016 |
|
Cross Modal Distillation for Supervision Transfer |
|
|
2016 |
|
Learning Aligned Cross-Modal Representations From Weakly Aligned Data |
|
|
2016 |
|
Discriminative Multi-Modal Feature Fusion for RGBD Indoor Scene Recognition |
|
|
2016 |
|
Multimodal Spontaneous Emotion Corpus for Human Behavior Analysis |
|
|
2016 |
|
Temporal Multimodal Learning in Audiovisual Speech Recognition |
|
|
2016 |
|
Geospatial Correspondences for Multimodal Registration |
|
|
2016 |
|
Modality and Component Aware Feature Fusion For RGB-D Scene Classification |
|
|