2024

X. Feng, X. Li, S. Hu, D. Zhang, M. Wu, J. Zhang, X. Chen, and K. Huang.
MemVLT: Visual-Language Tracking with Adaptive Memory-based Prompts.
Conference on Neural Information Processing Systems (NeurIPS), 2024.
D. Zhang, S. Hu, X. Feng, X. Li, M. Wu, J. Zhang, and K. Huang.
Beyond Accuracy: Tracking more like Human through Visual Search.
Conference on Neural Information Processing Systems (NeurIPS), 2024.
X. Li, X. Feng, S. Hu, M. Wu, D. Zhang, J. Zhang, and K. Huang.
DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM.
CVPR Workshop on Vision Datasets Understanding (CVPRW, Oral, Best Paper Honorable Mention Award), 2024.

2023

X. Zhao, S. Hu, Y. Wang, J. Zhang, Y. Hu, R. Liu, H. Ling, Y. Li, R. Li, K. Liu, and J. Li.
BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision.
International Journal of Computer Vision (IJCV), 2023.
S. Hu, X. Zhao, and K. Huang.
Visual Intelligence Evaluation Techniques for Single Object Tracking: A Survey (单目标跟踪中的视觉智能评估技术综述).
Journal of Images and Graphics (《中国图象图形学报》, CCF-B Chinese Journal), 2023.
S. Hu, D. Zhang, M. Wu, X. Feng, X. Li, X. Zhao, and K. Huang.
A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship.
Conference on Neural Information Processing Systems (NeurIPS), 2023.
S. Hu, X. Zhao, and K. Huang.
SOTVerse: A User-defined Task Space of Single Object Tracking.
International Journal of Computer Vision (IJCV), 2023.

2022

S. Hu, X. Zhao, L. Huang, and K. Huang.
Global Instance Tracking: Locating Target more like Humans.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.

2019

L. Huang, X. Zhao, and K. Huang.
GOT-10k: A Large High-diversity Benchmark for Generic Object Tracking in the Wild.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.

Preprints / Under Review

X. Feng, S. Hu, X. Li, D. Zhang, M. Wu, J. Zhang, X. Chen, and K. Huang.
Robust Vision-Language Tracking through Multimodal Target-Context Cues Aligned with Target States.
Submitted to a CCF-A conference, 2024.
X. Li, S. Hu, X. Feng, D. Zhang, M. Wu, J. Zhang, and K. Huang.
DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM.
Submitted to a CAAI-A conference, 2024.
S. Hu*, X. Li*, X. Li, J. Zhang, Y. Wang, X. Zhao, and K. Cheong (*Equal Contributions).
Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison.
Submitted to a CAAI-A conference, 2024.
X. Feng, D. Zhang, S. Hu, X. Li, M. Wu, J. Zhang, X. Chen, and K. Huang.
Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues.
Submitted to a CCF-B conference, 2024.
X. Li, S. Hu, X. Feng, D. Zhang, M. Wu, J. Zhang, and K. Huang.
Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark.
Submitted to a CCF-A conference workshop, 2024.