A successful autonomous system needs to not only understand the visual world but also communicate its understanding to humans. To make this possible, language can serve as a natural link between high-level semantic concepts and low-level visual perception. In this talk, I'll present our recent work in the interdisciplinary domain of vision and language. I'll show how we can exploit the alignment between movies and books in order to build more descriptive captioning systems. I'll also discuss our efforts towards automatic understanding of stories from long and complex videos.

Sanja Fidler is an Assistant Professor at the Department of Computer Science, University of Toronto. Previously she was a Research Assistant Professor at TTI-Chicago, a philanthropically endowed academic institute located on the campus of the University of Chicago. She completed her PhD in computer science at the University of Ljubljana in 2010, and was a postdoctoral fellow at the University of Toronto during 2011-2012. She has served on the program committees of numerous international conferences, and has received three outstanding reviewer awards. Together with Rich Zemel and Raquel Urtasun, she received the NVIDIA Pioneer of AI award. Her main research interests are object detection, 3D scene understanding, and the intersection of language and vision.