Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Scene Understanding

# Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Scene Understanding

**Source**: https://dynamic.robots.ox.ac.uk/projects/lexis/
**Parent**: https://dynamic.robots.ox.ac.uk/projects/

[Christina Kassab](https://ckassab.github.io/), [Matias Mattamala](https://mmattamala.github.io/), [Lintong Zhang](https://ori.ox.ac.uk/people/lintong-zhang/), [Maurice Fallon](https://ori.ox.ac.uk/people/maurice-fallon/)

IEEE International Conference on Robotics and Automation (ICRA) 2024

**Abstract**
Versatile and adaptive semantic understanding would enable autonomous systems to comprehend and interact with their surroundings. Existing fixed-class models limit the adaptability of indoor mobile and assistive autonomous systems. In this work, we introduce LEXIS, a real-time indoor Simultaneous Localization and Mapping (SLAM) system that harnesses the open-vocabulary nature of Large Language Models (LLMs) to create a unified approach to scene understanding and place recognition. The approach first builds a topological SLAM graph of the environment (using visual-inertial odometry) and embeds Contrastive Language-Image Pretraining (CLIP) features in the graph nodes. We use this representation for flexible room classification and segmentation, serving as a basis for room-centric place recognition. This allows loop closure searches to be directed towards semantically relevant places. Our proposed system is evaluated using both public, simulated data and real-world data, covering office and home environments. It successfully categorizes rooms with varying layouts and dimensions and outperforms the state-of-the-art (SOTA). For place recognition and trajectory estimation tasks, we achieve performance equivalent to the SOTA, while using the same pre-trained model for all tasks. Lastly, we demonstrate the system’s potential for planning.
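The room classification and room-centric place recognition described in the abstract can be illustrated with a minimal sketch. It is not the LEXIS implementation. It assumes each graph node stores one embedding and each room name has one text embedding, and it uses small hand-made vectors as stand-ins for real CLIP features. The function names `classify_rooms` and `loop_closure_candidates`, the room labels, and the `top_k` parameter are all hypothetical.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def classify_rooms(node_embeddings, text_embeddings, labels):
    """Give each node the label whose text embedding is most similar to it."""
    sims = normalize(node_embeddings) @ normalize(text_embeddings).T
    return [labels[i] for i in sims.argmax(axis=1)]

def loop_closure_candidates(query_idx, node_embeddings, node_labels, top_k=2):
    """Rank earlier nodes that share the query's room label by similarity."""
    emb = normalize(node_embeddings)
    same_room = [i for i in range(query_idx)
                 if node_labels[i] == node_labels[query_idx]]
    same_room.sort(key=lambda i: -float(emb[i] @ emb[query_idx]))
    return same_room[:top_k]

# Toy 3-D vectors standing in for CLIP embeddings (assumption: real
# embeddings come from a pre-trained model and are much higher-dimensional).
labels = ["office", "kitchen", "corridor"]
text_embeddings = np.eye(3)
node_embeddings = np.array([
    [0.90, 0.10, 0.00],  # node 0
    [0.10, 0.80, 0.10],  # node 1
    [0.00, 0.20, 0.90],  # node 2
    [0.80, 0.00, 0.20],  # node 3
    [0.95, 0.05, 0.00],  # node 4 (query)
])

node_labels = classify_rooms(node_embeddings, text_embeddings, labels)
candidates = loop_closure_candidates(4, node_embeddings, node_labels)
print(node_labels)
print(candidates)
```

In this toy graph the query node is labelled as an office, so only the two earlier office nodes are considered as loop closure candidates. The kitchen and corridor nodes are never compared against it, which is the sense in which the search is directed towards semantically relevant places.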

**Citation**

```
@inproceedings{kassab2024lexis,
  author = {Kassab, Christina and Mattamala, Matias and Zhang, Lintong and Fallon, Maurice},
  title = {Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding},
  booktitle = {IEEE Int. Conf. Robot. Autom. (ICRA)},
  year = {2024},
}
```

**Acknowledgement**
We thank Nathan Hughes for providing the ground truth for the uHumans2 dataset.