
A team of artificial intelligence researchers from KAIST AI in South Korea has created a method they call the Chain-of-Zoom framework. The technique lets users generate highly detailed, heavily magnified images, known as super-resolution images, using existing models without needing to retrain them.
The researchers, Bryan Sangwoo Kim, Jeongsol Kim, and Jong Chul Ye, shared their findings in a paper posted to the arXiv preprint server. They explored how to enhance an image by zooming in on it step by step, improving the resolution incrementally at each stage with the help of existing super-resolution models.
The team begins by noting that traditional methods for enhancing photo resolution typically rely on interpolation or regression techniques, which often produce blurry images. To tackle this, they introduced a different approach: a sequential zooming process in which each step builds on the last.
This new methodology is known as Chain-of-Zoom (CoZ), reflecting the series of steps involved in enhancing image quality.
At each step, the framework uses a pre-existing super-resolution model to enhance the image. In tandem, a vision-language model (VLM) generates descriptive prompts that guide the super-resolution model in refining the image. Together, the two models produce a highly detailed, zoomed-in version of the original image.

The framework repeats this cycle, using prompts from the VLM to enhance the zoomed image further until it reaches a final version. To make the VLM's prompts more effective, the team applied reinforcement learning. In tests, images produced by the framework were of higher quality than those generated using traditional methods.
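The cycle described above can be illustrated with a minimal Python sketch. This is not the researchers' code: the function names, the center-crop zoom, and the two placeholder models are assumptions made for illustration only. The placeholder super-resolution step simply repeats pixels, where a real pretrained model would synthesize new detail guided by the prompt, and the placeholder VLM returns a fixed-form caption.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def center_crop(img: Image, factor: int) -> Image:
    """Keep the central 1/factor of the image along each axis (assumed zoom region)."""
    h, w = len(img), len(img[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def placeholder_sr(img: Image, factor: int, prompt: str) -> Image:
    """Hypothetical stand-in for a pretrained super-resolution model.

    It only repeats pixels (nearest-neighbour upscaling) and ignores the
    prompt; a real model would generate detail conditioned on it.
    """
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def placeholder_vlm(img: Image) -> str:
    """Hypothetical stand-in for the VLM that writes a descriptive prompt."""
    return f"close-up of a {len(img[0])}x{len(img)} region"


def chain_of_zoom(
    img: Image,
    steps: int,
    factor: int,
    sr: Callable[[Image, int, str], Image] = placeholder_sr,
    vlm: Callable[[Image], str] = placeholder_vlm,
) -> Tuple[Image, List[str]]:
    """Zoom in `steps` times; each step builds on the previous output."""
    current, prompts = img, []
    for _ in range(steps):
        crop = center_crop(current, factor)   # pick the region to zoom into
        prompt = vlm(crop)                    # VLM describes the region
        current = sr(crop, factor, prompt)    # SR model enlarges it, guided by the prompt
        prompts.append(prompt)
    return current, prompts


# Example: an 8x8 test image zoomed twice at 2x per step (4x overall).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
final, prompts = chain_of_zoom(image, steps=2, factor=2)
```

Because each step magnifies the output of the previous one, the overall magnification in this sketch is the per-step factor raised to the number of steps, so two 2x steps give 4x.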
The researchers emphasize that their technique does not require retraining models to improve image quality, which makes it more flexible. However, they also caution users to be mindful of how they use the framework. The added detail in the zoomed images is not real; it is generated by the algorithm.
For instance, if someone tried to zoom in on the letters or numbers of a license plate from a getaway car, the resultant image might show clear characters, but those might not correspond to the actual license plate on the vehicle.
More information:
Bryan Sangwoo Kim et al, Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment, arXiv (2025). DOI: 10.48550/arxiv.2505.18600
Project page: bryanswkim.github.io/chain-of-zoom/