Apple has a way to turn all your Photos into spatial environments – in real time

Apple has developed technology that lets you swipe through your entire photo collection and see every image rendered in 3D in real time, and its researchers claim it sets a new benchmark for the task.
Apple’s researchers amaze again
SHARP (Single-image High-Accuracy Real-time Parallax) is a new open-source model that can turn 2D photos into 3D Gaussian representations. In other words, the model can look at a still image, estimate the depth of the scene, and then create a 3D scene based on that image.
“Our approach generates a high-resolution 3D representation that provides such experiences from single-image input in less than a second on a single GPU, supporting conversion of pre-existing photographs to photorealistic 3D during interactive browsing of a photo collection,” the research team said.
That said, you won’t be able to take an image and turn it into a fully explorable environment yet. SHARP won’t yet generate renders that go beyond the border of the original image, but it’s not too hard to imagine weaving together multiple images of the same location to create such environments.
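To make the step from "image plus estimated depth" to "3D scene" a little more concrete, here is a minimal sketch of the underlying geometry. It is not SHARP's method or code: it assumes a simple pinhole camera, and every name in it (`unproject_pixel`, `parallax_shift_px`, `focal_px` and so on) is hypothetical and chosen only for illustration.

```python
# Illustrative geometry only; this is NOT SHARP's implementation.
# Assumes a pinhole camera whose focal length is given in pixels.

def unproject_pixel(u, v, depth_m, focal_px, cx, cy):
    """Lift pixel (u, v) with a known depth in metres to a 3D point (x, y, z).

    (cx, cy) is the principal point, usually near the image centre.
    """
    x = (u - cx) / focal_px * depth_m
    y = (v - cy) / focal_px * depth_m
    return (x, y, depth_m)


def parallax_shift_px(depth_m, focal_px, camera_shift_m):
    """Horizontal shift, in pixels, of a point when the camera moves sideways."""
    return focal_px * camera_shift_m / depth_m


# Hypothetical numbers: 500 px focal length, viewer moves 5 cm to the side.
near = parallax_shift_px(depth_m=1.0, focal_px=500.0, camera_shift_m=0.05)
far = parallax_shift_px(depth_m=10.0, focal_px=500.0, camera_shift_m=0.05)
print(near > far)  # nearby objects shift more than distant ones
```

The second function is the reason depth matters: once each pixel has a position in space, a small head movement shifts near objects more than far ones, which is what makes a flat photo read as a stable 3D scene.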
What does SHARP do?
What it does accomplish seems interesting enough. Apple’s researchers claim:
- SHARP can create photorealistic views from a single image.
- Images can be rendered in real time, even on standard processors.
- Representations are metric, “with absolute scale”.
They also point to significant performance improvements compared with similar experiments, saying SHARP “sets a new state of the art on multiple datasets, reducing LPIPS by 25-34% and DISTS by 21-43% versus the best prior model, while lowering the synthesis time by three orders of magnitude.”
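For a sense of scale, the arithmetic behind those figures can be sketched as follows. LPIPS and DISTS are perceptual distance metrics where lower is better. The baseline score of 0.40 is a made-up number used purely for illustration and is not taken from the paper, and the function names are hypothetical.

```python
# Arithmetic illustration only; the 0.40 baseline is hypothetical.

def reduced_score(baseline, reduction_pct):
    """Score left after cutting a lower-is-better metric by reduction_pct percent."""
    return baseline * (1 - reduction_pct / 100)


def speedup_factor(orders_of_magnitude):
    """Each order of magnitude is a factor of ten."""
    return 10 ** orders_of_magnitude


hypothetical_baseline = 0.40
print(round(reduced_score(hypothetical_baseline, 25), 3))  # 25% reduction
print(round(reduced_score(hypothetical_baseline, 34), 3))  # 34% reduction
print(speedup_factor(3))  # three orders of magnitude
```

Three orders of magnitude is a factor of about a thousand, so a process that once took minutes would finish in a fraction of a second.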
It is worth noting what else is going on here.
While other tools of this nature require hundreds of images of the same scene to build usable 3D renders, SHARP can accomplish this from a single image.
What does this mean?
It means you can get that immersive feeling each time you put on a visionOS device and swipe through your image collection, whether you are reliving the time you visited the Great Pyramid just outside Cairo or refreshing your memory of a location while engaged in architectural design.
The tech is designed to “give you the experience of looking at a stable 3D scene from different perspectives,” but need not “support substantial travel (‘walking around’) within the photograph.”
There are clearly significant real-world uses, as well as fun sit-back experiences, to be found in this tech. That Apple can do this based on a single image is pretty remarkable. Presumably, more images of the same location from different angles would deliver even more accurate results? The paper doesn’t say.
Where we are going
Think back to before Apple even introduced Vision Pro and you may recall speculation that the company was developing tech to enable the creation of photorealistic 3D environments that changed in real time. SHARP feels like it may reflect some of that work, and it bodes well for such an implementation in future. If you think about it, using sequences of images should make for more realistic virtual 3D environments, created and navigated in real time on the kind of processors we already use each day.
This could be a first sight of an enabling technology for all kinds of enterprise, education, and even government use, from road traffic prediction to 3D modelling and testing to military field control. The ability to produce accurate renders of real scenes in virtual space will be a great enabler for ‘digital twin’ solutions. visionOS 2 already converts 2D images into immersive 3D Spatial Photos, so this is the next step.
But what the model tells us about how Apple is thinking, with powerful AI models that produce outstanding results at low power on ordinary processors, should make you look forward both to the next major Vision Pro update and to the world of AI at the edge the company clearly wants to build.