background

The following pages describe a prototype stereo display system that attempts to provide near-correct focus cues without tracking eye position. We present data and images from Akeley, K., Watt, S. J., Girshick, A. R., & Banks, M. S. (2004). A stereo display prototype with multiple focal distances. ACM Transactions on Graphics, 23, 804-813.

Vergence eye movements and accommodation (the focusing of the eyes) normally work together. When this coupling is broken, as it is by typical stereo graphics displays, several consequences follow: discomfort, induced binocular stress, difficulty fusing the two images into a stereo pair, and errors in the perceived scene geometry.

This prototype attempts to alleviate these problems by providing multiple focal distances. Because light is additive along a visual line, transparency and reflection can be rendered accurately and depicted with near-correct focus cues. Multiple focal distances are needed both along a visual line and in different visual directions. The left panel in the figure below illustrates an example of the former: because the cube's surface is flat, the correct focal distance of the reflection of the cylinder is the sum of the distances from the eye to the cube and from the cube to the cylinder. The right panel illustrates how the scene is rendered into a fixed-viewpoint volumetric display: the reflection is drawn deeper into the display, at the focal distance equal to the sum of the eye-to-cube and cube-to-cylinder distances.
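For a concrete sense of the reflection case, the short sketch below (illustrative Python, not the prototype's software; the example distances are made up) computes the focal distance at which a reflected point should be depicted and converts it to diopters.

```python
# Illustrative sketch (not the prototype's software): the focal distance of a
# reflection in a flat mirror is the total path length along the visual line,
# i.e. eye-to-mirror plus mirror-to-object.

def reflection_focal_distance_m(eye_to_mirror_m, mirror_to_object_m):
    """Focal distance (meters) of a point seen reflected in a planar mirror."""
    return eye_to_mirror_m + mirror_to_object_m

def to_diopters(distance_m):
    """Convert a focal distance in meters to diopters (1 / meters)."""
    return 1.0 / distance_m

# Example distances are made up for illustration, not taken from the paper.
eye_to_cube_m = 0.40       # eye -> cube surface
cube_to_cylinder_m = 0.25  # cube surface -> cylinder (via the reflection)

print(to_diopters(eye_to_cube_m))                                     # cube surface: 2.5 D
print(to_diopters(reflection_focal_distance_m(eye_to_cube_m,
                                              cube_to_cylinder_m)))   # reflection: ~1.54 D
```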

Multiple focal distances along a visual line.


Scene Geometry: the reflection of the cylinder has a longer focal distance than the surface of the cube.
Volumetric Illumination: how the scene is rendered into a volumetric display with high depth resolution.

apparatus

The prototype design emphasizes depth resolution by providing several additive image planes at substantially different focal distances. As you can see from the images below, an LCD flat panel is viewed through plate beamsplitters such that each eye sees three superimposed images. No optical elements other than beamsplitters and mirrors are used, so the focal distances of the three image planes are equal to their measured distances.
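Since the only optics are beamsplitters and mirrors, the dioptric position of each image plane follows directly from its measured (mirror-folded) path length. A minimal sketch with made-up distances, not the prototype's actual measurements:

```python
# Illustrative only: with no lenses in the path, each image plane's focal
# distance equals its measured optical path length from the eye.
# The distances below are placeholders, not the prototype's actual values.

measured_path_m = [0.30, 0.45, 0.90]            # near, mid, far path lengths (meters)

planes_d = [1.0 / d for d in measured_path_m]   # focal distances in diopters
spacing_d = [near - far for near, far in zip(planes_d, planes_d[1:])]

print([round(p, 2) for p in planes_d])    # e.g. [3.33, 2.22, 1.11]
print([round(s, 2) for s in spacing_d])   # dioptric separation between adjacent planes
```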

Left: Four views of the prototype display. The T221 monitor is removed in the bottom two images to expose the beamsplitters and front-surface mirrors.
Right: Each eye views three superimposed viewports. Periscope optics separate the visual axes so that the left and right viewports do not overlap. (The side view is rotated 90 deg counterclockwise).

Full-screen image of the T221 monitor. The scene was rendered using RenderMan, producing separate 4,500 × 1,500 left-eye and right-eye views with both color and depth information. These image files were read by the prototype software, then remapped and depth-filtered to generate the six viewport images. The software can read and display short sequences of such ray-traced image files, providing a movie-loop capability for viewing highly detailed scenes. The viewports are outlined in white for clarity; these outlines are suppressed in actual use. (Plant models courtesy of Xfrog Public Plants.)
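The depth-filtering step can be pictured as splitting each rendered pixel between the two image planes that bracket its depth. The sketch below is an illustrative reconstruction, not the prototype's code: the plane positions (in diopters) are assumed values, and the weights are assumed to vary linearly with dioptric distance between adjacent planes.

```python
import numpy as np

# Assumed image-plane positions, near to far, in diopters (illustrative values).
PLANES_D = np.array([3.2, 2.1, 1.0])

def depth_filter(color, depth_m):
    """Split each pixel between the two image planes bracketing its depth.

    color:   (H, W, 3) float image
    depth_m: (H, W) metric depth per pixel
    returns: list of (H, W, 3) images, one per image plane
    """
    depth_d = 1.0 / np.clip(depth_m, 1e-6, None)      # metric depth -> diopters
    # Fractional plane index: 0 at the near plane, len-1 at the far plane.
    # np.interp needs ascending x, so feed the plane positions reversed.
    idx = np.interp(depth_d, PLANES_D[::-1], np.arange(len(PLANES_D))[::-1])
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, len(PLANES_D) - 1)
    w_hi = (idx - lo)[..., None]                      # linear weight toward the farther plane
    w_lo = 1.0 - w_hi
    outputs = [np.zeros_like(color) for _ in PLANES_D]
    for i in range(len(PLANES_D)):
        outputs[i] += color * (w_lo * (lo == i)[..., None] + w_hi * (hi == i)[..., None])
    return outputs
```

Because the two weights always sum to one, adding the per-plane images back together recovers the original colors, which is what makes the additive superposition of the image planes work in this sketch.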

results

To evaluate our prototype's effectiveness, we wanted to know whether user performance is improved. More specifically, when fixation distance and focal distance are matched, is less time required to fuse (perceive the depth of) a stereo scene? We expected fusion to be faster when fixation and focal distances were nearly matched.

Subjects completed a series of trials in which a target was briefly displayed, bringing their fixation and accommodation to the center of the near image plane. The object to be fused was then shown for a specific number of frames. The experiment used a two-alternative forced-choice procedure: the object to be fused was a pseudo-random pattern of dots rendered on two frontoparallel, closely spaced planes, with red dots on one plane and green dots on the other. The subject's task was to indicate whether the plane of red dots was nearer or farther than the plane of green dots. The task is easy once the dots have been fused and impossible otherwise. Subjects completed trials for 12 separate staircases, each with a different combination of dot fixation and focal distance. The first trial of each staircase displayed the dots for 40 frames. Incorrect responses increased the time allowed to fuse the stimulus, while correct responses decreased it. After 12 reversals, the durations at the last 4 reversals were averaged to estimate the stimulus duration needed to achieve 71% correct.
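The adaptive procedure can be summarized in a few lines of Python. The 71%-correct convergence point is characteristic of a 1-up/2-down rule (lengthen the duration after any incorrect response, shorten it only after two consecutive correct responses); that rule, the step size, and the simulated observer below are assumptions for illustration, not values taken from the experiment.

```python
import random

def run_staircase(respond, start_frames=40, step=2, max_reversals=12):
    """respond(frames) -> True if the depth order was reported correctly."""
    frames = start_frames
    consecutive_correct = 0
    last_direction = 0                    # +1 = lengthening, -1 = shortening
    reversal_durations = []
    while len(reversal_durations) < max_reversals:
        if respond(frames):
            consecutive_correct += 1
            direction = -1 if consecutive_correct >= 2 else 0
        else:
            consecutive_correct = 0
            direction = +1
        if direction != 0:
            if last_direction != 0 and direction != last_direction:
                reversal_durations.append(frames)       # record a reversal
            last_direction = direction
            frames = max(1, frames + direction * step)
            if direction == -1:
                consecutive_correct = 0                 # reset after a decrease
    # Threshold estimate: mean duration over the last 4 reversals.
    return sum(reversal_durations[-4:]) / 4.0

# Simulated observer for illustration only: more reliable at longer durations.
print(run_staircase(lambda f: random.random() < min(0.95, 0.3 + 0.02 * f)))
```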

Results collapsed across all subjects are shown below (all subjects show essentially the same pattern). In the cues-consistent case, focal distance was equal to fixation distance (except for the mid-far condition, in which the image energy was divided equally between the mid and far image planes rather than presented on a mid-far image plane). In the cues-inconsistent case, focal distance was at the near image plane, while fixation distance was at the mid, the far, or the dioptric midpoint between the mid and far image planes.

So, our expectation turned out to be correct: User performance is improved when fixation distance and focal distance are matched. In fact, at the far fixation distance, the cues-inconsistent case required on average 70% more time to fuse.

This image shows the average viewing time required for subjects to fuse an object displayed at the mid, far, or mid-far (the dioptric midpoint between the mid and far image planes) fixation distance, with focal distance either held at the near image plane (black bars) or consistent with the fixation distance (gray bars).