Augmented Reality Books 101

Several people have asked me how they can learn about Augmented Reality and Augmented Reality books. Below is a summary of what is involved, an explanation of how computer vision works for the layperson, a commentary on the types of books that will be augmented, and a list of related resources.

Augmented reality superimposes computer-generated imagery onto camera-captured video in such a way that the virtual objects appear to have an absolute location in the real world, or a location relative to a real-world object like a page in a book. In the case of a book, the augmented reality app uses the mobile device’s camera and displays a video overlay superimposed on the corresponding illustration in the book.

Adding augmented reality to a book involves the following components:

  • A book with illustrations
  • Marker images corresponding to the “active” illustrations in the book that are augmented
  • Video or 3D model overlays that are superimposed over the footage coming from the mobile device’s camera.
  • A Content Management System that matches marker images with video overlays, and is used to place the overlays with respect to the markers.
  • An augmented-reality-enabled app that is loaded with the marker images and overlays. It also contains any logic or rules-based structures that define potential interactions with the overlays.
  • A mobile device with a camera through which the augmented reality app can search footage for marker images. Ideally, the device has a light that can be used as a flashlight to illuminate the marker images in dim light.
  • Image recognition middleware which interprets the footage from the mobile device camera, determines when marker images are being triggered and how to integrate virtual objects into the display stream.
  • A display that streams the merged footage of the camera and the overlays back to the user. A cell phone appears like a “porthole” or “magic mirror” into the augmented world of the book, while a wearable display can provide total immersion.
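
The Content Management System’s marker-to-overlay mapping can be pictured as data. Here is a minimal sketch in Python; the record shape and field names are hypothetical, but a real CMS would export something equivalent for the app to load:

```python
# A hypothetical sketch of one CMS record linking a marker image to its
# overlay and placement; the field names are illustrative only.
from dataclasses import dataclass

@dataclass
class MarkerEntry:
    marker_image: str    # reference illustration the recognizer is trained on
    overlay_asset: str   # video or 3D model to superimpose
    offset_px: tuple     # overlay position relative to the marker
    scale: float         # overlay size relative to the marker
    rotation_deg: float  # overlay rotation relative to the marker

# The CMS would export a catalogue of such entries for the app to bundle.
catalogue = [
    MarkerEntry("page_12_dragon.png", "dragon_flight.mp4", (0, -40), 1.2, 0.0),
]
```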

To operate the augmented reality book, the user starts the app and points the camera at illustrations in the book. The app attempts to match the images shown in the camera view against its database of markers. When a match is found, the app retrieves from the database the overlay corresponding to the matched marker, along with the overlay’s placement with respect to that marker. The app then sends the overlay to the display, where it is positioned as specified over the marked illustration and streamed to the user along with the footage from the camera.
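
The match-then-overlay flow above can be sketched in a few lines of Python. The database shape, marker IDs, and the `find_marker`/`render` hooks are hypothetical stand-ins for the image-recognition middleware and the display pipeline:

```python
# A minimal sketch of the per-frame flow: detect a marker, look up its
# overlay and placement, and hand everything to the display.
# MARKER_DB mirrors what a CMS might export; entries are illustrative.

MARKER_DB = {
    "p12_dragon": {"overlay": "dragon_flight.mp4",
                   "placement": {"offset_px": (0, -40), "scale": 1.2}},
    "p34_castle": {"overlay": "castle_tour.mp4",
                   "placement": {"offset_px": (0, 0), "scale": 1.0}},
}

def handle_frame(frame, db, find_marker, render):
    """Process one camera frame: find a marker, look up its overlay and
    placement in the database, and pass both frame and overlay on."""
    marker_id = find_marker(frame)        # middleware hook; may return None
    if marker_id is None or marker_id not in db:
        return render(frame, None, None)  # plain camera passthrough
    entry = db[marker_id]
    return render(frame, entry["overlay"], entry["placement"])
```

With stub hooks, `handle_frame(frame, MARKER_DB, lambda f: "p12_dragon", lambda f, o, p: o)` hands `"dragon_flight.mp4"` to the display stage; when no marker is found, the camera footage passes through unaugmented.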

How Image Recognition Works

The most complex of the components is image recognition or computer vision. The augmented reality book uses instance-level recognition to identify specific visual markers in the streamed real world footage from the camera. (Category-level recognition has the much more difficult task of classifying image content into known categories such as pedestrians, faces, dogs, plants, etc.) In this type of image recognition, the system stores image descriptors in a search indexing data structure and operates in three stages:

  1. Key point detection: The bottleneck in instance-level recognition systems. Mathematical operations detect corners and blobs, regions where discontinuities occur and which can therefore carry much, if not all, of the image information. Corners are regions in an image that differ sharply from their immediate neighboring regions; blobs are regions darker or brighter than their surroundings. In instance-level recognition, only these regions are processed further. Extracting a dominant orientation direction makes the recognition process rotation-invariant, and the scale-space approach makes it scale-invariant. Using greyscale markers and applying a greyscale filter to the camera footage before key point detection makes the process hue-invariant.
  2. Descriptor extraction: Descriptors are small patches around each key point, extracted in a way that introduces tolerance to distortions and illumination changes in the scene. The number of such points/descriptors is typically between 100 and 2,000 per image, depending on its complexity, so it is important to match descriptors quickly between the database and the observed scene.
  3. Matching and verification: Once matches are found, those that are not strictly consistent with a particular model of the scene/object are filtered out, based on the probability of object presence. This stage determines whether the marker is actually present in the illustration or not.
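
The three stages can be sketched as a toy, pure-Python pipeline on tiny greyscale “images” (lists of pixel rows). Real systems use optimized detectors and descriptors (for example FAST corners or SIFT/SURF); this only illustrates the flow:

```python
# Toy instance-level recognition pipeline on tiny greyscale images.

def detect_keypoints(img, threshold=40):
    """Stage 1: keep pixels that differ sharply from every neighbor."""
    points = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            diffs = [abs(img[y][x] - img[y + dy][x + dx])
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0)]
            if min(diffs) > threshold:
                points.append((x, y))
    return points

def descriptor(img, x, y):
    """Stage 2: the flattened 3x3 patch around a key point."""
    return [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def match(marker_descs, scene_descs, max_dist=30):
    """Stage 3: nearest-neighbor matching by L1 distance; pairs farther
    apart than max_dist are filtered out. A real system would also
    verify geometric consistency across all matches."""
    pairs = []
    for i, d1 in enumerate(marker_descs):
        best_j, best = None, max_dist + 1
        for j, d2 in enumerate(scene_descs):
            dist = sum(abs(a - b) for a, b in zip(d1, d2))
            if dist < best:
                best_j, best = j, dist
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs
```

For example, a 5x5 marker image with one bright pixel yields a single key point, and a scene containing a slightly brighter copy of that pixel still matches, illustrating the tolerance the descriptors provide.
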

Many Types of Books Will Be Augmented

Any book can be augmented, and all can benefit from augmentation. Books with illustrations can have both image and text markers: image recognition is used for image markers, and optical character recognition (OCR) is used for text markers. I anticipate that many types of books will be augmented with image markers and video or 3D model overlays, including textbooks, non-fiction books, recipe books, coffee table books, children’s books, graphic novels, comic books, and magazines. Businesses will augment their catalogs as well. Books without illustrations have only text markers. I anticipate that these books will let readers make virtual highlights, notes, and comments, and have conversations. They may also have audio and/or interactive visual overlays.
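
For text markers, matching is a string-lookup problem rather than an image-recognition one. A toy sketch, assuming an OCR step has already produced a string from the camera frame (the marker phrase and overlay name are hypothetical):

```python
# A toy sketch of text-marker matching on OCR output. Normalizing both
# sides makes the match tolerant of case and spacing differences.

TEXT_MARKERS = {
    "once upon a midnight dreary": "raven_audio.mp3",  # illustrative entry
}

def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def match_text_marker(ocr_output, markers):
    """Return the overlay for the first marker phrase found in the OCR
    output, or None if no marker phrase appears."""
    haystack = normalize(ocr_output)
    for phrase, overlay in markers.items():
        if normalize(phrase) in haystack:
            return overlay
    return None
```
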
The early adopters are likely to be children’s books, recipe books, non-fiction, and catalogs.

Other Resources on Augmented Reality and Augmented Reality Books
Here are some fun and informative pages.