
A magazine of scholarship and creative activity at Arizona State University
Engineering and Technology: Computer Science
Publication Date: Summer 2003
People use words that represent shared concepts to describe what they see. They do this every time they describe an object, a face, or a landscape to a friend.
When describing a landscape, a person might use words such as hilly, rugged, verdant, cloudy, snowy, green, alpine, flower-covered, or grassy. Each word evokes visual concepts that create an image in the listener's mind.
Computers are different. When a computer processes that same landscape image, the description it produces will be something like 100001110001010, or 0001010100110.
John Black's goal is to bridge this gap between humans and computers. He designs computers with built-in concepts that can be evoked by pictures. Black wants to help computers describe pictures in human terms.
Black's work is a key component of research being conducted as part of the iCare project at ASU's Center for Ubiquitous Computing (CUbiC). Professor Sethuraman Panchanathan directs the project. He says the goal is to develop new technologies and devices to aid the visually impaired.
"Today's computers process data, but we are trying to design computers that process concepts," says Black, a doctoral student with the project. For example, when a modern computer describes a person's face, it does so in terms of the distance between the eyes, or the distance between the base of the nose and the chin. In contrast, humans describe faces in terms of concepts such as freckles, or big ears, or bushy eyebrows.
Black says there are three basic levels of image perception:
Low-level: Perceiving colors or regional boundaries
Mid-level: Perceiving surface patterns and textures
High-level: Perceiving complete three-dimensional objects
"We are trying to endow computers with the ability to perceive images at all of these different levels. That is what humans do," Black says.
Humans rely on their lifetime of experience to interpret what they see. For example, when a person looks at an object (a horse), he or she sees a horse. A computer could get confused because of unusual lighting, or because it sees the horse from an unusual angle.
Black wants to provide computers with visual concepts that are evoked regardless of the variables. He wants to help computers to produce more robust descriptions of what they see.
One of Black's tools is a photo wall of 94 diverse landscape scenes. He wants to identify how people group and describe these images.
Each test participant is given an image. Black then asks the person to select the most similar images from the wall display. To do so, the participant must decide which features in the images are the most important for judging similarity.
For example, if the original image includes snow, the participant might look for other images with snow. Participants describe each image by using a checklist of descriptive words. They are asked to mark words on the list that are most useful for describing the picture.
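As a rough illustration of how a checklist could support similarity judgments, the sketch below treats each image as the set of words marked for it and compares images by word overlap (Jaccard similarity). This is an assumption made for illustration, not Black's method; the scene names, the word assignments, and the functions `jaccard` and `most_similar` are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of checklist words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty checklists share nothing to compare
    return len(a & b) / len(a | b)


def most_similar(query_words, wall, k=3):
    """Return the k image names in `wall` whose word sets best match the query."""
    ranked = sorted(
        wall,
        key=lambda name: jaccard(query_words, wall[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical checklist results for three images on the wall.
wall = {
    "scene_01": {"snowy", "alpine", "rugged"},
    "scene_02": {"grassy", "green", "hilly"},
    "scene_03": {"snowy", "hilly"},
}

# Hypothetical words marked for the image handed to the participant.
query = {"snowy", "alpine", "cloudy"}

print(most_similar(query, wall, k=2))  # ['scene_01', 'scene_03']
```

In this toy data, the snowy query image ranks the two snowy scenes ahead of the grassy one, which mirrors the snow example above.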
The list could turn out to be a key resource in the process. Black began with a list of 173 descriptive words, basic concepts taken from a lexicon of the English language. By asking participants to select the most useful words, he was able to shorten the original list to 98 visual concepts. After further study, Black found that about 50 of the words represent visual concepts that are particularly salient. Now he faces a new challenge: developing software that allows a computer to examine an image and essentially fill out the same check sheet.
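One simple way to picture how a word list might be narrowed is to count how often participants mark each word and keep only the words marked often enough. The sketch below is a hypothetical illustration of that idea, not a description of Black's actual procedure; the sample responses, the 50 percent cutoff, and the names `count_marks` and `shortlist` are assumptions.

```python
from collections import Counter


def count_marks(responses):
    """Count how many checklists marked each word (one vote per checklist)."""
    counts = Counter()
    for marked in responses:
        counts.update(set(marked))
    return counts


def shortlist(responses, min_fraction=0.5):
    """Keep words marked on at least `min_fraction` of the checklists."""
    counts = count_marks(responses)
    threshold = min_fraction * len(responses)
    return sorted(word for word, n in counts.items() if n >= threshold)


# Hypothetical checklists filled out by four participants.
responses = [
    {"snowy", "alpine", "rugged"},
    {"snowy", "cloudy"},
    {"snowy", "alpine", "green"},
    {"grassy", "green"},
]

print(shortlist(responses))  # ['alpine', 'green', 'snowy']
```

Raising `min_fraction` shrinks the list further, which loosely parallels moving from a long candidate list to a smaller set of salient words.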
Black says that humans most often use words to describe images in terms of colors and textures. He hopes that higher-level content can be deduced from characteristic combinations of these low- and mid-level content words. A high-level description (such as naming objects) is still in the future. Scientists have used the same word-based method to describe human faces, with encouraging results.
"It's similar to teaching concepts to a child," Black explains. "You start with the simple concepts, and gradually work up to more complex concepts. It takes time and patience."
Prem Kuchi and Kanav Kahol are also graduate student researchers with the iCare project. They look at different but related areas. Kuchi and Kahol want to know how people perceive and describe the movements of other people when they are walking, dancing, or gesturing.
Gary Campbell