TimeWarp -- A User-Adaptive Video Summarization System
We propose TimeWarp (Fig. 1), a user-adaptive video summarization system that takes individual preferences into account by analyzing each user's personal photo library. It is now common practice, especially among younger people, to store thousands of digital camera photos on a PC. These personal photo libraries contain rich information about their owners' tastes, personalities, and lifestyles. Because video and still photos share many characteristics, we believe it is possible to infer each user's preferences for video summarization from these libraries.

Fig. 2 illustrates the procedure by which TimeWarp summarizes video. The user's photo library is first sent through the photo library categorization process, where photos in the library are grouped into several major categories by their contents. In the video segmentation process, the input movie (the movie to be summarized) is divided into shots, which are further divided into short segments. Next, in the image classification process, key frames extracted from the segments are analyzed to judge whether their contents fall into any of the categories found in the user's library. "Important" sections of the movie are determined from these judgments. Finally, in the summarization process, the "unimportant" sections are deleted, and the remaining sections are reconstructed into a coherent summary of the original movie.

In the photo library categorization phase, the user categorizes his/her photos by sliding them around inside a window (Fig. 3), so that photos with similar contents are located close to each other. Since a personal photo library can contain thousands or even tens of thousands of photos, it is unreasonable to require the user to categorize every single photo in the library. Instead, the system randomly picks photos from the user's library and asks the user to categorize only this subset, which consists of a few hundred photos at most.
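
The random selection step can be illustrated with a minimal sketch. The function name `pick_photos_to_categorize`, the default limit of 300, and the optional seed are assumptions made for illustration; the source only states that at most a few hundred photos are picked at random.

```python
import random
from typing import List, Optional, Sequence


def pick_photos_to_categorize(
    photo_paths: Sequence[str],
    max_photos: int = 300,        # assumed value for "a few hundred at most"
    seed: Optional[int] = None,   # hypothetical parameter, only for repeatable runs
) -> List[str]:
    """Return a random subset of the library, without duplicates."""
    rng = random.Random(seed)
    # Never ask for more photos than the library actually contains.
    count = min(max_photos, len(photo_paths))
    return rng.sample(list(photo_paths), count)
```

Sampling without replacement guarantees that the user is never asked to place the same photo twice.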

In the video segmentation phase, the original movie is divided into shots by looking for sudden changes in the color histograms of consecutive frames. Each shot is further divided into segments by splitting it at a fixed time interval (one second in the current implementation). The first frame of each segment is picked as the key frame of that segment (Fig. 4).
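
The two-level segmentation can be sketched as follows. This is a simplified illustration, not the actual TimeWarp code: it assumes each frame is already reduced to a normalized color histogram, uses the L1 distance between consecutive histograms as the measure of "sudden change", and the threshold of 0.5 is an arbitrary placeholder. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_starts(
    histograms: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed value; would need tuning on real video
) -> List[int]:
    """Return the frame indices at which a new shot begins."""
    starts = [0]
    for i in range(1, len(histograms)):
        # A large jump between consecutive frames is treated as a cut.
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            starts.append(i)
    return starts


def split_into_segments(
    shot_starts: Sequence[int],
    n_frames: int,
    fps: float,
    interval_s: float = 1.0,  # one second, as stated in the text
) -> List[Tuple[int, int, int]]:
    """Return (shot_index, first_frame, end_frame) triples.

    first_frame doubles as the key frame of the segment; end_frame is
    exclusive. Segments never cross a shot boundary.
    """
    step = max(1, round(fps * interval_s))
    bounds = list(shot_starts) + [n_frames]
    segments = []
    for shot_index, (start, end) in enumerate(zip(bounds, bounds[1:])):
        for seg_start in range(start, end, step):
            segments.append((shot_index, seg_start, min(seg_start + step, end)))
    return segments
```

Because segments are cut per shot, the last segment of a shot may be shorter than the fixed interval.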

In the image classification phase, image classification techniques are used to judge whether the key frame of each segment belongs to any of the categories defined in the user's photo library. Once a key frame has been classified, an importance value is assigned to its segment based on the classification result. If the segment is judged to belong to a prominent category (i.e., one consisting of a large number of photos), it is given a high importance value. This process is based on our assumption that people tend to take more pictures of objects or scenes that are important to them, or to which they are visually attracted.
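
One possible way to turn category prominence into an importance value is sketched below. The exact formula is not given in the text, so this sketch assumes importance is the category's photo count divided by the count of the largest category, and that segments matching no category receive zero. The names `category_importance` and `segment_importance` are hypothetical, and the classifier itself is outside the scope of the sketch.

```python
from typing import Dict, List, Optional


def category_importance(category_sizes: Dict[str, int]) -> Dict[str, float]:
    """Map each category to a weight in [0, 1] proportional to its size."""
    largest = max(category_sizes.values(), default=0)
    if largest == 0:
        return {name: 0.0 for name in category_sizes}
    return {name: size / largest for name, size in category_sizes.items()}


def segment_importance(
    key_frame_labels: List[Optional[str]],
    category_sizes: Dict[str, int],
) -> List[float]:
    """Assign an importance value to each segment from its key-frame label.

    A label of None means the classifier matched no category.
    """
    weights = category_importance(category_sizes)
    return [
        weights.get(label, 0.0) if label is not None else 0.0
        for label in key_frame_labels
    ]
```

For example, with 120 photos in a "family" category and 30 in "food", a segment classified as "family" would receive 1.0 and one classified as "food" would receive 0.25.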

In principle, summarization is performed by removing segments judged as "unimportant". However, two issues need to be considered. The first is that simply removing "unimportant" segments can easily fragment the video so much that a person watching the summarized movie would be unable to understand the plot or intention of the original. The second is that in home videos, each shot generally has a certain level of significance, so every shot should be preserved to some degree in the final movie and no shot should be removed completely. TimeWarp incorporates a summarization procedure designed to deal with both of these issues.
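
The text does not describe TimeWarp's actual summarization procedure, so the following is only a sketch of one simple policy that respects the two constraints above. It assumes segments are given in playback order as `(shot_index, first_frame, end_frame)` triples with a parallel list of importance values. The threshold, the `max_gap` bridging rule, and the function name `select_segments` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def select_segments(
    segments: Sequence[Tuple[int, int, int]],
    importance: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off for "important"
    max_gap: int = 1,        # assumed: bridge gaps of at most this many segments
) -> List[int]:
    """Return the indices of the segments kept in the summary."""
    # Group segment indices by the shot they belong to.
    by_shot: Dict[int, List[int]] = {}
    for index, (shot_index, _, _) in enumerate(segments):
        by_shot.setdefault(shot_index, []).append(index)

    keep = set()
    for indices in by_shot.values():
        chosen = [i for i in indices if importance[i] >= threshold]
        if not chosen:
            # Constraint 2: no shot is removed completely, so fall back
            # to the best-scoring segment of the shot.
            chosen = [max(indices, key=lambda i: importance[i])]
        # Constraint 1: avoid fragmenting the video by also keeping short
        # runs of unimportant segments between two kept segments.
        for a, b in zip(chosen, chosen[1:]):
            if b - a - 1 <= max_gap:
                keep.update(range(a, b))
        keep.update(chosen)
    return sorted(keep)
```

Raising `max_gap` trades a longer summary for smoother playback, while the per-shot fallback keeps at least one segment of every shot regardless of the threshold.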

We have implemented TimeWarp as a Mac OS X Cocoa application and conducted a series of evaluation tests to assess its effectiveness. The results support our basic assumption that incorporating the user's preferences on photos can lead to more effective video summarization. However, several issues, such as the low accuracy of image classification, need to be resolved in future work.
Fig. 1. TimeWarp: automatic video summarization
Fig. 2. Video summarization procedure
Fig. 3. Photo library categorization interface
Fig. 4. Shots and segments