
Final Report

Generating Cinemagraphs

Tushar Turkar

[Cover image © Photodoto.com]

Summary

Cinemagraphs are seamlessly looping media files in which one or more prominent (or selected) regions in the foreground are in motion while the rest of the scene remains static.

The Problem:

The usual process for generating a cinemagraph comprises the following steps:
  • Recording - Capture stabilized footage, e.g., with a tripod
  • Editing Software - Import the raw footage and clip it to show the desired content
  • Masking - Create a mask to mobilize / immobilize the areas of interest
  • Composite - Apply the mask and composite it with a static (invert-masked) image from the clip.
  • Looping - Choose seamless boundaries for the clip so it repeats without a visible seam.
While all of this can also be done with the algorithms proposed in the research papers mentioned below, creating a cinemagraph remains a cumbersome task for the majority of users unfamiliar with such techniques.

Significance:

[Image: Cinemagraphs are the latest evolution of animated media. © Creativemarket.com]

This new medium has been slowly gaining ground over its predecessors. The subtle motion in a cinemagraph gives it an appealing look, making it well liked, trendy, and thus marketable on social media. However, as discussed above, many non-tech-savvy users find creating a cinemagraph too complicated. Providing an easy-to-use method to generate one would enable such users to create aesthetically pleasing pieces of art.

This project aims to automatically synthesize a cinemagraph from a given sequence of frames, without the hassle of dealing with laborious video editing software or implementing complicated algorithms.


Previous Work

This is a well-researched field in Computer Graphics / Vision: several methodologies and algorithms are available to assist with the various stages of creating a cinemagraph - stable video capture, video stabilization, selecting regions of mobility, object tracking, motion analysis, masking, blending, composition, seamless looping, and so on.
The following research papers are relevant to the aforementioned stages of cinemagraph creation:

An Approach to Automatic Creation of Cinemagraphs - Searches for a sub-volume in the video with maximum cumulative flow field.
Selecting Interesting Image Regions to Automatically Create Cinemagraphs - Focuses on determining compositions of masks and layers for dynamic region in a cinemagraph.
Towards Moment Imagery: Automatic Cinemagraphs - Provides an authoring tool for segmentation and interactive motion selection.
Turning an Urban Scene Video into a Cinemagraph - Detects static geometry and dynamic appearance regions using temporal analysis.
Automated Video Looping with Progressive Dynamism - Determines independent looping regions and provides interactive local adjustment over dynamism of the same.


For the scope of this project:

The primary reference used is Selectively De-Animating Video [1]. In this paper, the user draws strokes determining the static and dynamic regions of the video, and the proposed algorithm removes the motion of all regions except the selected ones. The paper uses methods from Content-Preserving Warps for 3D Video Stabilization for video stabilization and Graphcut Textures for seamless looping.




Another paper, Automatic Cinemagraph Portraits, heavily influenced by [1], captures the nuances of dynamic facial expressions for portraits. It uses a combination of face tracking and point tracking to segment facial motion, removing large motions from the video while preserving small facial expressions.





The fundamental process used in this project to create a cinemagraph, consisting of warping and composition, and the techniques to implement it are inspired by [1].
OpenCV-Python and MoviePy were used for the video processing operations in this project. (The process is detailed below.)

Description of Work

The tasks mentioned in the update report were completed successfully, but independently; i.e., both tasks are done but have not been merged into a single pipeline for generating a cinemagraph. The first task, warping, is complete as a proof of concept, but I was not able to plug it into the compositing code in the given time frame. The work carried out is described below.
The first part of the task was warping the video frames to mobilize / immobilize a user-selected set of features in a video. This was achieved using the following steps (a code sketch of the loop follows the list):
  • Find all feature points in the reference frame using Shi-Tomasi Corner Detector
  • Acquire the region of interest from the user
  • Iteratively track the features in the given region of interest using Lucas-Kanade Sparse Optical Flow method
  • For robust tracking, re-detect features at fixed intervals (every 5 frames in this project). To keep only good feature points that persist over time, the project also runs a backward check of the optical flow and retains points whose forward-backward error is within a threshold.
  • Find the homography between the reference and current sets of features.
  • Apply a perspective transform so each frame's tracked features align with the reference frame according to the homography.
[Figure: Selection of feature points to track and stabilize]
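As a concrete illustration, below is a minimal sketch of this tracking-and-warping loop in OpenCV-Python. The functions used (goodFeaturesToTrack, calcOpticalFlowPyrLK, findHomography, warpPerspective) are real OpenCV APIs, but the file name, corner count, and thresholds are illustrative assumptions, not the exact values used in the project, and the periodic re-detection step is omitted for brevity.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")  # assumed input file
ok, ref = cap.read()
ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corners in the reference frame
# (in practice, restrict detection to the user-selected ROI)
p_ref = cv2.goodFeaturesToTrack(ref_gray, maxCorners=200,
                                qualityLevel=0.01, minDistance=7)

prev_gray, p_prev = ref_gray, p_ref
warped_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Lucas-Kanade sparse optical flow, run forward and then backward
    p_next, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p_prev, None)
    p_back, _, _ = cv2.calcOpticalFlowPyrLK(gray, prev_gray, p_next, None)
    fb_err = np.abs(p_prev - p_back).reshape(-1, 2).max(axis=1)
    good = (st.ravel() == 1) & (fb_err < 1.0)  # backward-check threshold

    # Keep surviving points together with their reference positions
    p_ref, p_cur = p_ref[good], p_next[good]

    # Homography mapping current features back onto the reference,
    # then warp the frame so those features appear stationary
    H, _ = cv2.findHomography(p_cur, p_ref, cv2.RANSAC, 5.0)
    warped_frames.append(
        cv2.warpPerspective(frame, H, (frame.shape[1], frame.shape[0])))
    prev_gray, p_prev = gray, p_cur
```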

The second part, composition, is fairly straightforward as far as the scope of this project is concerned. To create a cinemagraph quickly, the program lets the user select the region to be animated. The program then creates a mask based on the selection, and the mask is blurred to provide a seamless blend between the animated and de-animated regions (a sketch of this step follows below).

[Figure: Selection of ROI]
[Figure: Mask]
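A minimal compositing sketch, assuming a rectangular user-drawn ROI; the composite function is a hypothetical helper, and the Gaussian kernel size is an illustrative choice for the feathering.

```python
import cv2
import numpy as np

def composite(static_frame, moving_frame, roi):
    """Blend moving_frame into static_frame inside roi = (x, y, w, h)."""
    x, y, w, h = roi
    mask = np.zeros(static_frame.shape[:2], dtype=np.float32)
    mask[y:y + h, x:x + w] = 1.0
    # Blur the mask so animated / de-animated regions blend seamlessly
    mask = cv2.GaussianBlur(mask, (51, 51), 0)[..., None]
    out = mask * moving_frame + (1.0 - mask) * static_frame
    return out.astype(np.uint8)

# The ROI can be obtained interactively, e.g. with cv2.selectROI(frame)
```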

Challenges & Dead Ends:

I started out implementing a spatially-varying warp defined on a rectilinear mesh placed on each video frame, similar to the approach of Liu et al. [2009], which is also referenced in [1]. However, in the former, the main warping constraint is given by the output of a 3D reconstruction, while in the latter (the primary scope of this project) the goal is to warp each output frame so that the anchors align with the reference frame.

To circumvent the issue, [1] suggests creating a 64x32 rectilinear grid mesh on each input frame and finding the output mesh using a two-pass warping refinement. I was unable to replicate good results with this approach and hence moved to a homographic transformation. Having spent much time on the technique suggested in [1], it was a challenge to finally be able to warp the input frames to match the reference.

Not a dead end but rather an open end of this project is implementing the warp inside the MoviePy module, for compositing complete cinemagraphs from scratch using feature tracking.

Results

The two main tasks were achieved under the following assumptions (the first two are the standard assumptions of Lucas-Kanade optical flow):

  • The pixel intensities of an object do not change between consecutive frames. 
  • Neighboring pixels have similar motion.
  • The video is stabilized and does not contain pan motion.
Shown below is the warping transformation achieved using a homography. The region can be given a priori or defined by user input.



The selected feature points in the above GIF remain "stationary" with respect to the background. This is the basic concept behind stabilization: if we simply display the region of interest, masking the background and compositing with a static frame, the selected features will be fixed with respect to the reference frame.
Note: The background gets distorted by the applied homographic transformation and must be handled separately.
A picture is worth a thousand words; a GIF, more...
The following cinemagraphs were generated using this project.


[YouTube]

[YouTube]


Analysis

New Results:

This project did not propose a cinemagraph generation technique of its own. It was an attempt to combine concepts and partial implementations from existing research and tools, as described in the project proposal.

The novelty of this project lies in which pieces are put together, and how. Building on it, I propose an image processing module for MoviePy dedicated to ROI detection, tracking, mobilizing, de-animating, and so on, so that the functionality can be extended to create a wider range of cinemagraphs.

More is discussed in future work.

Goals:

The majority of the goals set in the update report were met. Warping and composition, the primary steps, were carried out, albeit individually.
The targets that were not reached include:
  • Using floating feature tracks for warping
  • Warping and composition in a single pipeline for end-to-end cinemagraph generation.
  • Background distortion removal (a sub-task of the above; listed here because it was mentioned in the update report).
The shortcomings of this project can be attributed to unforeseen complications in implementing the described techniques, which later led to modifications in the process.

Future Work:

Now this is interesting! 

What I aimed to do through this project is give the user control over which part of the video clip has dynamic motion. The main resources used to create the program are OpenCV-Python and MoviePy. A possible line of future work is integrating the OpenCV warping module into MoviePy so that users can create more interesting cinemagraphs, having at their disposal the powerful techniques mentioned in [1] while also harnessing MoviePy's video processing abilities.
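One possible shape such an integration might take, assuming MoviePy 1.x, whose fl_image applies a per-frame function so an OpenCV warp can be dropped in directly; stabilize_frame below is a hypothetical wrapper around the homography warp from the tracking sketch above.

```python
from moviepy.editor import VideoFileClip

def stabilize_frame(frame):
    # frame is an RGB numpy array; run tracking + cv2.warpPerspective
    # here and return the warped result (identity placeholder below)
    return frame

clip = VideoFileClip("input.mp4")           # assumed input file
stabilized = clip.fl_image(stabilize_frame)  # per-frame OpenCV warp
stabilized.write_gif("cinemagraph.gif", fps=clip.fps)
```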

It gets better!!

Another possible line of work is using this spatially varying warp technique to induce motion into still frames, giving a static image a feel of dynamism. This is also a well-known medium, popular as Plotagraphs.

[Image © Plotagraphs.com]
This project dealt with creating animations from video clips; Plotagraphs are animated from a single image. The process is similar: selection, masking, adding smooth interpolated motion, seamless looping. The motion-interpolation step is where this project's warp might come in. Currently, key-frame animators in editing software are used for that step. Instead of stabilizing the features during the warping stage, we can move the feature locations along a defined transformation to get the desired animation (a sketch follows below).
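A minimal sketch of this idea: instead of computing a homography that cancels feature motion, we prescribe one that moves an anchor point along a smooth loop. The point layout and sinusoidal drift are illustrative assumptions; note that a single homography warps the image globally, so a mesh warp would be needed to localize the motion.

```python
import cv2
import numpy as np

img = cv2.imread("still.jpg")  # assumed input image
h, w = img.shape[:2]
# Four anchors: three fixed corners and one interior point that drifts
src = np.float32([[0, 0], [w - 1, 0], [0, h - 1], [w / 2, h / 2]])

frames = []
for t in range(60):
    dst = src.copy()
    # Sinusoidal horizontal sway; a full period, so the loop is seamless
    dst[3, 0] += 10 * np.sin(2 * np.pi * t / 60)
    H = cv2.getPerspectiveTransform(src, dst)
    frames.append(cv2.warpPerspective(img, H, (w, h)))
```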

Thanks for Reading
