Example Resource

Soap Bubble Video Stabilization


Examples

This example shows how to implement simple video stabilization by using corresponding keypoints to estimate geometric transformations between pairs of consecutive frames.

A version of this example that operates on a list of in-memory images was originally created for the "New in Version 12" webpage; this updated version uses file-based video functionality.

Start with a video that is fairly shaky.
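The subsequent steps assume the clip is already loaded as a Video object named video. A minimal sketch of that setup (the file name is a placeholder, not part of the original resource):

```wolfram
(* Load the shaky source clip as a Video object; replace the placeholder
   file name with the path to your own recording *)
video = Video["shaky-bubble.mp4"];
```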

Confine the search for keypoints to image regions that are known to be stable. In this particular case, use a mask to restrict the search to the bubble support in the lower half of the image.

In[1]:=
mask = \!\(\*
GraphicsBox[
TagBox[RasterBox[CompressedData["
1:eJztybsJhFAUBNCrkW3YxaaGplqBgp9I4SmImaW7r4rd4JxgmGHqce/mMiKO
6hvdcDUpDXefR7ud0zKlz5jvtYh4cgEAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA4KcKAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAOBvvDoqiJA=
"], {{0, 360}, {640, 0}}, {0, 1},
ColorFunction->GrayLevel],
BoxForm`ImageTag["Bit", ColorSpace -> Automatic, Interleaving -> None],
Selectable->False],
DefaultBaseStyle->"ImageGraphics",
ImageSizeRaw->{640, 360},
PlotRange->{{0, 640}, {0, 360}}]\);

To demonstrate the algorithm, extract the first few frames from the video. The first is shown with the mask.

In[2]:=
frames = VideoExtractFrames[video, Quantity[Range[3], "Frames"]];
In[3]:=
HighlightImage[frames[[1]], {Orange, mask}]
Out[3]=

Image keypoints will need to be computed on every frame in order to find corresponding points between consecutive frames. Assuming a constant distance between camera and object, constrain the pair of point sets to be related by a rigid transformation. Define a function to compute keypoints on a pair of frames.

In[4]:=
computeKeyPoints[{img1_, img2_}, mthd_] := ImageCorrespondingPoints[ img1, img2, MaxFeatures -> 25, Method -> mthd, Masking -> mask, TransformationClass -> "Rigid" ]

A comparison of timings for the different keypoint methods shows that "Oriented FAST and Rotated BRIEF" (ORB) features are computationally less expensive than the others, so ORB is used here.

In[5]:=
BarChart[ AssociationMap[ 1000 First@RepeatedTiming@computeKeyPoints[frames[[;; 2]], #] &, {"BRISK", "KAZE", "AKAZE", "SURF", "ORB"} ], ChartLabels -> Automatic, ImageSize -> 300, AxesLabel -> "[ms/frame]", PlotLabel -> "Keypoint computation speed" ]
Out[5]=

Compute keypoints for the frames, confined to the region of interest.

In[6]:=
features = BlockMap[computeKeyPoints[#, "ORB"] &, frames, 2, 1];
In[7]:=
HighlightImage[frames[[2]], {{Orange, mask}, {Yellow, features[[1]]}}]
Out[7]=

From the keypoints, find the image transformation between consecutive frames, restricted to rigid transformations.

In[9]:=
mutualTrafos = Apply[Last@ FindGeometricTransform[##, TransformationClass -> "Rigid"] &, features, {1}];

Accumulate the transformations from one frame to the next to obtain transformations in relation to the first frame.

In[10]:=
trafos = FoldList[Composition, TranslationTransform[{0, 0}], mutualTrafos]
Out[10]=
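Since Composition applies its right-most argument first, each accumulated transform maps a later frame back into the first frame's coordinate system. A minimal sketch with two hypothetical translations:

```wolfram
(* t12 maps frame 2 into frame 1's coordinates and t23 maps frame 3 into
   frame 2's; their composition applies t23 first, then t12, taking
   frame 3 all the way back to frame 1 *)
t12 = TranslationTransform[{1, 0}];
t23 = TranslationTransform[{0, 2}];
Composition[t12, t23][{0, 0}]
(* {1, 2} *)
```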

Transform the frames to undo the shaky camera motion.

In[11]:=
stableFrames = MapThread[ ImagePerspectiveTransformation[#1, #2, DataRange -> Full, Padding -> "Fixed"] &, {frames, trafos}];

Show the difference between each frame of the original video and its stabilized counterpart.

In[12]:=
MapThread[ImageDifference, {frames, stableFrames}]
Out[12]=

Encapsulate this process in a function that outputs a stable frame given a pair of frames, applying the previously accumulated transformation and then updating it.

In[14]:=
trafo = TranslationTransform[{0, 0}];
computeStableFrame[{img1_, img2_}] :=
 Block[{mutualTrafo, stable},
  mutualTrafo = Apply[
    Last@FindGeometricTransform[##, TransformationClass -> "Rigid"] &,
    computeKeyPoints[{img1, img2}, "ORB"]];
  stable = ImagePerspectiveTransformation[img1, trafo, DataRange -> Full, Padding -> "Fixed"];
  trafo = Composition[trafo, mutualTrafo];
  stable]

Now that we have the algorithm, we will adapt it to file-based video functionality, avoiding the need to bring all frames into memory at once. Use VideoFrameMap with a sliding window of two frames. (Note that the last window is padded with a copy of the last frame, so that every frame of the video appears at position 1 in some window.)

In[15]:=
trafo = TranslationTransform[{0, 0}];
stableVideo = VideoFrameMap[computeStableFrame, video, 2, 1];
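The windowing and padding described above can be sketched on a stand-in list using Partition (an illustration only; VideoFrameMap handles this internally):

```wolfram
(* Sliding windows of length 2 with offset 1 over four frame stand-ins;
   the final window is padded with a copy of the last element, so every
   element appears at position 1 in some window *)
Partition[{f1, f2, f3, f4}, 2, 1, {1, 1}, {f4}]
(* {{f1, f2}, {f2, f3}, {f3, f4}, {f4, f4}} *)
```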

For a “before-and-after” effect, create a video which displays both the original and stabilized frames side-by-side. Since we already have the stable video, we will use that to avoid recomputing, but this could also have been done in a single step along with the original processing.

In[16]:=
stableVideo2 = VideoFrameMap[ImageAssemble, {video, stableVideo}];
In[17]:=
VideoExtractFrames[stableVideo2, Quantity[1, "Frames"]]
Out[17]=
