The multi-pose MoveNet predicts the joint locations of up to six people in an image simultaneously, while the single-pose MoveNet predicts the joint locations of a single person. Because the single-pose model is lighter than the multi-pose model, it is the more convenient choice when the target is only one person.

TensorFlow > ONNX

You can convert a trained TensorFlow model to an ONNX model by running the following in Google Colab (tf2onnx must be installed first).
!pip install -q tf2onnx
!wget -q -O movenet_singlepose_lightning_4.tar.gz https://tfhub.dev/google/movenet/singlepose/lightning/4?tf-hub-format=compressed
!mkdir movenet_singlepose_lightning_4
!tar -zxvf movenet_singlepose_lightning_4.tar.gz -C movenet_singlepose_lightning_4/

!python -m tf2onnx.convert --saved-model movenet_singlepose_lightning_4 --output movenet_singlepose_lightning_4.onnx

ONNX > Wolfram Language

Import the two models and compare LayersCount and ByteCount.
In[]:=
SetDirectory[NotebookDirectory[]];
In[]:=
netMulti = Import["movenet_multipose_lightning_1.onnx", "NetExternalObject"]
(* netMulti = Import["https://www.wolframcloud.com/obj/okazaki.kotaro/Published/movenet_multipose_lightning_1.wl"] *)
Out[]=
NetExternalObject

Information
​
Format:
ONNX
LayersCount:
443
ByteCount:
19113264
InputPort
input
:
array
(size: 1×
a
×
b
×3)
of integers
OutputPort
output_0
:
array
(size: 1×6×56)

In[]:=
netSingle = Import["movenet_singlepose_lightning_4.onnx", "NetExternalObject"]
(* netSingle = Import["https://www.wolframcloud.com/obj/okazaki.kotaro/Published/movenet_singlepose_lightning_4.wl"] *)
Out[]=
NetExternalObject

Information
​
Format:
ONNX
LayersCount:
229
ByteCount:
9466976
InputPort
input
:
array
(size: 1×192×192×3)
of integers
OutputPort
output_0
:
array
(size: 1×1×17×3)
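The two output shapes encode the pose data differently. Going by the TF Hub MoveNet model cards, the single-pose output 1×1×17×3 holds 17 keypoints as (y, x, confidence) triples, and each of the six 56-value rows of the multi-pose output packs 17×3 keypoint values followed by a [ymin, xmin, ymax, xmax, score] bounding box; the NumPy sketch below decodes dummy tensors of those shapes (my reading of the layout, not taken from this notebook):

```python
import numpy as np

# Dummy tensors with the shapes reported by the NetExternalObject summaries;
# a real run would produce these from the ONNX models.
single_out = np.random.rand(1, 1, 17, 3).astype(np.float32)
multi_out = np.random.rand(1, 6, 56).astype(np.float32)

# Single-pose: 17 keypoints, each (y, x, confidence), coordinates normalized.
keypoints = single_out[0, 0]             # shape (17, 3)
ys, xs, scores = keypoints.T

# Multi-pose: per person, 17*3 = 51 keypoint values followed by
# [ymin, xmin, ymax, xmax, score] for the person box (51 + 5 = 56).
person0 = multi_out[0, 0]
kps = person0[:51].reshape(17, 3)        # (y, x, confidence) per joint
ymin, xmin, ymax, xmax, person_score = person0[51:]

print(keypoints.shape, kps.shape)        # (17, 3) (17, 3)
```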


Evaluation function

Define a helper function to evaluate the neural network.
In[]:=
ClearAll[showSinglePose];
encoderSingle = List@*IntegerPart@*ConstantTimesLayer["Scaling" -> 255.]@*NetEncoder[{"Image", {192, 192}, Interleaving -> True}];
showSinglePose[img_Image] := Module[{size, res},
  size = ImageDimensions@img;
  res = netSingle[encoderSingle[img]];
  HighlightImage[img, getpose[1, {{Flatten@res}}, size, False], ImageSize -> Medium]
];
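The encoder composition above resizes the image to 192×192 with interleaved channels, scales pixel values up to the 0–255 range, and truncates them to integers before batching. A rough Python equivalent of that preprocessing, sketched with NumPy and a nearest-neighbor resize (the input array here is hypothetical):

```python
import numpy as np

def preprocess(img, size=192):
    """Mimic the Wolfram encoder: resize to size x size (nearest neighbor),
    scale [0, 1] floats to [0, 255], truncate to integers, add a batch axis."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size          # source row for each output row
    cols = np.arange(size) * w // size          # source column for each output column
    resized = img[rows[:, None], cols[None, :]]  # (size, size, 3)
    return (resized * 255.0).astype(np.int32)[None]  # (1, size, size, 3)

# Hypothetical input: a 480x640 RGB image with values in [0, 1].
frame = np.random.rand(480, 640, 3)
batch = preprocess(frame)
print(batch.shape, batch.dtype)  # (1, 192, 192, 3) int32
```

The integer cast matters because both models declare an integer input port, as shown in the summaries above.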

Compare fps (frames per second)

In[]:=
img = (* embedded test image *);
In[]:=
(* fps of movenet_multipose_lightning_1 *)
1/(RepeatedTiming[showMultiPose[img]][[1]])
Out[]=
21.5979
In[]:=
(* fps of movenet_singlepose_lightning_4 *)
1/(RepeatedTiming[showSinglePose[img]][[1]])
Out[]=
36.5492
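RepeatedTiming returns the mean wall-clock time per evaluation, so its reciprocal gives frames per second; on the numbers above the single-pose model is roughly 36.5/21.6 ≈ 1.7× faster. The same measurement pattern, sketched in Python with timeit against a stand-in workload (not the actual models):

```python
import timeit

def workload():
    # Stand-in for one model evaluation; swap in real inference here.
    sum(i * i for i in range(10_000))

# Mean seconds per call over repeated runs, like Wolfram's RepeatedTiming.
n = 50
seconds_per_call = timeit.timeit(workload, number=n) / n
fps = 1.0 / seconds_per_call
print(f"{fps:.1f} fps")
```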