The multi-pose MoveNet predicts the joint locations of up to six people in an image simultaneously, while the single-pose MoveNet predicts the joint locations of a single person. Because the single-pose model is lighter than the multi-pose model, it is the more convenient choice when the target is only one person.

TensorFlow > ONNX

You can convert a trained TensorFlow model to an ONNX model by running the following in Google Colab (tf2onnx must be installed first).
!pip install -q tf2onnx
!wget -q -O movenet_singlepose_lightning_4.tar.gz https://tfhub.dev/google/movenet/singlepose/lightning/4?tf-hub-format=compressed
!mkdir movenet_singlepose_lightning_4
!tar -zxvf movenet_singlepose_lightning_4.tar.gz -C movenet_singlepose_lightning_4/

!python -m tf2onnx.convert --saved-model movenet_singlepose_lightning_4 --output movenet_singlepose_lightning_4.onnx

ONNX > Wolfram Language

Import the two models and compare LayersCount and ByteCount.
In[]:=
SetDirectory[NotebookDirectory[]];
In[]:=
netMulti = Import["movenet_multipose_lightning_1.onnx", "NetExternalObject"]
(* netMulti = Import["https://www.wolframcloud.com/obj/okazaki.kotaro/Published/movenet_multipose_lightning_1.wl"] *)
Out[]=
NetExternalObject

Information
​
Format:
ONNX
LayersCount:
443
ByteCount:
19113264
InputPort
input
:
array
(size: 1×
a
×
b
×3)
of integers
OutputPort
output_0
:
array
(size: 1×6×56)

In[]:=
netSingle = Import["movenet_singlepose_lightning_4.onnx", "NetExternalObject"]
(* netSingle = Import["https://www.wolframcloud.com/obj/okazaki.kotaro/Published/movenet_singlepose_lightning_4.wl"] *)
Out[]=
NetExternalObject

Information
​
Format:
ONNX
LayersCount:
229
ByteCount:
9466976
InputPort
input
:
array
(size: 1×192×192×3)
of integers
OutputPort
output_0
:
array
(size: 1×1×17×3)
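The two output shapes encode the pose data differently. Going by the TF Hub MoveNet model cards, the single-pose output 1×1×17×3 holds 17 keypoints as (y, x, confidence) triples, and each of the six 56-value rows of the multi-pose output packs 17×3 keypoint values followed by a [ymin, xmin, ymax, xmax, score] bounding box; the NumPy sketch below decodes dummy tensors of those shapes (my reading of the layout, not taken from this notebook):

```python
import numpy as np

# Dummy tensors with the shapes reported by the NetExternalObject summaries;
# a real run would produce these from the ONNX models.
single_out = np.random.rand(1, 1, 17, 3).astype(np.float32)
multi_out = np.random.rand(1, 6, 56).astype(np.float32)

# Single-pose: 17 keypoints, each (y, x, confidence), coordinates normalized.
keypoints = single_out[0, 0]             # shape (17, 3)
ys, xs, scores = keypoints.T

# Multi-pose: per person, 17*3 = 51 keypoint values followed by
# [ymin, xmin, ymax, xmax, score] for the person box (51 + 5 = 56).
person0 = multi_out[0, 0]
kps = person0[:51].reshape(17, 3)        # (y, x, confidence) per joint
ymin, xmin, ymax, xmax, person_score = person0[51:]

print(keypoints.shape, kps.shape)        # (17, 3) (17, 3)
```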


Evaluation function

Define a helper function to evaluate the neural network.
In[]:=
ClearAll[showSinglePose];
encoderSingle = List@*IntegerPart@*ConstantTimesLayer["Scaling" -> 255.]@*NetEncoder[{"Image", {192, 192}, Interleaving -> True}];
showSinglePose[img_Image] := Module[{size, res},
  size = ImageDimensions@img;
  res = netSingle[encoderSingle[img]];
  HighlightImage[img, getpose[1, {{Flatten@res}}, size, False], ImageSize -> Medium]
];
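The encoder composition above resizes the image to 192×192 with interleaved channels, scales pixel values up to the 0–255 range, and truncates them to integers before batching. A rough Python equivalent of that preprocessing, sketched with NumPy and a nearest-neighbor resize (the input array here is hypothetical):

```python
import numpy as np

def preprocess(img, size=192):
    """Mimic the Wolfram encoder: resize to size x size (nearest neighbor),
    scale [0, 1] floats to [0, 255], truncate to integers, add a batch axis."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size          # source row for each output row
    cols = np.arange(size) * w // size          # source column for each output column
    resized = img[rows[:, None], cols[None, :]]  # (size, size, 3)
    return (resized * 255.0).astype(np.int32)[None]  # (1, size, size, 3)

# Hypothetical input: a 480x640 RGB image with values in [0, 1].
frame = np.random.rand(480, 640, 3)
batch = preprocess(frame)
print(batch.shape, batch.dtype)  # (1, 192, 192, 3) int32
```

The integer cast matters because both models declare an integer input port, as shown in the summaries above.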

Compare fps (frames per second)

In[]:=
img = (* embedded test image *);
In[]:=
(* fps of movenet_multipose_lightning_1 *)
1/(RepeatedTiming[showMultiPose[img]][[1]])
Out[]=
21.5979
In[]:=
(* fps of movenet_singlepose_lightning_4 *)
1/(RepeatedTiming[showSinglePose[img]][[1]])
Out[]=
36.5492
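RepeatedTiming returns the mean wall-clock time per evaluation, so its reciprocal gives frames per second; on the numbers above the single-pose model is roughly 36.5/21.6 ≈ 1.7× faster. The same measurement pattern, sketched in Python with timeit against a stand-in workload (not the actual models):

```python
import timeit

def workload():
    # Stand-in for one model evaluation; swap in real inference here.
    sum(i * i for i in range(10_000))

# Mean seconds per call over repeated runs, like Wolfram's RepeatedTiming.
n = 50
seconds_per_call = timeit.timeit(workload, number=n) / n
fps = 1.0 / seconds_per_call
print(f"{fps:.1f} fps")
```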