14/07/2023

The 20th century turned out to be an era of exponential growth in the field of machine learning. The 3,000-year-old game of ‘Go’, which computer scientists predicted would …

Coding time
We will use a webcam as the video input to our pose estimation model and show the output on our main page, index.html.
We are using two libraries here:

  • ml5.js for creating and running our ML model.
  • p5.js for getting the webcam video feed and displaying output in our browser.

I've added extensive documentation inside the code, explaining every single line. Here, we will discuss the crux of it, which makes up most of the code anyway.
Our code consists of two files:

  • poseNet_webcam.js, our JavaScript code.
  • index.html, the main page to show output.

poseNet_webcam.js
p5.js calls two functions that we define:

  • function setup(). Executed first, and only once. We do our initial setup in it.
  • function draw(). Called repeatedly, forever (unless you close the browser or press the power button).
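The two-function lifecycle can be sketched as below. This is a minimal illustration, assuming p5.js is loaded in global mode through a script tag; the counter frames_drawn is a hypothetical name used only to show that setup() runs once while draw() keeps being called.

```javascript
// Minimal p5.js sketch in global mode (assumes p5.js is loaded by a <script> tag).
// frames_drawn is a hypothetical counter, used only to show how often draw() runs.
let frames_drawn = 0;

// p5 calls setup() exactly once, before anything is drawn.
function setup() {
  createCanvas(640, 480);
}

// p5 then calls draw() over and over (aiming for 60 frames per second by default).
function draw() {
  frames_drawn += 1;
}
```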

createCanvas(width, height), provided by p5, creates a box (a canvas) in the browser to show our output. Here, the canvas is 640px wide and 480px high.
createCapture(VIDEO) captures the webcam feed and returns a p5 element object, which we name webcam_output. We set the webcam video to the same width and height as our canvas.
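A minimal sketch of these two calls inside setup(), assuming p5.js is loaded globally as in index.html. The constants CANVAS_WIDTH and CANVAS_HEIGHT are illustrative names, not part of p5; size() is the p5 element method that resizes the captured video.

```javascript
// Assumes p5.js is loaded globally, as in index.html.
// CANVAS_WIDTH / CANVAS_HEIGHT are illustrative names, not p5 built-ins.
const CANVAS_WIDTH = 640;
const CANVAS_HEIGHT = 480;
let webcam_output;

function setup() {
  // Drawing surface for our output.
  createCanvas(CANVAS_WIDTH, CANVAS_HEIGHT);
  // Ask the browser for the webcam; p5 returns an element wrapping a <video>.
  webcam_output = createCapture(VIDEO);
  // Match the video size to the canvas.
  webcam_output.size(CANVAS_WIDTH, CANVAS_HEIGHT);
}
```

Keeping the video and canvas the same size should let the keypoint coordinates returned later be drawn on the canvas without any rescaling.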
ml5.poseNet() creates a new PoseNet model, taking as input:

  • Our webcam output, webcam_output.
  • A callback function, which is called once the model has loaded successfully. In our index.html file, we created an HTML paragraph with the ID status that shows the current status text to the user. Because the model takes a while to load, the callback changes that text to Model Loaded so the user knows it is ready.
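Creating the model and updating the status text might look like the sketch below. It assumes an ml5.js release that still provides ml5.poseNet (the API this article uses), and it uses p5's select() to find the paragraph with the ID status; the callback name modelLoaded is our own choice.

```javascript
// Assumes p5.js and an ml5.js release that provides ml5.poseNet are loaded.
let webcam_output;
let poseNet;

function setup() {
  createCanvas(640, 480);
  webcam_output = createCapture(VIDEO);
  webcam_output.size(640, 480);
  // Build the model on the webcam feed; modelLoaded runs once the weights are ready.
  poseNet = ml5.poseNet(webcam_output, modelLoaded);
}

// Hypothetical callback name: tells the user the model finished loading.
function modelLoaded() {
  // '#status' is the paragraph with id="status" in index.html.
  select("#status").html("Model Loaded");
}
```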

poseNet.on() registers an event listener (a trigger). Whenever the webcam provides a new image, it is passed to the PoseNet model.
As soon as a pose is detected and the output is ready, the listener calls function(results), where results is the final output of keypoints and scores given by the model.
We store this in our poses array, which is defined globally and can therefore be used anywhere in our code. webcam_output.hide() hides the raw webcam element for now, because we will draw each image ourselves, with the detected points and lines on top, later in draw().
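Putting the listener and hide() into setup() gives roughly the following. This is a sketch under the same assumptions (p5.js plus an ml5.js release with ml5.poseNet); it subscribes to PoseNet's 'pose' event and overwrites the global poses array with each new result.

```javascript
// Assumes p5.js and an ml5.js release that provides ml5.poseNet are loaded.
let webcam_output;
let poseNet;
let poses = []; // latest PoseNet results, readable from anywhere in the sketch

function setup() {
  createCanvas(640, 480);
  webcam_output = createCapture(VIDEO);
  webcam_output.size(640, 480);
  poseNet = ml5.poseNet(webcam_output, () => select("#status").html("Model Loaded"));
  // Each time PoseNet finishes a frame, it emits a 'pose' event with its results.
  poseNet.on("pose", function (results) {
    poses = results;
  });
  // Hide the raw <video> element; draw() paints the frames onto the canvas instead.
  webcam_output.hide();
}
```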
All that is left is to show the image, together with the detection results stored in poses, in the browser.
As we know, draw() runs in a loop forever. Inside it, we call image() to display the current image on the canvas (we receive the video one image at a time).
Here, we pass it five arguments:

  • input image. The image we want to display.
  • x position. The x-coordinate of the top-left corner of the image with respect to the canvas.
  • y position. The y-coordinate of the top-left corner of the image with respect to the canvas.
  • width. The width to draw the image.
  • height. The height to draw the image.
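The corresponding draw() could look like this minimal sketch. It assumes webcam_output is assigned in setup() via createCapture(VIDEO), and that drawKeyPoints() and drawSkeleton() are defined elsewhere in the same file; width and height are the canvas dimensions that p5 sets after createCanvas().

```javascript
// Declared globally; setup() is assumed to assign it with createCapture(VIDEO).
let webcam_output;

function draw() {
  // Paint the current video frame with its top-left corner at (0, 0), scaled to the canvas.
  image(webcam_output, 0, 0, width, height);
  // Overlay the detections on top of this frame (both functions defined elsewhere).
  drawKeyPoints();
  drawSkeleton();
}
```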

We then call drawKeyPoints() and drawSkeleton() to draw the dots and lines on the current image. Because draw() repeats this in an infinite loop, the user sees a continuous output, which looks like a video.
As you can see above, PoseNet returns a JavaScript object as output, consisting of many key-value pairs. For each person in an image, it provides two values, pose and skeleton; the output shown is the pose value.
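For reference, the shape of the results array can be sketched as a plain JavaScript value. This is an illustration, assuming the ml5.js PoseNet output format with pose.keypoints and skeleton; every number below is made up, and only three of the 17 keypoints per person are shown.

```javascript
// Illustrative shape of the results PoseNet passes to the 'pose' listener.
// All numbers are made up; a real pose has 17 keypoints (truncated to three here).
const poses = [
  {
    pose: {
      score: 0.86, // overall confidence for this person
      keypoints: [
        { part: "nose", position: { x: 301.2, y: 170.5 }, score: 0.99 },
        { part: "leftEye", position: { x: 315.8, y: 158.1 }, score: 0.97 },
        { part: "leftWrist", position: { x: 420.0, y: 390.4 }, score: 0.12 },
      ],
    },
    // Pairs of connected keypoints, e.g. shoulder to elbow.
    skeleton: [
      [
        { part: "leftShoulder", position: { x: 360.3, y: 260.7 }, score: 0.91 },
        { part: "leftElbow", position: { x: 395.6, y: 330.2 }, score: 0.88 },
      ],
    ],
  },
];

// One array entry per detected person; each keypoint names a body part.
const partNames = poses[0].pose.keypoints.map((kp) => kp.part);
```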
drawKeyPoints() draws the detected points on the image. Remember, we saved all the results from the PoseNet output in the poses array. Here, we loop through every pose (that is, every person) in the image and get its keypoints.
We then loop through every point (one per body part) in the keypoints array. Each point has:

  • part. The name of the part detected.
  • position. x and y values of a point in the image.
  • score. The model's confidence in the detection, between 0 and 1.

We only draw a point if its confidence score is greater than 0.2. We call fill(red, green, blue), which takes RGB intensity values from 0 to 255, to set the color of the point, and noStroke() to disable the outline that p5 draws by default.
Then, we call ellipse(x_value, y_value, width, height) to draw an ellipse at the desired position. We keep the width and height very small, which makes it look like a dot (exactly what we want).
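The point-drawing loop can be sketched as follows, assuming p5.js is loaded and poses is filled by the poseNet.on() callback. The red color and the 10 by 10 dot size are illustrative choices, and MIN_SCORE is a name we introduce for the 0.2 threshold.

```javascript
// Assumes p5.js is loaded and poses is updated by the poseNet.on('pose', ...) callback.
let poses = [];
const MIN_SCORE = 0.2; // illustrative name for the confidence threshold

function drawKeyPoints() {
  // One iteration per detected person.
  for (let i = 0; i < poses.length; i++) {
    const keypoints = poses[i].pose.keypoints;
    // One iteration per body part of that person.
    for (let j = 0; j < keypoints.length; j++) {
      const keypoint = keypoints[j];
      // Skip low-confidence detections.
      if (keypoint.score > MIN_SCORE) {
        fill(255, 0, 0); // red (illustrative color)
        noStroke(); // no outline around the dot
        ellipse(keypoint.position.x, keypoint.position.y, 10, 10); // tiny circle = dot
      }
    }
  }
}
```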
Similarly, since poses holds multiple poses, it also holds multiple skeleton values with their own key-value pairs. Each skeleton entry is a pair of connected keypoints, and drawSkeleton() handles these by drawing lines instead of points.
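A matching sketch for drawSkeleton(), under the same assumptions: each element of a person's skeleton array is treated as a two-keypoint pair, and p5's stroke() and line() draw a red segment between them. The color is an illustrative choice.

```javascript
// Assumes p5.js is loaded and poses is updated by the poseNet.on('pose', ...) callback.
let poses = [];

function drawSkeleton() {
  // One iteration per detected person.
  for (let i = 0; i < poses.length; i++) {
    const skeleton = poses[i].skeleton;
    // Each entry is a pair [partA, partB] of keypoints that are connected.
    for (let j = 0; j < skeleton.length; j++) {
      const partA = skeleton[j][0];
      const partB = skeleton[j][1];
      stroke(255, 0, 0); // red line (illustrative color)
      line(partA.position.x, partA.position.y, partB.position.x, partB.position.y);
    }
  }
}
```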
index.html
This is the main page where we display our output. We add all our libraries using script tags.
We show a cute welcome intro to the user. As loading the model takes time, we show a Loading model message. If you remember, we change it to Model Loaded once the model is loaded, by referring to the paragraph's ID, status.
Finally, we put our own JS code inside the body. Open the index.html file in a browser to see the output. Make sure you allow webcam access when prompted.
That's it! You can always visit the ml5.js reference page, which has many more ready-to-use models and code snippets for various cool ML projects dealing with a wide variety of data like text, images, and sound.
Kartik Nighania is a machine learning enthusiast who loves computer vision more than NLP. He previously worked in the field of robotics, especially drones, which haunt him to this day. In love with Kaggle.