What is Speaker Tracking?
When multiple people are present in a meeting and someone is speaking, it can be hard to stay focused on the content if only the overall view is displayed. The Speaker Tracking function addresses this by centering on the speaker and showing a close-up of them on screen. This feature provides both a panoramic view of the meeting and detailed visuals of the speaker, enhancing the overall meeting experience.
How to Implement Speaker Tracking
The speaker tracking function is achieved through the collaboration of a microphone array and a camera. The array comprises several microphones, each at a distinct, known position.
When someone speaks during a meeting, the sound reaches each microphone at a slightly different time. By analyzing these time differences, the array's sound source localization technology can pinpoint the location of the sound source.
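To make the geometry concrete, here is a minimal sketch, assuming a far-field source and two microphones with known spacing, of how an inter-microphone time delay maps to a direction of arrival. This is textbook TDOA geometry, not Yealink's implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def doa_from_delay(tau_s: float, mic_spacing_m: float) -> float:
    """Direction of arrival (degrees from broadside) for a far-field
    source, given the time delay between two microphones."""
    # Far-field model: tau = d * sin(theta) / c  =>  theta = arcsin(c * tau / d)
    ratio = np.clip(SPEED_OF_SOUND * tau_s / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

# Example: a 0.1 ms delay across microphones 8 cm apart
print(doa_from_delay(1e-4, 0.08))  # about 25 degrees off-center
```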
The camera then receives this position information and tracks it accordingly, capturing a close-up of the speaker at an appropriate ratio.
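As a simplified illustration of the framing step, the sketch below maps a horizontal direction of arrival to a 16:9 crop window in a panoramic frame. The resolution, field of view, zoom factor, and linear angle-to-pixel mapping are all assumptions made for the example.

```python
def crop_for_angle(doa_deg: float, frame_w: int = 3840, frame_h: int = 1080,
                   fov_deg: float = 120.0, zoom: float = 3.0) -> tuple[int, int, int, int]:
    """Return an (x, y, w, h) crop centred on the speaker, clamped to
    the frame. doa_deg is measured from the camera's optical axis."""
    cx = int((doa_deg / fov_deg + 0.5) * frame_w)  # linear angle-to-pixel map
    w = int(frame_w / zoom)
    h = w * 9 // 16
    x = max(0, min(frame_w - w, cx - w // 2))
    y = max(0, (frame_h - h) // 2)
    return x, y, w, h

print(crop_for_angle(25.0))  # crop window shifted toward the speaker's side
```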
If no one is speaking during the meeting, the voice tracking mode will automatically switch to the automatic framing mode, as introduced in "What is Auto Framing?"
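This fallback can be pictured as a small state machine; the silence timeout and mode names below are assumptions for illustration, not Yealink's actual logic.

```python
SILENCE_TIMEOUT_S = 5.0  # hypothetical threshold; the real value is device-defined

class TrackingMode:
    """Toy mode switcher: fall back to auto framing after sustained silence."""

    def __init__(self) -> None:
        self.mode = "auto_framing"
        self.last_voice_ts = float("-inf")

    def on_voice_detected(self, now: float) -> None:
        self.last_voice_ts = now
        self.mode = "speaker_tracking"

    def tick(self, now: float) -> str:
        if self.mode == "speaker_tracking" and now - self.last_voice_ts > SILENCE_TIMEOUT_S:
            self.mode = "auto_framing"
        return self.mode
```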
Technical Advantages
In a conference setting, noise and reverberation can disrupt sound source localization. Yealink's Speaker Tracking function estimates the relative angle of the sound source across multiple microphone pairs using the GCC-PHAT (Generalized Cross-Correlation with Phase Transform) algorithm, then applies statistical post-processing to filter out the reverberation and noise of the conference room, improving the accuracy of sound source localization.
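For reference, here is a compact, textbook-style GCC-PHAT delay estimator. The PHAT weighting discards magnitude and keeps only phase, which is what makes the method robust to reverberation; Yealink's statistical post-processing stage is not shown, and this is not the product's production code.

```python
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: int,
             max_tau: float | None = None) -> float:
    """Estimate the time delay (seconds) between two microphone signals
    via GCC-PHAT: whiten the cross-power spectrum by its magnitude so
    only phase (timing) information remains."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

The estimated delay for each microphone pair, combined with the known array geometry, yields the candidate direction of the speaker.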
Product Application
MeetingBoard series, MeetingBar AX series, UVC 86, UVC 40, and SmartVision 40 are equipped with a Speaker Tracking function.
Derivative Functions Based on Voice Tracking: Lip Movement Detection and Intercom Mode
Lip Movement Detection
The microphones in the array are arranged horizontally, allowing them to capture sound from all directions and significantly improving sound pickup in the horizontal plane.
For speaker tracking in scenarios that sound localization alone cannot resolve, the lip movement mode is introduced. The camera captures continuous dynamic frame information of the lips, interprets the changes in lip movement, and identifies and focuses on the speaker whose lips are in motion.
Lip movement detection effectively manages scenarios where participants are seated one behind the other or at various angles. The detectable range for lip movement detection includes a lateral face angle of approximately -60° to +60°, and a pitch angle ranging from -15° to +30° (with negative values indicating a downward head position and positive values indicating an upward head position). This capability ensures effective voice tracking from multiple angles.
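A minimal sketch of the underlying idea, assuming lip landmarks are already available from a keypoint model; the landmark indices, the normalisation, and the variance threshold are all hypothetical.

```python
import numpy as np

def mouth_openness(landmarks: np.ndarray, top: int, bottom: int,
                   left: int, right: int) -> float:
    """Vertical lip gap normalised by mouth width, making the score
    invariant to face scale and distance. Landmark indices depend on
    the keypoint model in use (hypothetical here)."""
    gap = np.linalg.norm(landmarks[top] - landmarks[bottom])
    width = np.linalg.norm(landmarks[left] - landmarks[right])
    return float(gap / (width + 1e-6))

def is_speaking(openness_over_time: list[float], threshold: float = 0.02) -> bool:
    """Flag a face as speaking when lip openness fluctuates across frames;
    a static mouth (poster, photo, silent face) shows near-zero variance."""
    return float(np.std(openness_over_time)) > threshold
```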
Technical Advantages
The algorithm employed for lip movement detection first performs face detection and then uses a keypoint model to extract the key coordinates of the face and lips.
Compared with the conventional PFPLD model, this keypoint model can effectively capture non-frontal faces at greater distances.
NME (Normalized Mean Error) is an evaluation metric for face landmark detection algorithms: a smaller value indicates that the predicted key points are closer to the true positions, reflecting better algorithm performance.
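A small worked example of the metric; the normalising distance (commonly the inter-ocular distance or the face bounding-box size) and the sample coordinates are illustrative.

```python
import numpy as np

def nme(pred: np.ndarray, gt: np.ndarray, norm: float) -> float:
    """Normalized Mean Error: mean Euclidean landmark error divided
    by a normalising distance such as the inter-ocular distance."""
    errors = np.linalg.norm(pred - gt, axis=1)  # per-landmark error in pixels
    return float(errors.mean() / norm)

# Three landmarks predicted within ~2 px of ground truth, inter-ocular = 60 px
gt = np.array([[30.0, 40.0], [90.0, 40.0], [60.0, 80.0]])
pred = gt + np.array([[1.0, -1.0], [2.0, 0.0], [-1.0, 1.5]])
print(nme(pred, gt, norm=60.0))  # about 0.03: a small, accurate error
```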
Product Application
UVC 86 and SmartVision 40 (coming soon, stay tuned!) support lip movement detection. When the UVC 86 receives multiple similar audio position signals from the microphone array and cannot distinguish between them, it activates lip movement detection. Meanwhile, the SmartVision 40 performs real-time lip movement detection during voice tracking.
Intercom Mode
When two people converse alternately, the voice tracking mode may cause the screen to switch frequently, hurting the viewing experience. In such cases, you can activate intercom mode to accommodate this kind of conversation. During alternating dialogue, both participants are framed on screen; once the exchange ends and only one person continues to speak steadily, the view shifts to a close-up of the active speaker.
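A toy sketch of how such a framing policy might be driven by recent voice-activity events; the window length, event format, and mode names are assumptions for illustration.

```python
def intercom_framing(events: list[tuple[float, str]], now: float,
                     window_s: float = 10.0) -> str:
    """Decide framing from (timestamp, speaker_id) voice events: keep
    both participants framed while they alternate, and cut to a close-up
    only once a single voice remains active in the recent window."""
    active = {speaker for ts, speaker in events if now - ts <= window_s}
    if len(active) >= 2:
        return "frame_both_speakers"   # avoid rapid back-and-forth cuts
    if len(active) == 1:
        return f"close_up:{active.pop()}"
    return "auto_framing"              # nobody speaking: fall back

print(intercom_framing([(1.0, "A"), (4.0, "B"), (8.0, "A")], now=9.0))
```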
Product Application
The UVC 86 supports intercom mode, which can be enabled via the Start intercom mode option in Yealink Room Connect.
How to Use Speaker Tracking to Its Fullest Potential?
1. Since the algorithm relies on facial recognition and sound pickup from the microphone array, participants can optimize its performance by minimizing interference factors in the meeting room.
> Before using the function, calibrate the lens and make sure no objects are obstructing the fixed-focus lens. Recalibration is necessary after moving the camera.
> Avoid reflections of people, mannequins, cartoon characters, and similar objects on glass surfaces or whiteboards in meeting rooms.
> Avoid placing participants in high-exposure areas that could result in missed detections.
> To achieve optimal sound source localization with the microphone array, please maintain a significant distance from both indoor and outdoor noise sources, such as traffic and air conditioning noise.
2. For the UVC 86, if an error box appears when a third-party audio device is connected, please upgrade the firmware to version 151.432.0.18 to see whether this resolves the issue.