[ad_1]
Our V7 foundation model, which finished pre-training last week, is natively multimodal.
It processes a video/audio bitstream directly, understanding it without converting it into anything, so, for example, it will finally understand nuances in how you speak that convey mood and emphasis.
[ad_2]
Source by Elon Musk


