Multimodal input
Send an image or a recording. The model describes it.
image
audio
[three sample images]
+
pick an image above
Record
no recording yet
Describe
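The page does not show what happens when Describe is pressed, so the following is only a minimal sketch of one way the chosen image or recording could be packaged for a model. The function name `build_describe_request`, the JSON field names, and the list of accepted file types are all assumptions made for illustration; none of them come from this demo or from any particular API.

```python
import base64
import json
import os

# Hypothetical table of accepted formats; the demo does not state which it supports.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".wav": "audio/wav",
    ".webm": "audio/webm",
}


def build_describe_request(filename: str, data: bytes,
                           prompt: str = "Describe this.") -> str:
    """Return a JSON body carrying one image or audio clip (hypothetical schema)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MEDIA_TYPES:
        raise ValueError(f"unsupported file type: {filename!r}")
    media_type = MEDIA_TYPES[ext]
    kind = media_type.split("/")[0]  # "image" or "audio"
    payload = {
        "prompt": prompt,
        "input": {
            "kind": kind,
            "media_type": media_type,
            # Binary media is base64-encoded so it can travel inside JSON.
            "data": base64.b64encode(data).decode("ascii"),
        },
    }
    return json.dumps(payload)
```

A caller would read the selected file or the recorded clip as bytes, pass it to `build_describe_request`, and send the resulting string to whatever endpoint serves the model. Rejecting unknown extensions up front keeps unsupported uploads from reaching the model at all.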