Multimodal input

Send an image or a recording. The model describes it.
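As a rough sketch of what sending an image or a recording could look like in code: the snippet below base64-encodes the file bytes and wraps them in a JSON body next to a text prompt. The function name build_describe_request and the field names "prompt", "media", "mime_type" and "data" are hypothetical, not taken from any particular API, and it assumes "recording" means an audio file.

```python
import base64
import json
import mimetypes


def build_describe_request(filename: str, data: bytes,
                           prompt: str = "Describe this input.") -> str:
    """Package raw media bytes and a text prompt into a JSON request body.

    The payload layout is a hypothetical example, not a real service's schema.
    """
    # Guess the media type from the file extension (e.g. "image/png").
    mime_type, _ = mimetypes.guess_type(filename)

    # Assumption: only images and audio recordings are accepted.
    if mime_type is None or not mime_type.startswith(("image/", "audio/")):
        raise ValueError(f"unsupported media type for {filename!r}")

    payload = {
        "prompt": prompt,
        "media": {
            "mime_type": mime_type,
            # Binary data has to be text-encoded to travel inside JSON.
            "data": base64.b64encode(data).decode("ascii"),
        },
    }
    return json.dumps(payload)
```

The returned string would then be sent as the body of a request to whatever endpoint serves the model; the endpoint and its response format are outside what this sketch assumes.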
