I’ve been experimenting with vision-language models (VLMs) lately, and smaller ones like SmolVLM have shown me something genuinely fascinating: they behave wildly differently depending on how much context you throw at them. Here’s what I noticed. I gave one model a very specific context: a list of tasks formatted as JSON with exact requirements. It…
Read More