The guy is not saying caro; he is saying carro, and he is clearly rolling the r. Caro means "expensive" in both Spain and Latin America, while carro means "car" in Latin America and "cart" in Spain (the common word for car in Spain is coche). Both sounds, r and rr, have the same place of articulation, namely the alveolar ridge, so both are alveolar consonants. The difference is the manner of articulation: r is a tap (also called a flap) and rr is a trill. When pronouncing the alveolar tap, the tongue makes a single brief contact with the alveolar ridge. In the alveolar trill, it vibrates against the ridge several times in quick succession.
I thought I would transcribe the dialogue, as the translation given in the video isn't quite literal.
"Trece horas en el carro sin parar ... y no traes música."
Thirteen hours in the car without stopping (lit. to stop) ... and you don't bring music.
"Mira, entre y compra unas papitas."
Look, enter and buy some chips.
EDIT: Somehow I ended up in this thread and didn't notice the date. Sorry for bumping it.