Voice clips tend to be fairly long and are rarely replayed individually, so it's better to stream them than to load them into a sound buffer. The loudness data is very small, though, so it can be kept buffered indefinitely.
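
As a rough illustration of that split, here is a minimal Python sketch. It assumes the voice clips are plain PCM WAV files and that the loudness data is a small per-clip table. The names `stream_voice`, `LoudnessCache`, `CHUNK_FRAMES` and `make_silent_wav` are hypothetical and are not taken from any particular engine.

```python
import io
import wave

CHUNK_FRAMES = 4096  # hypothetical chunk size; tune for the actual mixer


def stream_voice(source, chunk_frames=CHUNK_FRAMES):
    """Yield raw PCM chunks one at a time instead of loading the whole clip."""
    with wave.open(source, "rb") as wav:
        while True:
            chunk = wav.readframes(chunk_frames)
            if not chunk:  # empty bytes means the clip is finished
                break
            yield chunk


class LoudnessCache:
    """Holds the small per-clip loudness tables in memory and never evicts them."""

    def __init__(self):
        self._tables = {}
        self.loads = 0  # counts how often the loader actually ran

    def get(self, clip_id, load):
        # Load on first request only; later requests reuse the stored table.
        if clip_id not in self._tables:
            self._tables[clip_id] = load()
            self.loads += 1
        return self._tables[clip_id]


def make_silent_wav(n_frames, rate=22050):
    """Build a mono 16-bit silent WAV in memory (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    buf.seek(0)
    return buf
```

Only one chunk of voice data is held at a time, so memory use stays flat regardless of clip length, while the loudness table is loaded once and reused for every later lookup.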