Because modern spatial processing of almost any track gives a slightly different perception of the scene than recordings from the 70s did, for example. You may have noticed that in music from 30-40 years ago the sense of a stage in front of you is much stronger than it is now. The fact is that, by current sound engineering standards, spatial processing is done so that the listener is not standing in front of a band playing their music, but sits in the very center, where the vocalist sings right in your face and the guitarist and pianist look into your left or right ear :)
You seem to be listening to a mono recording. This means that each earbud reproduces exactly the same sound. All modern recordings go through a procedure called "mixing", which includes, among other things, placing every instrument at its own spot in an imaginary room. This is achieved by adjusting the volume of a given track in one earbud relative to the other (or in whatever playback device you listen to the recording on). So the output is two channels: left and right (sometimes there are more, of course, but that is for audio systems with a larger number of speakers).
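To make the difference between mono and stereo concrete, here is a minimal Python sketch. The function names `is_mono` and `place_track` and the sample values are made up for illustration; real audio has thousands of samples per second, but the idea is the same: a track is placed by scaling its volume separately for each channel.

```python
def is_mono(left, right, tolerance=1e-9):
    """Return True if both channels carry the same samples (within tolerance)."""
    if len(left) != len(right):
        return False
    return all(abs(l - r) <= tolerance for l, r in zip(left, right))


def place_track(samples, left_gain, right_gain):
    """Place a one-channel track in the stereo field by scaling it per channel."""
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right


# Hypothetical guitar track: a few made-up sample values.
guitar = [0.0, 0.5, -0.5, 0.25]

# Same volume in both channels: both earbuds play the same thing (mono).
centered_left, centered_right = place_track(guitar, 1.0, 1.0)

# Louder on the left than on the right: the guitar seems to sit to the left.
panned_left, panned_right = place_track(guitar, 0.8, 0.3)

print(is_mono(centered_left, centered_right))
print(is_mono(panned_left, panned_right))
```

The first check reports a mono signal and the second does not, because the two channels now differ in volume.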
For example, vocals are in most cases left at 50/50 (that is, the volume of the vocal track is the same in both channels), and the same is done with the lead instrument when there are no vocals (an electric guitar, for example). Different parts of a drum kit are usually placed in different channels, but not completely (so if you take out one earbud, you will most likely not hear the drum part in full). Regarding "but not completely": with the advent of digital technologies it became possible to "move" a track to one channel entirely (that is, the track will not sound in the other channel at all), but this is not done often, because it is believed to give the recording a "digital harshness".
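The placement described above can be sketched in Python roughly like this. It assumes a simple linear pan rule (real mixing software often uses other pan laws, such as constant-power panning), and `pan_gains`, `mix`, and all track values are hypothetical names and numbers chosen for the example.

```python
def pan_gains(pan):
    """Linear pan: -1.0 is fully left, 0.0 is center, +1.0 is fully right."""
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    return (1.0 - pan) / 2.0, (1.0 + pan) / 2.0


def mix(tracks):
    """Sum (samples, pan) pairs into a left and a right channel."""
    length = max(len(samples) for samples, _ in tracks)
    left = [0.0] * length
    right = [0.0] * length
    for samples, pan in tracks:
        left_gain, right_gain = pan_gains(pan)
        for i, s in enumerate(samples):
            left[i] += s * left_gain
            right[i] += s * right_gain
    return left, right


# Hypothetical one-channel tracks with made-up sample values.
vocal = [0.4, 0.4, 0.4, 0.4]
hi_hat = [0.2, 0.0, 0.2, 0.0]
floor_tom = [0.0, 0.3, 0.0, 0.0]

left, right = mix([
    (vocal, 0.0),      # center: 50/50 in both channels
    (hi_hat, -0.5),    # mostly left, but still present on the right
    (floor_tom, 0.5),  # mostly right, but still present on the left
])
```

With this rule, `pan_gains(1.0)` returns `(0.0, 1.0)`: the track disappears from the left channel completely, which is the "entirely in one channel" case that mixers usually avoid.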