CP is spot on. Except, unless there is a vast contrast issue between the 2 sources, you only need to Bezier Mask one of the Events. But sure, Bezier Mask both, and for any "gap" between the sources you could create a Track3 with some form of generated media as a white background - yes?
Bezier masking is great 'cos if the 2 items move within their respective masks, you could conceivably readjust each mask independently over time using the keyframer.
There is another way, where you introduce a composite Pan/Cropped generated white inter-layer. Done that, been there. Moved on to Bezier Masking - far easier to control. However, 'cos Bez Masking IS very content-centric (it is only of real value to the actual sample being used!), the composite process can be quickly used as a VEG template, and all I need to do is swap out the media. That's another thought!
Grazie