Contributed equally with the first author.

In this work we aim to generalise dynamic cells for video segmentation. Instead of manually designing the contextual blocks that connect per-frame outputs, we propose a neural architecture search solution in which both the choice of operations and their sequential arrangement are predicted by a separate neural network.
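To make the idea of predicting operations and their arrangement concrete, the following is a minimal sketch of how a cell description could be sampled. It is not the method of this work: the operation names in `CANDIDATE_OPS`, the function `sample_cell`, and the encoding of a cell as `(input_index, op_name)` steps are all hypothetical, and a weighted random sampler stands in for the separate neural network that would make these choices.

```python
import random

# Hypothetical candidate operations; the actual search space is not given here.
CANDIDATE_OPS = ["conv3x3", "conv1x1", "sep_conv3x3", "global_avg_pool", "skip"]


def sample_cell(num_layers, op_weights=None, seed=None):
    """Sample a cell as a list of (input_index, op_name) steps.

    Inputs 0 and 1 stand for the previous-frame and current-frame outputs.
    Each sampled step produces one new output that later steps may consume.
    """
    rng = random.Random(seed)
    # Stand-in for a learned controller: fixed sampling weights per operation.
    weights = op_weights or [1.0] * len(CANDIDATE_OPS)
    cell = []
    for layer in range(num_layers):
        num_available = 2 + layer  # two frame inputs plus earlier step outputs
        input_index = rng.randrange(num_available)
        op_name = rng.choices(CANDIDATE_OPS, weights=weights, k=1)[0]
        cell.append((input_index, op_name))
    return cell


cell = sample_cell(num_layers=3, seed=0)
for step, (input_index, op_name) in enumerate(cell):
    print(f"step {step}: apply {op_name} to input {input_index}")
```

In an actual search, the sampling weights would be replaced by the outputs of a trained predictor, and each sampled cell would be scored by the segmentation quality of the network built from it.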