On at least some CPUs, BLR with X30 as the target register is one cycle slower than BLR with any other register (I found this in the Arm Cortex-A76 Software Optimization Guide).
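To make the two forms concrete, here is a minimal sketch in C with GCC/Clang-style inline assembly. The function names `blr_via_x30` and `blr_via_x16` are hypothetical, and X16 is just one arbitrary choice of "another register". Each function only executes the instruction once, branching to the label right after it; this is not a benchmark and measures nothing. On non-AArch64 targets the assembly is compiled out.

```c
/* Each function returns 1 if the AArch64 sequence ran, 0 on other targets. */

/* Form the guide flags as slower: branch target held in the link register. */
int blr_via_x30(void) {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile(
        "adr x30, 1f\n\t"  /* x30 = address of local label 1 */
        "blr x30\n\t"      /* target is read from x30, then x30 = return address */
        "1:\n\t"
        "hint #34\n\t"     /* BTI c landing pad; a NOP where BTI is not in use */
        :
        :
        : "x30");
    return 1;
#else
    return 0;
#endif
}

/* Alternative: same call, but the target sits in a different register. */
int blr_via_x16(void) {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile(
        "adr x16, 1f\n\t"  /* x16 = address of local label 1 */
        "blr x16\n\t"      /* target is read from x16; x30 = return address */
        "1:\n\t"
        "hint #34\n\t"     /* BTI c landing pad; a NOP where BTI is not in use */
        :
        :
        : "x16", "x30");
    return 1;
#else
    return 0;
#endif
}
```

In hand-written assembly the practical consequence is simply to load an indirect call target into some register other than X30 before the BLR, if a free one is available.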