I think if you ask four people, you’d get four different opinions. There’s a bit of art to it.
Personally, I think the payoff is not worth the effort of separating relays by OS, but there may be some value in limiting the bandwidth used between data centers. The method I usually prefer is to designate one relay as the “bridgehead” relay for the site; only this relay goes upstream out of the data center to download from the higher-level relays. This way any given download file only needs to cross the WAN link once; it is then cached on the bridgehead and available to any other in-site relays that need it.
The remaining relays inside the data center would use this bridgehead as their upstream relay, with one or two of them using additional higher-level relays as their secondaries, in case of an outage on the bridgehead.
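To make the layout concrete, here is a rough sketch in plain Python that models the bridgehead arrangement and counts how many times a file crosses the WAN. It does not use any BigFix API; the relay names, the `Relay` class, and the `fetch` helper are all made up for illustration, and the top-level relay is simply assumed to already hold every file.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


@dataclass
class Relay:
    name: str
    site: str
    primary: Optional[str] = None      # upstream relay used normally
    secondary: Optional[str] = None    # fallback if the primary is down
    cache: Set[str] = field(default_factory=set)


def build_site() -> Dict[str, Relay]:
    """Hypothetical data center 'dc1' with one bridgehead and three relays."""
    relays = [
        Relay("top-relay", site="hq"),  # higher-level relay outside dc1
        Relay("dc1-bridgehead", site="dc1", primary="top-relay"),
        # One in-site relay also lists a higher-level relay as its secondary.
        Relay("dc1-relay-a", site="dc1", primary="dc1-bridgehead",
              secondary="top-relay"),
        Relay("dc1-relay-b", site="dc1", primary="dc1-bridgehead"),
        Relay("dc1-relay-c", site="dc1", primary="dc1-bridgehead"),
    ]
    return {r.name: r for r in relays}


def fetch(relays: Dict[str, Relay], name: str, filename: str,
          down: FrozenSet[str] = frozenset()) -> int:
    """Pull filename onto relay `name`; return the number of WAN crossings."""
    relay = relays[name]
    # A cached file, or a relay with no upstream (assumed to hold everything),
    # needs no further transfer.
    if filename in relay.cache or relay.primary is None:
        relay.cache.add(filename)
        return 0
    upstream = relay.primary
    if upstream in down:
        upstream = relay.secondary
    if upstream is None or upstream in down:
        raise RuntimeError(f"{name} has no reachable upstream relay")
    crossings = fetch(relays, upstream, filename, down)
    if relays[upstream].site != relay.site:
        crossings += 1  # this hop leaves the data center
    relay.cache.add(filename)
    return crossings


if __name__ == "__main__":
    relays = build_site()
    in_site = ["dc1-relay-a", "dc1-relay-b", "dc1-relay-c"]
    total = sum(fetch(relays, n, "patch.exe") for n in in_site)
    print("WAN crossings with bridgehead up:", total)
```

In this model the three in-site relays together cost a single WAN crossing, because the bridgehead caches the file after the first request. If the bridgehead is marked as down, only the relay with a secondary can still get the file, which is the reason for giving one or two relays that fallback.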
As for clients, I usually distribute them evenly among the relays within the data center. Automatic Relay Selection is usually sufficient for that without any customization.