Karel Sedláček

Scaling and Hot Spot Prevention in Order-preserving Sharding

Many modern databases support order-preserving sharding. This type of sharding has desirable properties, particularly query isolation. Unfortunately, in real-world scenarios it often creates serious problems: hot spots form and scaling is limited. This document presents a possible solution to these issues.

Note: In order to maintain some basis in the real world, my example material is based on the MongoDB documentation for choosing a shard key, which presents an order-preserving shard key function based on a “state of residence” attribute (“state” here meaning, e.g., New York, not “residing”/“not residing”). My examples consider a hypothetical database in which the stored documents describe persons, each of whom has a state of residence.

When using order-preserving partitioning, there are frequently attributes for which certain types of queries—chiefly range queries—benefit enormously from an order-preserving shard key function. That said, these attributes often exhibit two problems: they may not themselves provide sufficient granularity to allow effective rebalancing, and/or the workloads associated with each key are naturally unbalanced. State of residence in our hypothetical database is an excellent example of both. Because the set of states is (semi-)fixed at a fairly small number, a naive order-preserving function that is merely the value of the state cannot support more than 50 shards. And because the prior probability of a person appearing in the database is (as might be expected) largely independent of their state of residence, the natural population discrepancies between states cause hot spots to form in highly populous states such as New York and California.
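To make these two problems concrete, consider the following minimal Python sketch (the documents and helper function are hypothetical, purely for illustration): a naive key admits no more distinct range boundaries than there are states, and the load per key value tracks state population rather than being spread evenly.

    from collections import Counter

    # Hypothetical person documents of the kind our database stores.
    people = [
        {"name": "Alice", "state": "California"},
        {"name": "Bob",   "state": "California"},
        {"name": "Carol", "state": "New York"},
        {"name": "Dan",   "state": "Wyoming"},
    ]

    def naive_shard_key(doc):
        # The key is just the attribute value, so there can never be more
        # distinct range boundaries than there are states (roughly 50).
        return doc["state"]

    # Load per key value follows state population, not an even spread.
    print(Counter(naive_shard_key(p) for p in people))
    # Counter({'California': 2, 'New York': 1, 'Wyoming': 1})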

Both of the issues described can be overcome in many applications with reasonable ease. One might, for example, append to the state of residence in the shard key another attribute that has more bits of variation, such as the user’s full name or ZIP code. It is not necessary to know this attribute while querying in order to gain the query isolation benefits for range queries: we simply set the query’s shard key boundaries so that they encompass all objects in the entire range of the given prefix (perhaps by using a high-value non-printable character that is not allowed in the additional attribute; a delimiter not used in any state name may also be required, should one state name be a prefix of another). The additional attribute merely establishes a more granular order between records, allowing rebalancing boundaries to fall within a single attribute value.
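Here is a minimal Python sketch of one such key encoding; the delimiter, the sentinel character, the field names, and the helpers are all assumed choices for illustration, not MongoDB’s actual key format.

    DELIM = "\x00"   # delimiter character that cannot appear in a state name
    HIGH = "\xff"    # high-value character not allowed in the suffix attribute

    def composite_shard_key(doc):
        # Order by state first, then by the more granular attribute, e.g.
        # composite_shard_key({"state": "New York", "zip": "10027"})
        # == "New York\x0010027".
        return doc["state"] + DELIM + doc["zip"]

    def state_query_range(state):
        # Inclusive bounds covering every composite key with this state
        # prefix; no knowledge of the suffix attribute is needed to query.
        return state + DELIM, state + DELIM + HIGH

    low, high = state_query_range("New York")
    # A range query by state alone targets only the shards whose ranges
    # intersect [low, high], preserving query isolation.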

Though the above solution can likely be achieved in MongoDB by the client, with no change to the database code itself, if we are willing to write new code we can achieve the same properties with no finagling from the user. All that is required is a slight change in our philosophy about the treatment of shard keys: let us relax the requirement that every node’s range must start and end on a unique shard key.

We have now opened up a whole new world of possibilities. The system is free to allow multiple shards’ ranges to begin and end on the same shard key, with nodes internally renegotiating their share of the burden for that key using, for example, an artificial ordering by unique object id. Boundaries are now independent of the limited granularity of the queried attribute, while the system retains its ability to perform query isolation over ranges. We can likewise scale beyond the previous ceiling, where a single attribute value could overwhelm the capacity of one node with no way to divide the load.
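A minimal sketch of such a routing table, assuming hypothetical shard names and object ids: each boundary is a (shard key, object id) pair rather than a bare shard key, so two shards can divide a single key value between them.

    import bisect

    MIN_ID = ""  # lowest possible object id in this sketch

    # Each entry's bound is the inclusive lower limit of a shard's range.
    # Because the object id participates in the ordering, two shards may
    # begin on the same shard key value, splitting that value's load.
    routing_table = [
        (("Alabama",    MIN_ID),     "shard-0"),
        (("California", MIN_ID),     "shard-1"),
        (("California", "8a000000"), "shard-2"),  # same key, split by id
        (("Colorado",   MIN_ID),     "shard-3"),
    ]

    def route(shard_key, object_id):
        # Find the last boundary at or below (shard_key, object_id).
        bounds = [bound for bound, _ in routing_table]
        i = bisect.bisect_right(bounds, (shard_key, object_id)) - 1
        return routing_table[max(i, 0)][1]

    # Writes for California are now divided between shard-1 and shard-2.
    print(route("California", "12ab34cd"))  # shard-1
    print(route("California", "9f000000"))  # shard-2

    # A range query for all of California targets every shard whose range
    # intersects the key value, here shard-1 and shard-2, so query
    # isolation over ranges is retained.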

So, what are the downsides? The standout concern is that the system now requires a means for clients/client routers to dynamically update their knowledge of partition boundaries. This is the natural cost of supporting dynamic rebalancing, and without dynamic rebalancing there is no point in implementing these changes: without it, the system is highly susceptible to hot spot formation anyway, owing to the characteristics of order-preserving partitioning over attributes whose value probability distribution is unequal. Since dynamic rebalancing is thus part and parcel of the type of system we wish to create, we can discard this as a downside for practical purposes.
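To illustrate, here is a rough sketch of the refresh-and-retry loop such a client or client router might run. It reuses route() and routing_table from the sketch above; send_to_shard() and fetch_routing_table() are hypothetical stand-ins for the transport layer and the cluster’s configuration metadata, not any real API.

    class StaleRoutingError(Exception):
        """Raised by a shard that no longer owns the requested key range."""

    def fetch_routing_table():
        # Stub: a real implementation would read fresh boundaries from the
        # cluster's configuration metadata.
        return routing_table

    def send_to_shard(shard, request):
        # Stub transport call; a real shard would raise StaleRoutingError
        # when a request falls outside the range it currently owns.
        return {"ok": True, "served_by": shard}

    def send_with_refresh(shard_key, object_id, request, max_retries=3):
        global routing_table
        for _ in range(max_retries):
            shard = route(shard_key, object_id)
            try:
                return send_to_shard(shard, request)
            except StaleRoutingError:
                # Our boundary knowledge was stale; refresh and re-route.
                routing_table = fetch_routing_table()
        raise RuntimeError("routing did not converge after retries")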