Hypothetical C++: extensible tagging
Data in a C++ program flows from a source, gets modified multiple times and ends up as something entirely different. During processing, the data may change type, reflecting its changing nature. As programmers, we design a type system that reflects the semantic meaning of the data, its structure and its current state. Ideally, the C++ type system should give us all the vocabulary we need.

<div><p>C++ offers a rich set of tools to design types and create this vocabulary. Its type system is open-ended and allows programmers to define new types as needed. Yet there is one aspect in which it is closed: type qualifiers.</p><p>C++ provides two type qualifiers: <code>const</code> and <code>volatile</code><sup><a href="#footnotes">(1)</a></sup>. They allow us to give additional meaning to an existing type without having to create a separate type manually. The <code>const</code> qualifier is the most used, of course. If it didn’t exist, having to reproduce its behaviour for every type we create would be tedious. Without it, C++ would be a lot less expressive and harder to reason about. And yet, this expressive power is limited to these two qualifiers. There is a disconnect between their usefulness and our inability to create new ones.</p><h2>User-Defined Qualifiers</h2><p>What I’d like to see in C++ is the opening of qualifiers to user-defined ones. I’ll give you a few examples of what can be achieved with user-defined qualifiers, but let’s first describe how they would work. Quite simply, they would work just like <code>const</code> and <code>volatile</code>. That is:</p><ul> <li>A qualifier can be applied to a type or a member function.</li> <li>An optional automatic one-way conversion can be declared, much as a non-<code>const</code> pointer can be passed to a function taking a <code>const</code> pointer.</li> <li>A forced conversion between the qualified and non-qualified type is available through <code>const_cast</code><sup><a href="#footnotes">(2)</a></sup>.</li></ul><p>I do not wish to focus on the details of a hypothetical syntax. The exact syntax matters little, and anyone could come up with one. 
I’ll just show one possible choice using a new <code>typequal</code> keyword:</p><pre><code>typequal NewQualifier;
typequal NewQualifier auto qualified;
typequal NewQualifier auto non-qualified;
typequal NewQualifier invalid;
typequal NewQualifier invalid auto qualified;</code></pre><p>Each of these examples would create a qualifier named <code>NewQualifier</code>. The “<code>auto qualified</code>” variant means that the qualifier can be added silently to a type when assigned to a variable in the same way that the <code>const</code> qualifier works. The “<code>auto non-qualified</code>” variant allows the automatic conversion in the opposite direction. What about the “<code>invalid</code>” variant? This declares that data with the qualifier cannot be accessed. As you will see, this is a useful feature.</p><h2>Payoffs</h2><p>Now, I want to convince you of the usefulness of this feature. Let’s try to solve multiple problems that cause real bugs in actual programs.</p><h3>1. Null Pointers</h3><p>Let’s start with <code>null</code> pointers. Dereferencing <code>null</code> pointers is a major source of bugs. Having nullable pointers is often decried as a major design blunder in the language. But the real problem is not the <code>null</code> pointer. It’s the fact that the language does not prevent us from using a <code>null</code> pointer. Let’s fix that:</p><pre><code>typequal maybe invalid;</code></pre><p>That’s it. Now, every function that produces a pointer should produce a pointer with the underlying type adorned with the <code>maybe</code> qualifier. Given that it is declared as an invalid-marking qualifier, the language will not allow us to use the data. Once you have tested for <code>null</code>, you can <code>const_cast</code> it to remove the <code>maybe</code> qualifier. Of course, it would be even more practical if the language supported such a qualifier natively, so that built-in functions would take and provide pre-qualified pointers.</p><h3>2. 
Invalid Data</h3><p>Similar qualifiers can be used to describe different states of data. Two examples that often come up in code would be:</p><pre><code>typequal invalidated invalid auto qualified;
typequal tainted invalid auto qualified;</code></pre><p>The first could be used to mark data that has not yet been validated against a desired constraint. Often, a given group of functions will impose such constraints on its input. By having the entry-point function take <code>invalidated</code> data and the internal functions take unqualified data, we can ensure that internal functions cannot be called before the data has been validated.</p><p>The second one is an idea borrowed from Perl: that <code>tainted</code> data cannot be trusted. It is similar to <code>invalidated</code>, but instead of merely not conforming to some constraint, the data is to be treated with suspicion throughout. In Perl, such <code>tainted</code> data comes from web data, email data and other such untrusted sources. Additional precautions should be taken when validating the data.</p><h3>3. State of Data</h3><p>Of course, such a system of validation can be extended to support multiple states to reflect the progression of an algorithm. Or it can reflect different types of validation. 
Here are a few ideas:</p><pre><code>// The data has been sorted and can be binary-searched.
typequal sorted;

// Sort a vector and return the same vector with the qualifier.
sorted vector<int>& sort(vector<int>& unsorted_vector);

// Do a binary-search in a vector, but only if it has already been sorted.
bool find_in_data(const sorted vector<int>& sorted_vector, int value);

// The data is shared between threads.
// You would create an instance of Lock that would
// take the shared data and mutex as arguments and
// do the const_cast to remove the shared qualifier.
typequal shared invalid;

// Which coordinate system is used in a 3D algorithm.
// Avoids the error of using a local point in an algorithm
// working in world coordinates.
typequal local;
typequal world;
typequal view;
typequal screen;

// Applying the corresponding matrix would return the
// vector with the qualifier correctly updated.
world vector& apply_world_matrix(local vector&, const local world matrix&);</code></pre><h2>Conclusion</h2><p>My goal was to show you the advantages of adding user-defined qualifiers to the C++ language. As demonstrated, user-defined qualifiers open a new world of possibilities with various benefits:</p><ul> <li>Being more expressive with existing data types.</li> <li>Clearly representing the state of the data to the code reader.</li> <li>Following the progression of changes made to data.</li> <li>Allowing the compiler to enforce various constraints.</li> <li>Avoiding bugs that arise from passing incorrect data to functions.</li></ul><div><p><br> </p><hr><p><sup>(1)</sup> There are also the <code>restrict</code>, <code>register</code> and <code>auto</code> type modifiers, but they do not exactly behave like <code>const</code> and <code>volatile</code>.</p><p><sup>(2)</sup> Although it would be more elegant to rename it to <code>qualifier_cast</code>.</p></div>
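For comparison, part of the sorted example can be approximated in today's C++ with a tag wrapper. This is only a rough sketch under stated assumptions, not the proposed feature: the names qualified, sorted_tag, sorted_ints and sort_ints are hypothetical and invented for this illustration, and the wrapper is a separate type rather than a true qualifier on vector<int>.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for "typequal sorted;": an empty tag type.
struct sorted_tag {};

// Hypothetical wrapper that attaches a tag to a value of type T.
template <typename Tag, typename T>
class qualified {
public:
    // Explicit, so an untagged value never gains the tag silently.
    explicit qualified(T value) : value_(std::move(value)) {}

    // Read-only access, so callers cannot break the tagged property.
    const T& get() const { return value_; }

private:
    T value_;
};

using sorted_ints = qualified<sorted_tag, std::vector<int>>;

// Counterpart of the article's sort(): sorts a copy and returns it tagged.
sorted_ints sort_ints(std::vector<int> unsorted) {
    std::sort(unsorted.begin(), unsorted.end());
    return sorted_ints(std::move(unsorted));
}

// Counterpart of find_in_data(): only accepts data carrying the tag.
bool find_in_data(const sorted_ints& data, int value) {
    return std::binary_search(data.get().begin(), data.get().end(), value);
}
```

With this sketch, passing a plain std::vector<int> to find_in_data does not compile, because the wrapper's constructor is explicit. The cost is the boilerplate the article argues against: every tag needs a wrapper type, and the wrapped value no longer behaves like the original type without calling get().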
