YugabyteDB coding style
YugabyteDB is primarily written in C++ (the distributed storage and transactions layer and the YCQL query layer) and C (the YSQL layer based on PostgreSQL), with some parts of the build system and test suite written in Python, Java, and Bash. We also use Protocol Buffers to define some data and network message formats.
Language-agnostic style guidelines
Variable names
Avoid rarely used abbreviations. Think about whether all other potential readers of the code know about any abbreviations you're using.
Comments
Start full sentences with a capital letter, and end them with a period. This rule doesn't apply if the comment is a single phrase on the same line as a code statement.
Functions and classes in header files should typically have detailed comments, but the comments should not duplicate what is already obvious from the code. In fact, if the code can be restructured, or if functions or classes could be renamed to reduce the need for comments, that is the preferred way. Obvious comments like the following don't add anything useful:
// Returns transaction ID.
const TransactionId& id() const;
Functions and classes in .cc files don't have to be commented as extensively as code in header files. However, do add comments and examples for anything that might be non-obvious to a potential new reader of your code.
C coding style
For the modified PostgreSQL C codebase residing inside the YugabyteDB codebase, we adhere to the PostgreSQL Coding Conventions. Note that PostgreSQL code uses tabs for indentation and we follow that rule in the src/postgres subdirectory; we use spaces for indentation everywhere else in YugabyteDB code.
C++ coding style
Line length
Use a 100-character line length limit.
Formatting function declarations and definitions
Use one of the following formatting styles for function declarations and definitions.
All arguments on one line
ReturnType ClassName::FunctionName(ParameterType1 par_name1, ParameterType2 par_name2) {
  DoSomething();
  ...
}
Aligned, one argument per line
All arguments aligned with the opening parenthesis, one argument per line.
ReturnType ShortClassName::ShortFunctionName(ParameterType1 par_name1,
                                             ParameterType2 par_name2,
                                             ParameterType3 par_name3) {
  DoSomething();  // 2-space indentation
  ...
}
Four-space indentation
One argument per line, with four-space indentation for each argument.
ReturnType SomeClassName::ReallyLongFunctionName(
    ParameterType1 par_name1,  // 4-space indentation
    ParameterType2 par_name2,
    ParameterType3 par_name3) {
  DoSomething();  // 2-space indentation
  ...
}
Packed
Pack arguments into the fewest number of lines—but not exceeding the maximum line width—with each line indented by four spaces.
Don't break the argument list arbitrarily, and only break the list if the next argument won't fit within the line-length limit.
// Suppose this is the right margin -----------------------------------------------------------> |
// |
// |
ReturnType SomeClassName::ReallyLongFunctionName(
    ParameterType1 par_name1, ParameterType2 par_name2, ParameterType3 par_name3,
    ParameterType4 par_name4, ParameterType1 par_name5) {  // 4-space indentation for these 2 lines
  DoSomething();  // 2-space indentation
  ...
}
Formatting function calls and macro invocations
Use one of the following formatting styles for function calls and macro invocations.
All arguments on one line
bool result = DoSomething(argument1, argument2, argument3);
Aligned arguments
All arguments aligned with the opening parenthesis, one argument per line.
bool result = DoSomething(very_very_very_very_long_argument,
                          argument2,
                          argument3);
One argument per line, four-space indentation
bool result = DoSomething(
    argument1,  // 4-space indentation
    argument2,
    argument3);
Packed arguments, four-space indentation, wrapping at margin
Start a new line after the opening parenthesis, with a four-space indentation, and pack arguments into as few lines as possible.
// Suppose this is the right margin -----------------------------------------------------------> |
// |
// |
bool result = DoSomething(
    argument1, argument2, argument3, argument4, argument5, argument6, argument7, argument8,
    argument9, argument10, argument11, argument12, argument13, argument14, argument15,
    argument16, argument17);
Function calls within function calls
The preceding formatting styles also apply to nested function calls.
bool result = DoSomething(
    argument1,
    argument2,
    ReallyLongFunctionName(
        ReallyLongArg1,
        arg2,
        arg3),
    argument3,
    argument4);
String substitution functions
For string substitution and formatting functions (Format, Substitute, StringPrintf, and so on) avoid putting substitution parameters on the same line as the format string, unless the entire function call fits on one line. For example:
// Suppose this is the right margin -----------------------------------------------------------> |
// |
// |
// Good:
return Format(
    "My formatting string with arguments $0, $1, $2, $3, and $4",
    compute_arg0(), compute_arg1(), compute_arg2(), compute_arg3(), compute_arg4());
// Bad: notice it's harder to see where the first substitution argument is.
return Format(
    "My formatting string with arguments $0, $1, $2, $3, and $4", compute_arg0(),
    compute_arg1(), compute_arg2(), compute_arg3(), compute_arg4());
Expressions
Indent multi-line expressions like this:
const bool is_fixed_point_get = !lower_doc_key.empty() &&
    upper_doc_key.HashedComponentsEqual(lower_doc_key);
Or like this:
const bool is_fixed_point_get =
    !lower_doc_key.empty() &&
    upper_doc_key.HashedComponentsEqual(lower_doc_key);
The following style is also widely used in our codebase, so it's acceptable to leave it as-is when modifying the surrounding code, but the two previous options are preferable for new code.
const bool is_fixed_point_get = !lower_doc_key.empty() &&
                                upper_doc_key.HashedComponentsEqual(lower_doc_key);
const auto mode = is_fixed_point_get ? BloomFilterMode::USE_BLOOM_FILTER :
                                       BloomFilterMode::DONT_USE_BLOOM_FILTER;
Ternary operator
For expressions involving the ternary operator (? and :), prefer one of the following formatting styles:
const auto mode = is_fixed_point_get ? BloomFilterMode::USE_BLOOM_FILTER
                                     : BloomFilterMode::DONT_USE_BLOOM_FILTER;
or
const auto mode =
    is_fixed_point_get ? BloomFilterMode::USE_BLOOM_FILTER
                       : BloomFilterMode::DONT_USE_BLOOM_FILTER;
Command-line flag definitions
Put the flag name on the same line as DEFINE_... to make the code more grep-friendly.
Follow function-like macro invocation styles with either all arguments on one line or with aligned arguments as defined earlier, depending on whether all three arguments of the flag definition fit on one line.
DEFINE_bool(create_table,
            true,
            "Whether the table should be created. It's made false when either "
            "reads_only/writes_only is true. If value is true, existing table will be deleted and "
            "recreated.");
Note that in this code style, we've aligned the first line of the string constant (the flag description) with its other lines. Some coding styles, including CLion's standard behavior, would indent the second and later lines by four spaces, but we prefer to keep them aligned.
Forward declarations
You can use forward declarations in a header file, if the class you are referencing is to be defined in the related .cc file, essentially making it a private class implementation. This helps keep the header file cleaner, and keeps the definition and implementation of the private class together with the actual implementation of the main class.
We also frequently use special header files named ..._fwd.h that forward-declare various classes and declare some types, including enums. This helps to avoid including full class declarations wherever possible, and reduces compilation time.
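The private-implementation pattern described above can be sketched roughly as follows. This is a minimal illustration, not code from the YugabyteDB tree: WidgetRegistry, its Impl class, and the file names in the comments are all hypothetical, and the header and .cc parts are shown in one listing for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// ---- widget_registry.h (hypothetical header) ----
class WidgetRegistry {
 public:
  WidgetRegistry();
  ~WidgetRegistry();  // Defined in the .cc part, where Impl is a complete type.

  void Add(const std::string& name);
  std::size_t size() const;

 private:
  class Impl;  // Forward declaration only; the definition lives in the .cc file.
  std::unique_ptr<Impl> impl_;
};

// ---- widget_registry.cc (hypothetical implementation file) ----
class WidgetRegistry::Impl {
 public:
  std::vector<std::string> names;
};

WidgetRegistry::WidgetRegistry() : impl_(std::make_unique<Impl>()) {}
WidgetRegistry::~WidgetRegistry() = default;

void WidgetRegistry::Add(const std::string& name) { impl_->names.push_back(name); }

std::size_t WidgetRegistry::size() const { return impl_->names.size(); }
```

Code that includes only the header part never sees the members of Impl, so changing them does not force dependent files to recompile.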
Parameter ordering
Normally, in function parameter lists in YugabyteDB, we put input parameters first, followed by output parameters.
However, this rule doesn't apply to parameters that aren't pointers to variables to be assigned to by the functions, but are simply non-const objects that the function calls some methods on that modify its state.
For example, in the following function it is OK that writer is in the middle of the parameter list, because the function is not calling the assignment operator on *writer but simply calls some member functions of *writer that modify its internal state, even though this is how this particular function produces its "output".
void Tablet::EmitRocksDBMetrics(std::shared_ptr<rocksdb::Statistics> rocksdb_statistics,
                                JsonWriter* writer,
                                const MetricJsonOptions& opts) {
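A small self-contained sketch of both cases, using hypothetical names (ParsePort, LineWriter, EmitCounter) that are not part of the YugabyteDB codebase. ParsePort assigns to its output parameter, so the parameter goes last; EmitCounter only calls member functions on the writer, so the writer may sit anywhere in the list.

```cpp
#include <string>

// Output parameter assigned by the function: inputs first, output last.
bool ParsePort(const std::string& text, int* port) {
  if (text.empty() || text.size() > 5) {
    return false;
  }
  int value = 0;
  for (const char c : text) {
    if (c < '0' || c > '9') {
      return false;
    }
    value = value * 10 + (c - '0');
  }
  if (value > 65535) {
    return false;
  }
  *port = value;
  return true;
}

// Hypothetical writer whose internal state is modified through member functions.
class LineWriter {
 public:
  void Append(const std::string& line) { text_ += line + "\n"; }
  const std::string& text() const { return text_; }

 private:
  std::string text_;
};

// The writer is never assigned to, so it does not have to be the last parameter.
void EmitCounter(const std::string& name, LineWriter* writer, int value) {
  writer->Append(name + "=" + std::to_string(value));
}
```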
Pointer and reference parameters and variables
Put a space on either side of * and & (but not on both sides).
Both of the following examples are correct:
Status GetSockAddrForTS(const std::string& ts_uuid, Sockaddr* ts_addr);
Status GetSockAddrForTS(const std::string &ts_uuid, Sockaddr *ts_addr);
Similarly, for variable declarations and definitions:
int* a = nullptr;
int *a = nullptr;
Get prefix for getters
Some C++ coding styles (such as Google's) use the Get prefix for functions returning a value, and some don't (Boost, STL). In YugabyteDB code it is allowed to either use or not use the Get prefix.
There are many different function names in our codebase with the Get prefix.
The Get prefix is especially useful if the function name could be misinterpreted as a verb or a verb phrase without it, but is optional in all cases.
If you can't decide whether to use the Get prefix for functions, look at other functions in the same class, file, or subsystem, and adhere to the prevalent naming style in the surrounding code.
Code duplication
Try to reduce code duplication by extracting repeated code into reusable functions, classes, or templates. Reducing code duplication with macros is also allowed, but try not to overuse them; use non-macro-based ways to reduce code duplication as much as possible. Undefine macros that are only used in an isolated section of code when they're no longer needed.
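As an illustration of the last point, here is a sketch of a macro that is defined, used in one isolated place, and undefined immediately afterwards. The enum and the OPERATION_TYPE_CASE macro are hypothetical and exist only for this example.

```cpp
#include <string>

enum class OperationType { kInsert, kDelete, kUpdate };

std::string OperationTypeName(OperationType type) {
// Local helper macro: removes the repetition of "case X: return "X";".
#define OPERATION_TYPE_CASE(name) \
  case OperationType::name:       \
    return #name;

  switch (type) {
    OPERATION_TYPE_CASE(kInsert)
    OPERATION_TYPE_CASE(kDelete)
    OPERATION_TYPE_CASE(kUpdate)
  }

// The macro is not needed past this point, so undefine it.
#undef OPERATION_TYPE_CASE

  return "<unknown>";
}
```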
Switch statements over enums
If you don't use the default statement in a switch over an enum, the compiler will warn you if some values aren't handled (and we have made that warning an error). This allows the compiler to enforce that all enum values are being handled by a switch over an enum, if that's your intention. This complicates default case handling a bit, though. If every case is followed by a return, you can simply move the default handler to right after the end of the switch statement, such as:
switch (operation_type) {
  case OperationType::kInsert:
    ...
    return;
  case OperationType::kDelete:
    ...
    return;
}
FATAL_INVALID_ENUM_VALUE(OperationType, operation_type);
In this case, when you add a new operation type, such as OperationType::kUpdate, you'll get a compile error asking you to add it to the switch statement. In case there is no return following each case handler, we can still detect the default case by (for example) setting a boolean flag.
bool handled = false;
switch (operation_type) {
  case OperationType::kInsert:
    ...
    handled = true;
    break;
  case OperationType::kDelete:
    ...
    handled = true;
    break;
}
if (!handled) {
  FATAL_INVALID_ENUM_VALUE(OperationType, operation_type);
}
Note that FATAL_INVALID_ENUM_VALUE will terminate the process, so for functions returning a Status, you should handle invalid enum values gracefully, e.g.:
return STATUS_FORMAT(
    IllegalArgument, "Invalid value of operation type: $0",
    operation_type);
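The pattern can be tried outside the YugabyteDB tree with a sketch like the one below. It deliberately avoids the real Status type and macros: an empty std::string stands in for an OK status and a non-empty one for an error message, and ApplyOperation is a hypothetical function.

```cpp
#include <string>

enum class OperationType { kInsert, kDelete };

// Returns an empty string on success, or an error message (a stand-in for Status).
std::string ApplyOperation(OperationType operation_type, int* row_count) {
  // No default label: with -Wswitch the compiler reports any unhandled enumerator.
  switch (operation_type) {
    case OperationType::kInsert:
      ++*row_count;
      return "";
    case OperationType::kDelete:
      --*row_count;
      return "";
  }
  // Reached only when operation_type holds a value outside the enum, e.g. after a bad cast.
  return "Invalid value of operation type: " +
         std::to_string(static_cast<int>(operation_type));
}
```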
Arguments passed by value
Don't use const in function (including member function) declarations when the argument is passed by value.
Do use const in function (including member function) definitions wherever appropriate to indicate (and enforce) that the object is not modified inside the function body.
In widget.h:
class Widget {
 public:
  void Say(Phrase p);
};
In widget.cc:
void Widget::Say(const Phrase p) {
  ...
}
Or in widget.h:
void ProcessWidget(WidgetType widget_type,
                   int widget_cost,
                   const Widget& widget);
And in widget.cc:
void ProcessWidget(const WidgetType widget_type,
                   const int widget_cost,
                   const Widget& widget) {
  // Implementation
}
Here's an example of what not to do (in widget.h):
void ProcessWidget(const WidgetType widget_type,  // BAD: "const" should be removed!
                   const int widget_cost,         // BAD: "const" should be removed!
                   const Widget& widget);         // OK: "const" is part of const reference.
The using keyword and namespaces
You can use the using directive for inner utility namespaces in .cc files (but not in header files), e.g.:
using namespace std::placeholders;
using namespace std::literals;
using namespace yb::size_literals;
This allows you to use various convenient literals, such as 100ms to denote "100 milliseconds".
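For instance, the standard chrono literals become available as shown in this sketch. It uses only the C++ standard library; kRetryDelay, kTimeout, and HasTimedOut are hypothetical names, and the YugabyteDB-specific yb::size_literals namespace is not shown.

```cpp
#include <chrono>

// In a .cc file only, never in a header.
using namespace std::literals;

// 100ms has type std::chrono::milliseconds; 2s has type std::chrono::seconds.
constexpr std::chrono::milliseconds kRetryDelay = 100ms;
constexpr auto kTimeout = 2s;

// Durations of different units compare correctly without manual conversion.
bool HasTimedOut(std::chrono::milliseconds elapsed) {
  return elapsed > kTimeout;
}
```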
Unnamed namespaces vs static keyword
When you define functions that should only be accessible within a .cc file, prefer using unnamed namespaces to using the static keyword. See for example this discussion.
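A minimal sketch of a file-local helper in an unnamed namespace. Trimmed and NormalizeTableName are hypothetical names chosen for the example.

```cpp
#include <string>

namespace {

// Visible only inside this translation unit, like a function declared static.
std::string Trimmed(const std::string& s) {
  const auto begin = s.find_first_not_of(' ');
  if (begin == std::string::npos) {
    return "";
  }
  const auto end = s.find_last_not_of(' ');
  return s.substr(begin, end - begin + 1);
}

}  // namespace

// Externally visible function that uses the file-local helper.
std::string NormalizeTableName(const std::string& raw) {
  return Trimmed(raw);
}
```

Unlike the static keyword, an unnamed namespace also works for types, so helper classes and helper functions can be kept file-local in the same way.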
Static and global variables
Using static objects is allowed in our coding style with a few limitations, and with an understanding of a static object's lifecycle. The limitations are:
- Static global variables must be totally independent. The order of constructions/destructions should not matter.
- Static objects must not lock anything or allocate global resources (except memory) during their construction.
- Static objects should be as simple as possible.
If you are adding new static objects, remember:
- Global static objects (as well as static class fields) are constructed before the main() function call.
- Local static objects are constructed the first time control passes through their declaration. This option should be preferred. For local static objects, there is also a locking mechanism needed so that a concurrent execution waits for initialization completion if the static variable is already being initialized by another thread. For example, for a simple static variable defined in a function
  void f() { static string s = "asdf"; }
  the generated code might use a lock automatically to guard the static variable initialization:
  call __cxa_guard_acquire
  test eax, eax
  setne al
  test al, al
  je .L6
  mov ebx, 0
  lea rax, [rbp-17]
  mov rdi, rax
  call std::allocator<char>::allocator()
  lea rax, [rbp-17]
  mov rdx, rax
  mov esi, OFFSET FLAT:.LC0
  mov edi, OFFSET FLAT:f()::s
  call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
  mov edi, OFFSET FLAT:guard variable for f()::s
  call __cxa_guard_release
- Static objects are destroyed after the end of the main() function.
- The order of construction of global static objects is undefined, and can even change from build to build.
- The order of destruction of static objects is the reverse of the order of construction, so it is also undefined.
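The preferred local-static form can be sketched as an accessor function. SupportedTableTypes and its contents are hypothetical; the point is only that the object is built on first use rather than before main().

```cpp
#include <string>
#include <vector>

const std::vector<std::string>& SupportedTableTypes() {
  // Constructed the first time this line is reached. Since C++11 the compiler guards the
  // initialization, so concurrent first calls wait instead of racing.
  static const std::vector<std::string> kTypes = {"YQL", "PGSQL", "REDIS"};
  return kTypes;
}
```

Every call returns a reference to the same object, and no other static object has to be initialized before it.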
Multiple inheritance
Use protected or private inheritance if you can.
Multiple inheritance is allowed in general, subject to a few limitations:
- Don't down-cast pointers or references (casting from a base to a derived class).
  Note: The only way to do this safely in C++ is using the dynamic_cast<...> operator, which relies on run-time type information (RTTI), and we may decide to disable RTTI in release builds at some point for performance reasons.
- Don't use multiple inheritance if a base class is part of this object from an architectural point of view. Instead, prefer composition (making the "part" object a member field in the "whole" object) in such cases.
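The composition advice can be illustrated with a deliberately simple sketch. Engine and Car are hypothetical classes: the engine is a part of the car, so it is a member field rather than a base class.

```cpp
class Engine {
 public:
  void Start() { running_ = true; }
  bool running() const { return running_; }

 private:
  bool running_ = false;
};

// A car has an engine, so Engine is a member, not a base class.
class Car {
 public:
  void Drive() { engine_.Start(); }
  bool engine_running() const { return engine_.running(); }

 private:
  Engine engine_;
};
```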
Testing whether a pointer is null
It's OK to use implicit boolean conversion for raw, unique, or shared pointers:
if (p) {
  p->DoSomething();
} else {
  ...  // Handle the nullptr case
}
Another example:
LOG(INFO) << (p ? p->DebugString() : "N/A");
Error checking macros
We use various macros for error/invariant checking.
Checking a condition and returning a Status with SCHECK
SCHECK (shorthand for "Status CHECK") checks a condition and returns a Status with the appropriate message if the condition is not true. It can only be used within a function that returns a Status or a Result.
The SCHECK macro is a good way to check for errors that are expected to happen under some conditions, e.g. with invalid input, and the errors need to be ultimately reported to the user.
SCHECK(key_opt.is_initialized(), InternalError, "Key is not initialized");
There are also variants of the SCHECK macro for equality checks and comparisons: SCHECK_EQ, SCHECK_NE, SCHECK_GT, SCHECK_LT, SCHECK_GE, and SCHECK_LE.
SCHECK_EQ(schedules.size(), 1, IllegalState,
          Format("Expected exactly one schedule with id $0", schedule_id));
Returning a Status in release mode, but triggering a fatal error in debug mode, with RSTATUS_DCHECK
RSTATUS_DCHECK works similarly to SCHECK in release mode, but triggers a fatal error and a log message in debug mode, terminating program execution. Similarly to SCHECK, it also has variants for checking for equality and inequality. RSTATUS_DCHECK can be used for invariant checks and sanity checks where the error is not expected to happen under normal circumstances but there is a recovery path from this error in a production situation. In these cases it is OK to cause a unit test to crash in debug mode when the error is encountered.
Checking an invariant that must always hold with CHECK
For really important invariants that are difficult to recover from while still maintaining correctness, we sometimes use the CHECK macro and its variants. It is enabled in both debug and release modes and causes program termination if the condition is not true. It should be used really carefully to avoid causing unnecessary server restarts in release mode.
Only checking a condition in debug mode with DCHECK
DCHECK expands to a no-op in release mode. It is reserved for checking invariants or preconditions in performance-critical code, and typically only in cases when we already expect the condition to be true because other code guarantees it.
We sometimes use DCHECKs to verify function prerequisites. If you never expect an incorrect parameter value to be passed into a function, because there is validation happening in the calling function, it's OK to keep that as a DCHECK.
However, if you could theoretically get bad data in production at a certain point in the code, then:
- If you can recover from this error, return an error Status (e.g. using SCHECK or RSTATUS_DCHECK).
- If this is a severe invariant violation and you can't recover from it, this could be a CHECK.
PREDICT_TRUE and PREDICT_FALSE
PREDICT_TRUE and PREDICT_FALSE macros expand to hints to the compiler that a particular codepath is likely or unlikely. In theory these macros could allow better compiler optimizations. However, we don't use them in new code as it is difficult to check if they really improve performance.
Result vs Status with output parameters
For new code, use Result, e.g.:
static Result<DocKeyHash> DecodeHash(const Slice& slice);
Much of our code is wired to return Status, so we are able to get a sense of whether a function completed successfully (OK), or if there was some kind of an error. However, we sometimes want to also get legitimate output from these functions. We used to do this by using function signatures such as
// Old approach, don't use this in new code.
Status foo(int* return_variable);
However, now we have a better way to achieve the same goal; the Result type can encapsulate both an output parameter (in the successful case) and a Status (in case of an error):
Result<int> foo();
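To show the idea without depending on YugabyteDB headers, here is a heavily simplified stand-in built on std::variant. SimpleStatus, SimpleResult, and ParseDigit are hypothetical and are not the real Status and Result classes, which offer a much richer interface; the sketch only demonstrates that one return value can carry either the output or the error.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical, simplified stand-in for an error status.
struct SimpleStatus {
  std::string message;
};

// Hypothetical, simplified stand-in for a Result-like type: holds either a T or an error.
template <class T>
class SimpleResult {
 public:
  SimpleResult(T value) : state_(std::move(value)) {}               // Success case.
  SimpleResult(SimpleStatus status) : state_(std::move(status)) {}  // Error case.

  bool ok() const { return std::holds_alternative<T>(state_); }
  const T& value() const { return std::get<T>(state_); }
  const SimpleStatus& status() const { return std::get<SimpleStatus>(state_); }

 private:
  std::variant<T, SimpleStatus> state_;
};

// No output parameter: the value and the error travel in the same return type.
SimpleResult<int> ParseDigit(char c) {
  if (c < '0' || c > '9') {
    return SimpleStatus{"Not a digit"};
  }
  return c - '0';
}
```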
String formatting
Use the Format function to produce formatted strings, rather than the older function Substitute.
While the two functions have similar syntax, with inline substitution parameters $0, $1, and so on, Format has several advantages:
- It uses the ToString utility, so it can convert many different types of objects to strings, such as collections, protobufs, or any class with a ToString member function.
- You don't need to call arg.ToString(). Just pass arg to the Format function as-is, and it will call ToString for you.
- Format is a bit faster than Substitute on some benchmarks.
consensus::OpId
consensus::OpId is just an alias for yb::OpIdPB, a protobuf class. Use this only where you really need to use protobuf, for example inside other protobuf messages. For the rest of the code, use yb::OpId, a normal C++ class.
Thread safety analysis
We use Clang's Thread Safety Analysis annotations in parts of our code.
Thread safety analysis is a C++ extension implemented as part of Clang Static Analyzer that provides compile-time checks for potential race conditions in code. Annotations are extremely useful for code maintainability, because they make locking semantics explicit, and the compiler warns about accessing memory without holding the necessary mutexes.
A few more things to keep in mind:
- std::unique_lock doesn't work with thread safety annotations out of the box. We wrap it into our custom yb::UniqueLock wrapper.
- Similarly, for std::shared_lock we have the yb::SharedLock wrapper.
- Occasionally, you may need to annotate some functions where thread safety analysis cannot be properly applied, with NO_THREAD_SAFETY_ANALYSIS, so that we can still use thread safety analysis in the surrounding code. The situations in which thread safety analysis might not work include conditional locking and complex locking semantics where a unique lock is being passed around between member functions of different classes.
Our build scripts enable thread safety analysis for Clang version 11 and above; earlier versions of the Clang compiler don't support certain features that we need. Thread safety analysis works very well on macOS with modern Clang compilers, providing instant hints if your environment is set up properly.
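A minimal sketch of what such annotations look like, using Clang's documented capability, acquire_capability, release_capability, and guarded_by attributes directly. The TSA_ATTRIBUTE macro, AnnotatedMutex, and Counter are hypothetical names for this example and are not the macros or wrappers used in the YugabyteDB tree; on compilers other than Clang the macro expands to nothing, so the code still builds.

```cpp
#include <mutex>

#if defined(__clang__)
#define TSA_ATTRIBUTE(x) __attribute__((x))
#else
#define TSA_ATTRIBUTE(x)  // No-op on compilers without thread safety analysis.
#endif

// A mutex type that the analysis understands as a capability.
class TSA_ATTRIBUTE(capability("mutex")) AnnotatedMutex {
 public:
  void Lock() TSA_ATTRIBUTE(acquire_capability()) { mu_.lock(); }
  void Unlock() TSA_ATTRIBUTE(release_capability()) { mu_.unlock(); }

 private:
  std::mutex mu_;
};

class Counter {
 public:
  void Increment() {
    mu_.Lock();
    ++value_;  // OK: mu_ is held here.
    mu_.Unlock();
  }

  int Get() {
    mu_.Lock();
    const int result = value_;
    mu_.Unlock();
    return result;
  }

 private:
  AnnotatedMutex mu_;
  // With -Wthread-safety, Clang warns about any access to value_ made without holding mu_.
  int value_ TSA_ATTRIBUTE(guarded_by(mu_)) = 0;
};
```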
Unused C/C++ features
- C++ exceptions. In most of our code, we don't use C++ exceptions. However, in some cases, we still have to use C++ standard library functions that throw exceptions, and we catch those exceptions as early as possible and convert them to Status or Result return values.
- The assert C library macro. We use our own set of macros for invariant checking.
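Catching a standard library exception at the call site and converting it into a return value might look like the sketch below. ParseOutcome and ParseInt are hypothetical stand-ins; real YugabyteDB code would return a Status or a Result instead. std::stoi is used because it reports errors by throwing std::invalid_argument or std::out_of_range.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a Result-like return type.
struct ParseOutcome {
  std::optional<int> value;  // Set on success.
  std::string error;         // Non-empty on failure.
};

ParseOutcome ParseInt(const std::string& text) {
  try {
    std::size_t consumed = 0;
    const int value = std::stoi(text, &consumed);
    if (consumed != text.size()) {
      return {std::nullopt, "Trailing characters in: " + text};
    }
    return {value, ""};
  } catch (const std::invalid_argument&) {
    // Convert the exception right where it is thrown; callers never see it.
    return {std::nullopt, "Not a number: " + text};
  } catch (const std::out_of_range&) {
    return {std::nullopt, "Out of range: " + text};
  }
}
```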
Other related coding guidelines for C++
In addition to the guidelines outlined on this page, the YugabyteDB C++ coding style is based on Google's C++ style guide.
Protocol buffers coding style
Name protobuf structures and enums with a PB suffix. This is for the following reasons:
- Better "greppability" (searchability): searching for TableTypePB rather than TableType yields fewer false positive results.
- We can have structs/classes with the same name, but without the PB suffix, for usage in our code.
- We can have enum wrappers similar to YB_DEFINE_ENUM with the same name as the protobuf enum, but without the PB suffix.