The text typecast of a value, the literal for that value, and how they are related
This section establishes some basic notions that have a much broader scope of applicability than just arrays. But, because using array literals rests on these notions, they are summarized here.
The non-lossy round trip: value to text typecast and back to value
Consider this pattern:
do $body$
declare
original constant <some data type> not null := <some value>;
text_cast constant text not null := original::text;
recreated constant <some data type> not null := text_cast::<some data type>;
begin
assert
(recreated = original),
'assert failed';
end;
$body$;
It demonstrates a universal rule that YSQL inherits from PostgreSQL:
- Any value of any data type, primitive or composite, can be
::text
typecast. Similarly, there always exists atext
value that, when properly spelled, can be typecast to a value of any desired data type, primitive or composite. - If you
::text
typecast a value of any data type and then typecast thattext
value to the original value's data type, then the value that you get is identical to the original value.
The following DO
block applies the pattern using a representative range of both primitive and composite data types. (The data type text
, as the degenerate case, is not included.) It also displays the value of the ::text
typecast for each data type.
Notice that the last test uses an array whose data type is the user-created DOMAIN
"int_arr_t". Using an array of DOMAIN
values explains this notion. This is a real stress-tests of the rule.
-- Needed by the '1-d array of "row" type values' test.
create type rt as (n numeric, s text, t timestamp, b boolean);
-- Needed by the 'Ragged array' test.
create domain int_arr_t as int[];
do $body$
begin
-- numeric
declare
original constant numeric not null := 42.1763;
text_cast constant text not null := original::text;
recreated constant numeric not null := text_cast::numeric;
begin
assert
(recreated = original),
'assert failed';
raise info 'numeric: %', text_cast;
end;
-- timestamp
declare
original constant timestamp not null := now()::timestamp;
text_cast constant text not null := original::text;
recreated constant timestamp not null := text_cast::timestamp;
begin
assert
(recreated = original),
'assert failed';
raise info 'timestamp: %', text_cast;
end;
-- timestamp with timezone
declare
original constant timestamptz not null := now()::timestamptz;
text_cast constant text not null := original::text;
recreated constant timestamptz not null := text_cast::timestamp;
begin
assert
(recreated = original),
'assert failed';
raise info 'timestamptz: %', text_cast;
end;
-- boolean
declare
original constant boolean not null := true;
text_cast constant text not null := original::text;
recreated constant boolean not null := text_cast::boolean;
begin
assert
(recreated = original),
'assert failed';
raise info 'boolean: %', text_cast;
end;
-- "row" type
declare
original constant rt not null := row(42.1763, 'dog house', now(), true);
text_cast constant text not null := original::text;
recreated constant rt not null := text_cast::rt;
begin
assert
(recreated = original),
'assert failed';
raise info '"row" type: %', text_cast;
end;
-- 2-d array
declare
original constant int[] not null := array[array[1, 2], array[3, 4]];
text_cast constant text not null := original::text;
recreated constant int[] not null := text_cast::int[];
begin
assert
(recreated = original),
'assert failed';
raise info '2-d array %', text_cast;
end;
-- 1-d array of "row" type values
declare
original constant rt[] not null :=
array[
row(42.1763, 'dog house', now(), true),
row(19.8651, 'cat flap', now() + interval '1' day, false)
];
text_cast constant text not null := original::text;
recreated constant rt[] not null := text_cast::rt[];
begin
assert
(recreated = original),
'assert failed';
raise info 'array of "row" type: %', text_cast;
end;
-- Ragged array: 1-d array of 1-da arrays of different lengths.
declare
arr_1 constant int_arr_t not null := array[1, 2];
arr_2 constant int_arr_t not null := array[3, 4, 5];
original constant int_arr_t[] not null := array[arr_1, arr_2];
text_cast constant text not null := original::text;
recreated constant int_arr_t[] not null := text_cast::int_arr_t[];
begin
assert
(recreated = original),
'assert failed';
raise info 'array of arrays: %', text_cast;
end;
end;
$body$;
It produces this result (after manually removing the "INFO:" prompt on each output line.
numeric: 42.1763
timestamp: 2020-05-03 22:25:42.932771
timestamptz: 2020-05-03 22:25:42.932771-07
boolean: true
"row" type: (42.1763,"dog house","2020-05-03 22:25:42.932771",t)
2-d array {{1,2},{3,4}}
array of "row" type: {"(42.1763,\"dog house\",\"2020-05-03 22:25:42.932771\",t)","(19.8651,\"cat flap\",\"2020-05-04 22:25:42.932771\",f)"}
array of arrays: {"{1,2}","{3,4,5}"}
Multidimensional array of int
values explains the syntax of the 2-d array text
value.
The literal for a "row" type value explains the syntax of the "row" type text
value.
And The literal for an array of "row" type values explains the syntax of the value: array of "row" type text
value.
Notice how the syntax for the array of arrays text
value compares with the syntax for the 2-d array text
value. Because the array of arrays is ragged, the two inner {}
pairs contain respectively two and three values. To distinguish between this case and the ordinary rectilinear case, the inner {}
pairs are surrounded by double quotes.
boolean values show special text forms in ysqlsh
Try this:
select true as "bare true", true::text as "true::text";
This is the result:
bare true | true::text
-----------+------------
t | true
For all but boolean
values, the string of characters that ysqlsh
uses to display any value is the ::text
typecast of that value. (After all, the only feasible means of display is strings of characters.) But uniquely for the two boolean
values denoted by the keywords TRUE
and FALSE
it uses the single characters t
and f
rather than their ::text
typecasts—unless you explicitly write the typecast.
This behavior is inherited from psql
.
You saw above that even when you explicitly ::text
typecast a composite value, TRUE
and FALSE
are represented as t
and f
. You can't influence this outcome because it has to do with the rules for deriving the text
of the typecast and not with the convention that ysqlsh
uses. This asymmetry was established many years ago, and it will not change.
The relationship between the text typecast of a value and the literal that creates that value
Try this in ysqlsh
:
select
42.932771::numeric as n,
'cat'::text as t1,
$$dog's breakfast$$::text as t2,
array[1, 2, 3]::int[] as "int array";
It shows the result:
n | t1 | t2 | int array
-----------+-----+-----------------+-----------
42.932771 | cat | dog's breakfast | {1,2,3}}
You won't be surprised by this. But you need to establish the proper terms of art that allow you to describe what's going on precisely and correctly. The remaining sections in Creating an array value using a literal rely on this.
Consider this first:
42.932771::numeric
This is the literal that the SQL language (at least in the YSQL and PostgreSQL dialects) uses to establish the corresponding strongly-typed numeric
value. (PL/pgSQL uses the same form for the same purpose.) But, to state the obvious, a SQL statement and a PL/pgSQL source are nothing but strings of characters. That means that, in the present context, this:
42.932771
is the text of the literal.
Now consider these two:
'cat'::text $$dog's breakfast$$::text
The parsing rules of both SQL and PL/pgSQL (or more properly stated, the definitions of the grammars of these two languages) require that text
literals are enquoted. Moreover, there are two syntactic mechanisms for doing this: the ordinary single quote; and so-called dollar quotes, where $$
is a special case of the more general $anything_you_want$
. You might think that the ::text
typecast is redundant here. But don't forget that the text of these literals might be used to establish varchar
or char
values.
You see already, then, that the rules for composing a numeric
literal and a text
literal are different:
-
You compose a
numeric
literal by following the bare text that specifies the intended value with the::numeric
typecast operator. -
You compose a
text
literal by enquoting the bare text that specifies the intended value (however you choose to do the quoting) and by then following this with the::text
typecast operator.
(If you did enquote the bare text in a numeric
literal, then you would not see an error. Rather, you would get implicit but undesirable behavior: first, a genuine text
value would be generated internally, and then, this, in turn, would be typecast to the numeric
value.)
You've already seen, informally, some examples of array literals. Here is the rule:
- You compose the bare text that specifies the intended value by writing an utterance in a dedicated grammar that starts with the left curly brace and ends with the right curly brace. (This grammar is the focus of the remainder of Creating an array value using a literal.) Then you enquote this bare text (however you choose to do the quoting) and then typecast it to the desired target array data type.
These are three special cases of a more general rule. In some cases (for example in the literal for a "row" type value) the enquoting mechanism might be optional (depending on the intended value) and, when written uses double quote as the enquoting character. But here, too, the general rule is the same. The bare text that specifies the intended value can always be correctly written as the ::text
typecast of that value.
Stating the general rule
Here is the general rule.
- The literal for a value of any data type is the possibly enquoted bare text that specifies the intended value, followed by the typecast operator to the desired target data type.
- This rule is applied recursively, for the literal for a composite value, but with different actual rules at different levels of nesting. For example, the literal for an array value as a whole must be typecast. But, because the data type of every value in the array is already determined, the bare text that specifies these values is not typecast.
- The
::text
typecast of any value can always be used as the bare text of the literal that will recreate that value.
You can see examples of the text of the literal that creates an array value by creating the value using the constructor and then inspecting its ::text
typecast. But the safe way to create the text of a literal for an intended value is to understand the syntax and semantics that govern its composition.
When this difference is important, the "Array data types and functionality" major section distinguishes between:
- (1) the literal for the intended value (that is, the whole thing with the enquoting and the typecast, when one or both of these are needed);
- and (2) the text of the literal for the intended value (that is, the bare text that specifies this value).
Note: Often, the data type of the assignment target (for example, a field in a column in a schema-level table or a variable in a PL/pgSQL program) is sufficient implicitly to specify the intended typecast without writing the operator. But it's never harmful to write it. Moreover, in some cases, omitting the explicit typecast can bring performance costs. For this reason, Yugabyte recommends that you always write the explicit typecast unless you are certain beyond all doubt that omitting it brings no penalty.
Defining the "canonical form of a literal"
The term "canonical form" applies specifically to the text of a literal rather than to the literal as a whole. But when the text of a literal is in canonical form, the literal as a whole, too, is in canonical form.
The canonical form of the text of a literal that produces a specific value, of any data type, is the ::text
typecast of that value.
Many of the examples in this "Array data types and functionality" major section show that many spellings of the text of an array literal, in addition to the canonical form, will produce a particular intended target value. The differences are due to how whitespace, punctuation, and escape characters are used.