printf as a means of printing variables in C
I do not know how to formulate exactly what I want to ask, but it looks like this:
- How to print the contents of a variable in C:
char msg = 'k'; printf("%c", msg);
- How to print the contents of a variable in C++:
char msg = 'k'; cout << msg;
And now the questions:
- In C++ you do not need to specify a format modifier for output, so why is there no equivalent in C? (I understand that C is much older than C++, but still.)
- People often write that the value passed to printf has to be cast to the right type:
char msg = 'k'; printf("%f", (float)msg);
Why? In the C++ equivalent you don't have to do that!
Why can't I write it like this: printf("%f", msg);
? What is the reason? Wouldn't printf itself convert the data to the specifier we give?
3 answers
In C, printf(format, msg) in the source code calls the same function regardless of the type or value of the msg variable. C is a statically typed language: types are resolved at compile time, so at run time the printf function does not know that msg is a char. Moreover, printf does not even know how many arguments were passed to it. Therefore you have to specify the desired representation manually in the format string: what type of value to load from the memory where msg lies, and how to format that value so that the resulting bytes can then be written to stdout.
In C++, cout << msg can compile to calls of different functions (overloads of operator <<) for different types of msg. You can easily define your own << operator, for example, to print vectors for debugging:
#include <iostream>
#include <vector>
template<class T>
std::ostream& operator << (std::ostream& os, const std::vector<T>& v) {
    for (auto&& x : v)
        os << x << ' ';
    return os;
}
int main()
{
    std::vector<int> v {1, 2, 3};
    std::cout << v << '\n';
}
Example:
$ g++ -std=c++11 main.cc -o main
$ ./main
1 2 3
Simplifying, you can imagine that for cout << msg the compiler generates a call to print_char(msg) if msg is a char, print_float(msg) if msg is a float, print_vector(msg) if msg is a vector, and so on. Each such function knows what type it accepts, and each type has one default textual representation (unless you change it with <iomanip> manipulators). For example, an int is printed as decimal digits (not hex or anything else) by default.
C11 introduced _Generic, which selects different functions depending on the type of the argument (the controlling expression), so you can define a print_arg(arg) that works for different types of arg (one argument).
Why can't I write like this:
printf("%f", msg);
? What is the reason? Wouldn't printf itself convert the data to the specifier we give?
...printf("%f", (float)msg); // 107.000000
but printf("%f", msg); // 0.000000
Why are the results different?
When calling variadic functions such as printf, which are declared with an ellipsis (...) in the parameter list and can take a varying number of arguments whose expected types are unknown to the compiler (the compiler is not required to understand the printf format-string language, and the value of format may not be known at compile time), the default argument promotions are applied: char is implicitly converted to int (integer promotion), or to unsigned int if int cannot represent all values of char on this platform (exotic). Similarly, float is converted to double when passed to printf(); this is why plain %f is used for double in printf(), and %lf is not required.
For printf, you can use a simple model: the compiler puts the arguments (int, double, etc.) into a memory area, and printf reads them from there according to the instructions in the format string. printf acts as a mini-computer: format is the program, and printf reads the memory where its arguments lie (va_arg), formats them, and writes the resulting bytes to stdout.
The (double)msg object may look different in memory from the (int)msg object. Therefore printf can see different bit patterns, and the results of printf("%f", (int)msg) and printf("%f", (double)msg) can differ: the same instruction, %f, is applied to different memory contents.
Here is an example from my machine (how the types look in memory may depend on the platform (operating system + processor) and on compiler options). I will show the bytes in memory as a hexdump (for example, 0x6B == 107).
For printf("%c", msg):
- 'k' in C has type int: 6B 00 00 00
- char msg: 6B
- in printf, it is passed as int: 6B 00 00 00
- %c takes this argument 6B 00 00 00, converts it to unsigned char (6B), and the corresponding byte (6B) is written to stdout. See the printf documentation for the c format.
For printf("%f", (float)msg):
- msg (6B) is converted to (float)msg: 00 00 D6 42
- what is passed to printf is a double: 00 00 00 00 00 C0 5A 40
- %f interprets the memory 00 00 00 00 00 C0 5A 40 as a floating-point number and prints it in fixed-point notation with 6 decimal places: 107.000000 (the decimal-point character may depend on the locale). The corresponding ASCII bytes written to stdout are: 31 30 37 2e 30 30 30 30 30 30
For printf("%f", msg):
- msg (char, 6B) is passed to printf as int: 6B 00 00 00
- %f interprets the memory 6B 00 00 00 XX XX XX XX as the floating-point number 5.3e-322 (if XX == 00) and prints 0.000000
Since the behavior is undefined (undefined behavior, UB) when the format does not match the argument, printf("%f", msg) can do anything, even launch rockets. On my machine, the result of printf("%f", msg) depends on the preceding code, for example:
char c = 'k';
printf("%f\n", (float)c);
printf("%f\n", c);
Prints:
107.000000
107.000000 # XX XX XX XX == 00 C0 5A 40 (leftovers from the previous call)
But:
printf("%f\n", 5.3e-322);
printf("%f\n", c);
Prints:
0.000000
0.000000 # XX XX XX XX == 00 00 00 00 (leftovers from the previous call)
Make sure your compiler warns about arguments whose types do not match a format string known at compile time (for example, "%f\n"): UB should be avoided.
printf("%i", (int)msg); // 107, and printf("%i", msg);
In both cases, the char msg is passed as an int (6B 00 00 00), and with the same format the result is the same (0x6B == 107).
You can inspect the contents of these variables on your own machine with this C code:
#include <stdio.h>
static void print_memory(unsigned char *memory, size_t n)
{
    for (unsigned char *p = memory; p != memory + n; ++p)
        printf("%02X ", *p);
    puts("");
}
int main(void)
{
    char c = 'k';
    print_memory((unsigned char *)&c, sizeof c);
    int i = c;
    print_memory((unsigned char *)&i, sizeof i);
    float f = c;
    print_memory((unsigned char *)&f, sizeof f);
    double d = f;
    print_memory((unsigned char *)&d, sizeof d);
}
Example:
$ gcc -std=c99 main.c -o main
$ ./main
6B
6B 00 00 00
00 00 D6 42
00 00 00 00 00 C0 5A 40
How a double might look in memory
Example, applied to 107.0: a double-precision number in IEEE 754 format is represented as
$$d = (-1)^{\text{sign}} \cdot \left(1 + \frac{\text{mantissa}}{2^{52}}\right) \cdot 2^{\text{exponent} - 1023}$$
The sign, exponent, and mantissa are packed into binary as follows:
00 00 00 00 00 C0 5A 40 // little-endian (8 bytes as hex)
40 5A C0 00 00 00 00 00 // big-endian
0100000001011010110000000000000000000000000000000000000000000000 # 64 bits
- the leftmost bit is the sign = 0 (positive)
- the next 11 bits are the exponent = 0b10000000101 (== 1029)
- the remaining 52 bits are the mantissa = 0b1010110000000000000000000000000000000000000000000000 = 3025855999639552 = 0xac00000000000
All together:
$$\begin{aligned}
d &= +\left(1 + \frac{3025855999639552}{2^{52}}\right) \cdot 2^{1029 - 1023} \\
  &= 64 + \frac{3025855999639552}{70368744177664} \\
  &= \frac{4503599627370496 + 3025855999639552}{70368744177664} \\
  &= \frac{7529455627010048}{70368744177664} \\
  &= 107.0
\end{aligned}$$
This demonstrates why 107.0 can be represented in memory as 00 00 00 00 00 C0 5A 40.
Example of a stripped-down printf function
#include <stdarg.h> // va_list, va_arg()
// emulate some printf() functionality using writec()
static void print(const char* format, ...)
{
    va_list args;
    va_start(args, format);
    int infmt = 0, is_char = 0;
    union
    {
        int i;
        double f;
    } arg;
    char buffer[23] = {0}; // enough for %c, %d, %f
    for (const char *p = format; *p; ++p) {
        if (infmt) { // print arg
            infmt = 0;
            switch(*p) {
            case '%': // "%%": print % literally
                writec(*p);
                break;
            case 'c': // "%c": print char
                is_char = 1;
                // fall through
            case 'd': // "%d": print int
            case 'i': // "%i": print int
                arg.i = va_arg(args, int); // load int arg
                if (is_char) {
                    is_char = 0;
                    writec((unsigned char)arg.i); // format as char, write
                } else {
                    itoa(arg.i, buffer, sizeof buffer); // format as int
                    for (char *pb = buffer; *pb; ++pb) writec(*pb); // write
                }
                break;
            case 'f':
                arg.f = va_arg(args, double); // load double arg
                ftoa(arg.f, buffer, sizeof buffer); // format as floating point
                for (char *pb = buffer; *pb; ++pb) writec(*pb); // write
                break;
            default:
                arg.i = va_arg(args, int); // load int arg
                scpy(buffer, "<unknown, load as int>");
                for (char *pb = buffer; *pb; ++pb) writec(*pb); // write
            }
        } else if (*p != '%') { // print literally
            writec(*p);
        } else { // *p == '%'
            infmt = 1;
        }
    }
    va_end(args);
}
- switch recognizes the conversion specifications (such as %d) in the format string
- va_arg() loads arguments of the required type
- the helper functions itoa() and ftoa() format int and double, respectively
- writec() writes one byte to stdout.
This definition of print()
is sufficient for the code:
int main(void)
{
    char msg = 'k';
    print("%c %i %i %f %x\n", msg, msg, (int)msg, (float)msg, msg);
    print("%c ", msg);
    print("%i ", msg);
    print("%i ", (int)msg);
    print("%f\n", (float)msg);
    print("%f\n", msg); // XXX UB
}
Example:
$ cc -std=c99 print-example.c -o print-example
$ ./print-example
k 107 107 107.000000 <unknown, load as int>
k 107 107 107.000000
107.000000
To compile it, it is enough to define the helper functions (the definitions must be inserted before print()):
#include <unistd.h> // POSIX write()
static void writec(unsigned char c)
{
    write(1, &c, 1); // fd 1 is stdout
}
static void scpy(char* dest, const char* src)
{
    while ((*dest++ = *src++) != '\0') // copy up to and including '\0'
        ;
}
static void ftoa(double d, char* buffer, int n)
{
    (void)n; // stub: the buffer size is not checked
    if (d == 107) //XXX
        scpy(buffer, "107.000000");
    else
        scpy(buffer, "XXX");
}
/// format a non-negative int as decimal ascii digits
static char* utoa_rec(unsigned i, char* buffer, int *pn)
{
    if (i >= 10)
        buffer = utoa_rec(i / 10, buffer, pn);
    if ((*pn)-- > 0)
        *buffer++ = '0' + (i % 10);
    return buffer;
}
static void itoa(int i, char* buffer, int n)
{
    if (i < 0) {
        i = -i; //XXX ignore INT_MIN
        if (n-- > 0)
            *buffer++ = '-'; // sign
    }
    buffer = utoa_rec(i, buffer, &n);
    if (n > 0)
        *buffer++ = '\0';
}
These function definitions are given only so that the example can be run; they are stubs (not for reuse) that demonstrate one of the simplest possible print(format, ...) implementations.
Here is an example of a full implementation of vfprintf() from glibc.
Why is there no analog of cout
The functionality of cout in C++ depends critically on function overloading being available at the library level, that is, on a function-overloading mechanism available to the library (and to the user), together with the accompanying operator-overloading mechanism. Overload resolution picks the right version of the output function based on the type of the argument you pass.
In C, there is no function or operator overloading at the user or library level. Therefore, there are no such externally "type-independent" I/O operations.
The C11 version of the C standard introduced generic selection (_Generic), which can be used to emulate user/library function overloading. This mechanism is (or can be) used, for example, in the standard header <tgmath.h> to implement "overloaded" math function macros.
However, no "type-independent" I/O functions were added to the standard library. If you want, you can use this new mechanism and try to implement them yourself.
People often write that the value passed to printf has to be cast to the right type. Why? Wouldn't printf itself convert the data to the specifier we give?
printf is a so-called variadic function. All of its arguments except the first one are variadic arguments. They correspond to the ... in the parameter list of the function declaration:
int printf( const char* format, ... );
Such arguments are passed through a special parameter-passing mechanism. Its peculiarity is that printf itself knows absolutely nothing about the types of the arguments actually passed, and for this reason it cannot convert them to the correct type on its own.
From a purely practical point of view, you can think of all variadic arguments as being written into one contiguous binary stream. Internally, printf reads binary data from this stream using the va_list/va_start/va_arg mechanism. printf itself cannot know what exactly was written into this stream, so it splits the stream into pieces according to the format that you passed to it from outside. If you "lie" to it in this format, it suspects nothing and parses the binary stream incorrectly. For this reason, all data you put into this binary stream must exactly match the conversion specifiers in the format string.
Implementations are not required to pass variadic arguments this way, but this model describes the behavior of variadic arguments quite accurately.
For cout to work, an overload of the << operator is required, and it is different for each type, so no modifiers are needed. In C, operators cannot be overloaded.
It seems to me that the word "convert" is not quite accurate here; I would say "interprets": printf outputs the value of the variable as if it were of the type specified in the format string. The value of the variable does not change. printf itself has no way to check at run time whether the arguments match the format string (and the C compiler is not obliged to teach the programmer how to live and where the rakes lie). Therefore all sorts of surprises are possible if you accidentally make a mistake in the format string and mix up the types or the number of arguments.