printf as a means of printing variables in C

I do not know how to formulate exactly what I want to ask, but it looks like this:

  1. How to print the contents of a variable in C:

    char msg = 'k';
    printf("%c", msg);
    
  2. How to print the contents of a variable in C++:

    char msg = 'k';
    cout << msg;
    

And now the questions:

  1. In C++ you do not need to specify a format specifier for output, so why is there no analog in C (I understand that the language is much older than C++, but still)?

  2. People often write that the argument passed to printf should be cast to the right type:

    char msg = 'k';
    printf("%f", (float)msg);
    

Why is that? In the C++ analog you don't have to do this! Why can't I write like this: printf("%f", msg);? What is the reason? Wouldn't printf itself convert the data to the type of the specifier we give?

Author: jfs, 2017-02-17

3 answers

In C, printf(format, msg) in the source code calls the same function regardless of the type or value of the msg variable. C is a statically typed language, which means that at run time the printf function does not know that msg is a char; moreover, printf does not even know how many arguments were passed to it. Therefore you have to spell out the desired representation manually in the format string: what type of value to load from the memory in which msg lies, and how to format this value so that the resulting bytes can then be written to stdout.

In C++, cout << msg can compile into a call to a different function for each type of msg (overloads of operator <<). You can easily define your own << operator, for example, to output vectors for debugging:

#include <iostream>
#include <vector>

template<class T>
std::ostream& operator << (std::ostream& os, const std::vector<T>& v) {
  for (auto&& x : v)
    os << x << ' ';
  return os;
}

int main()
{
  std::vector<int> v {1, 2, 3};
  std::cout << v << '\n';
}

Example:

$ g++ -std=c++11     main.cc   -o main
$ ./main
1 2 3 

Simplifying, we can imagine that for cout << msg the compiler generates a call to print_char(msg) if msg is a char, print_float(msg) if msg is a float, print_vector(msg) if msg is a vector, and so on. Here each function knows what type it accepts, and for each type a single byte representation is used by default (unless iomanip is used). For example, for int its decimal digits are output by default (not hex or anything else).

C11 introduced _Generic, which allows a different function to be called depending on the type of the argument (the controlling expression), so you can define a print_arg(arg) that works for different types of arg (one argument).

Why can't I write like this: printf("%f", msg);? What is the reason? Wouldn't printf itself convert the data to the type of the specifier we give?
... printf("%f", (float)msg); // 107.000000, while printf("%f", msg); // 0.000000 why is the answer different?

When calling variadic functions such as printf, which are declared with an ellipsis (...) in the parameter list and can take a varying number of arguments, the types expected for those arguments are unknown to the compiler (the compiler is not required to understand the printf format string language, and the value of format may be unknown at compile time). Therefore the default argument promotions are applied: char is implicitly converted to int (integer promotion), or to unsigned int if the value of the char is not representable in int on this platform (exotic). Similarly, float is converted to double when calling printf(); that is why %lf is not required for double in printf() and plain %f is used.

For printf you can use a simple model: the compiler puts the arguments (int, double, etc.) into a memory area, and printf reads them from there according to the instructions in the format string. printf acts as a mini-computer: format is the program, and printf reads the memory in which its arguments lie (va_arg), formats them, and writes the resulting bytes to stdout.

The (double)msg object may look different in memory from the (int)msg object. Therefore printf can see different bit patterns in memory and, accordingly, the results of printf("%f", (int)msg) and printf("%f", (double)msg) can differ (the same instruction, %f, is applied to different memory contents).

An example for my machine (how the types look in memory may depend on the platform (operating system + processor) and compiler options). I will show the bytes in memory as a hexdump (for example: 0x6B == 107).

For printf("%c", msg):

  • 'k' in C has the type int - 6B 00 00 00
  • char msg - 6B
  • in printf, this is passed as int - 6B 00 00 00
  • %c takes this argument 6B 00 00 00, converts it to unsigned char (6B), and the corresponding byte (6B) is output to stdout. See the printf documentation for the c format.

For printf("%f", (float)msg):

  • msg (6B) is converted to (float)msg - 00 00 D6 42
  • which is passed to printf as double - 00 00 00 00 00 C0 5A 40
  • %f interprets the memory 00 00 00 00 00 C0 5A 40 as a floating-point number and outputs it in fixed format (6 decimal places): 107.000000 (the character used for the decimal point may depend on the locale). The corresponding ASCII-encoded bytes that are written to stdout are: 31 30 37 2e 30 30 30 30 30 30

For printf("%f", msg):

  • msg (char - 6B) is passed to printf as int - 6B 00 00 00
  • %f interprets the memory 6B 00 00 00 XX XX XX XX as a floating-point number 5.3e-322 (if XX == 00) and outputs 0.000000

Since the behavior is undefined (UB) when the format does not match the arguments, printf("%f", msg) can do anything, even launch rockets. On my machine, the result of printf("%f", msg) depends on the preceding code, for example:

char c = 'k';
printf("%f\n", (float)c);
printf("%f\n", c);

Prints:

107.000000
107.000000 # XX XX XX XX == 00 C0 5A 40 (leftovers from the previous call)

But:

printf("%f\n", 5.3e-322);
printf("%f\n", c);

Prints:

0.000000
0.000000  # XX XX XX XX == 00 00 00 00 (leftovers from the previous call)

Make sure that for formats known at compile time (for example, "%f\n"), your compiler generates warnings for invalid types - UB should be avoided.

printf("%i", (int)msg); // 107 and printf("%i", msg);

In both cases the char msg is passed as int (6B 00 00 00), and with the same format the result should be the same (0x6B == 107).

You can view the contents of variables on your machine using the C code:

#include <stdio.h>

static void print_memory(unsigned char *memory, size_t n)
{
  for (unsigned char *p = memory; p != memory + n; ++p)
    printf("%02X ", *p);
  puts("");
}


int main(void)
{
  char c = 'k';
  print_memory((unsigned char *)&c, sizeof c);
  int i = c;
  print_memory((unsigned char *)&i, sizeof i);
  float f = c;
  print_memory((unsigned char *)&f, sizeof f);
  double d = f;
  print_memory((unsigned char *)&d, sizeof d);
}

Example:

$ gcc -std=c99     main.c   -o main
$ ./main
6B 
6B 00 00 00 
00 00 D6 42 
00 00 00 00 00 C0 5A 40 

What a double might look like in memory

Applied to 107.0, for example: a double-precision number in IEEE 754 format is represented as
d = ±(1 + mantissa / 2^52) · 2^(exponent - 1023), where ± is given by the sign bit

The sign, exponent, and mantissa are packed into a binary representation as follows:

 00 00 00 00 00 C0 5A 40 // little-endian (8 bytes as hex)
 40 5A C0 00 00 00 00 00 // big-endian
 0100000001011010110000000000000000000000000000000000000000000000 # 64 bits
 ^
 |- the leftmost bit is the sign =
 0 (positive)

  ^         ^
  |---------|
 the next 11 bits are the exponent =
 0b10000000101 (== 1029)
             ^                                                  ^
             |--------------------------------------------------|
 the remaining 52 bits are the mantissa =
           0b1010110000000000000000000000000000000000000000000000
  = 3025855999639552
  = 0xac00000000000

All together:

d = +(1 + 3025855999639552 / 2^52) * 2^(1029 - 1023)
= 64 + 3025855999639552 / 70368744177664
= (4503599627370496 + 3025855999639552) / 70368744177664
= 7529455627010048 / 70368744177664
= 107.0

This demonstrates why 107.0 can be represented in memory as 00 00 00 00 00 C0 5A 40.

Example of a stripped-down printf function

#include <stdarg.h> // va_list, va_arg()

// emulate some printf() functionality using writec()
static void print(const char* format, ...)
{
  va_list args;
  va_start(args, format);
  int infmt = 0, is_char = 0;
  union
  {
    int i;
    double f;
  } arg;
  char buffer[23] = {0}; // enough for %c, %d, %f

  for (const char *p = format; *p; ++p) {
    if (infmt) { // print arg
      infmt = 0;
      switch(*p) {
      case '%': // "%%": print % literally
        writec(*p);
        break;
      case 'c': // "%c": print char
        is_char = 1;
        // fall through
      case 'd': // "%d": print int
      case 'i': // "%i": print int
        arg.i = va_arg(args, int); // load int arg
        if (is_char) {
          is_char = 0;
          writec((unsigned char)arg.i); // format as char, write
        } else {
          itoa(arg.i, buffer, sizeof buffer); // format as int
          for (char *pb = buffer; *pb; ++pb) writec(*pb); // write
        }
        break;
      case 'f':
        arg.f = va_arg(args, double); // load double arg
        ftoa(arg.f, buffer, sizeof buffer); // format as floating point
        for (char *pb = buffer; *pb; ++pb) writec(*pb); // write
        break;
      default:
        arg.i = va_arg(args, int); // load int arg
        scpy(buffer, "<unknown, load as int>");
        for (char *pb = buffer; *pb; ++pb) writec(*pb); // write
      }
    } else if (*p != '%') { // print literally
      writec(*p);
    } else { // *p == '%'
      infmt = 1;
    }
  }
  va_end(args);
}
  • switch is used to recognize conversion specifiers (%d) in the format string
  • va_arg() loads arguments of the desired type
  • the auxiliary functions itoa() and ftoa() format int and double, respectively
  • writec() writes one byte to stdout.

This definition of print() is sufficient for the code:

int main(void)
{
  char msg = 'k';
  print("%c %i %i %f %x\n", msg, msg, (int)msg, (float)msg, msg);
  print("%c ", msg);
  print("%i ", msg);
  print("%i ", (int)msg);
  print("%f\n", (float)msg);
  print("%f\n", msg); // XXX UB
}

Example:

$ cc -std=c99 print-example.c -o print-example
$ ./print-example
k 107 107 107.000000 <unknown, load as int>
k 107 107 107.000000
107.000000

To compile this, it is enough to define the auxiliary functions (the definitions need to be inserted before print()):

#include <unistd.h> // POSIX write()

static void writec(unsigned char c)
{
  write(1, &c, 1);
}

static void scpy(char* dest, const char* src)
{
  while (*dest++ = *src++);
}

static void ftoa(double d, char* buffer, int n)
{
  if (d == 107) //XXX
    scpy(buffer, "107.000000");
  else
    scpy(buffer, "XXX");
}

/// format positive int as decimal ascii digits
static char* utoa_rec(unsigned i, char* buffer, int *pn)
{
  if (i >= 10)
    buffer = utoa_rec(i / 10, buffer, pn);
  if ((*pn)-- > 0)
    *buffer++ = '0' + (i % 10);
  return buffer;
}

static void itoa(int i, char* buffer, int n)
{
  if (i < 0) {
    i = -i; //XXX ignore INT_MIN
    if (n-- > 0)
      *buffer++ = '-'; // sign
  }
  buffer = utoa_rec(i, buffer, &n);
  if (n > 0)
    *buffer++ = '\0';
}

The function definitions are given so that the example can be run, but they are really just stubs (not for reuse), meant only to demonstrate one of the simplest print(format, ...) implementations.

Here is an example of a full implementation of vfprintf() from glibc.

 33
Author: jfs, 2017-04-13 12:53:29

Why is there no analog of cout

The functionality of cout in C++ is critically tied to function overloading at the library level, that is, to the existence of a function overloading mechanism available to the library (and to the user), together with the accompanying operator overloading mechanism. The correct version of the output function is selected by overload resolution based on the type of the argument you supplied.

In C there are no mechanisms for function overloading or operator overloading at the user or library level. Therefore there are no such outwardly "type-independent" I/O operations.

The C11 version of the C language standard introduced the generic selection mechanism (_Generic), which can be used to emulate user-level or library-level function overloading. This mechanism is used (or can be used), for example, in the standard header <tgmath.h> to implement "overloaded" mathematical function macros.

But no implementations of "type-independent" I/O functions have appeared in the standard library. If you want, you can use this new mechanism and try to implement them yourself.

People often write that the argument passed to printf should be cast to the right type. Why? Wouldn't printf itself convert the data to the type of the specifier we give?

printf is a so-called variadic function. All arguments of this function except the first one are variadic arguments. They correspond to the ... in the parameter list of the function declaration

int printf( const char* format, ... );

Such arguments are passed through a special parameter-passing mechanism. Its peculiarity is that the function printf itself knows absolutely nothing about the types of the arguments actually passed, and for this reason it cannot convert them to the correct type on its own.

From a purely practical point of view, you can assume that all variadic arguments are written to a continuous binary stream. Internally, printf reads binary data from this stream using the va_list/va_start/va_arg mechanism. The function cannot know what exactly was written to this stream. Therefore it splits the binary stream into parts according to the format that you yourself passed to it from the outside. And if you "lie" to it in this format, it will suspect nothing and will parse the binary stream incorrectly. For this reason, all the data that you put into this binary stream must exactly match the format specifiers that you gave in the format string.

Implementations are not required to pass variadic arguments in exactly this way, but this model illustrates the properties of variadic argument passing quite accurately.

 18
Author: AnT, 2017-02-18 20:07:18
  1. For cout to work, an overload of the << operator is required, and it is different for each type, so no modifiers are required. In C, operators are not overloaded.

  2. It seems to me that saying printf "converts" the data is not quite accurate; I would say it "interprets" it: printf outputs the value of the variable as if it were of the type specified in the format string. The variable's value does not change. printf itself has no way of determining at run time whether the arguments match the format string (and the C compiler does not lecture the programmer on how to live or where to leave rakes lying around). Therefore all sorts of surprises are possible if you accidentally make an error in the format string and mix up the types or the number of arguments.

 6
Author: Ильдар Хайруллин, 2017-11-26 20:57:43