How do I reinterpret bytes without undefined behavior?

Details

In assembly, C, C++, C# with unsafe, and other languages, it is possible to reinterpret the bits stored at an address as a type different from the original. For example, cast an int* to a float* in C: if it points to the integer value 0x3F800000, then reading through the float* gives the floating-point value 1.0f.

Although this enables algorithms that need fine-grained bit control, and even though the result of such a reread seems obvious, it is still considered U.B. (undefined behavior), i.e. there is no guarantee about what the compiler/interpreter will do with it.

If I'm not mistaken, converting pointers is almost always considered U.B., and I want to know why. Why does this give U.B.? For example, doesn't reading a float from an integer give the expected result, given the well-known format that float has? What could come out differently?

Here Visual Studio optimizes so well that the disassembly even shows precomputed constants after conversions of known values. As far as I know, at most the compiler can

  • Use any of several possible bit patterns that represent the same value (such as a float NaN, which has several binary encodings, so the compiler may choose any of them), and

  • Not preserve the order of reads and writes when optimizing code in more complicated situations (such as loops indexing into arrays rather than working with simple variables).

Beyond that, I do not know what else can happen, and so I do not understand.

Even so, it seems to me the compiler could avoid the second U.B. in an obvious way: by keeping the order of read and write instructions whenever it cannot guarantee that they access bytes at distinct addresses. In other words, if the programmer expects that order, it should only change when there is absolute certainty that the result will be the same. Still, I think this problem has already happened to me when programming in VC++. So what is going on?

Questions

So the first question is: why is reinterpreting values in memory so generally U.B.? The second question is: when we need to do this, how do we guarantee that it is not U.B. and that the result is certainly the same? And of course, preferably without disabling optimizations.

To be clearer, suppose I want two C++ functions that convert pointers (one for "read and write" and one for "read only") in a generic way with templates, like this...

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    return (DstDataType*)srcPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    return (const DstDataType*)srcPtr ;
}

Why is this U.B., and how do I write these functions with non-U.B. procedures that do exactly what is expected of them?

Edit: is this code U.B.? If it is not U.B. by itself, can the inline call still be optimized? Is that the solution? Does swapping typecasts for memcpy and memmove always avoid U.B.?

#include <string.h>

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    DstDataType* dstPtr ;
    memmove( &dstPtr , &srcPtr , sizeof(void*) ) ;
    return dstPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    const DstDataType* dstPtr ;
    memmove( &dstPtr , &srcPtr , sizeof(const void*) ) ;
    return dstPtr ;
}
Author: RHER WOLF, 2020-12-03

1 answer

The code you gave as an example in your question:

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    return (DstDataType*)srcPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    return (const DstDataType*)srcPtr ;
}

can be replaced by interfaces that convert a non-const variable to a read-only one at the access points, and also by the const_cast expression:

int i = 3;                              // initial non-const variable
const int* i_cptr = &i;                 // pointer to const
int* i_ptr = const_cast<int*>(i_cptr);  // cast from pointer-to-const to pointer-to-non-const
*i_ptr = 4;                             // allowed: the underlying variable is non-const
std::cout << *i_ptr; // 4

Removing or adding const/volatile qualification with a cast is trivial, but be aware that modifying the value of an object that was declared const, even through const_cast, is still UB. Nothing can change that. const_cast is useful for accessing interfaces, but it transfers the responsibility of preserving the constness of constant values to you, the programmer.
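As a rough illustration of the "accessing interfaces" case, here is a minimal sketch. The names legacy_length and length_of are hypothetical and stand in for an older API that takes a non-const pointer but never writes through it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical legacy function: its signature takes char*, but it only reads.
std::size_t legacy_length(char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Const-correct wrapper. The const_cast is acceptable here only because
// legacy_length is known not to modify the characters it is given.
std::size_t length_of(const char* s) {
    return legacy_length(const_cast<char*>(s));
}

void const_cast_example() {
    assert(length_of("abc") == 3);

    const int locked = 3;
    // *const_cast<int*>(&locked) = 4;  // would be UB: locked was declared const
    (void)locked;
}
```

The commented-out line is the situation the warning above is about: the cast itself compiles, but writing through it is undefined because the object was declared const.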

Why is reinterpreting values in memory so generally U.B.?

The C++ ecosystem is not quite like that. Besides several different kinds of pointers, there are other features that allow direct memory access: from std::memcpy, reinterpret_cast, and std::bit_cast, through move semantics, up to atomic variables, where you can select the memory access order for multithreading and optimize routines down to the level of single processor instructions.
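For the int/float case from the question, a minimal sketch of the std::memcpy route could look like the following. The helper name copy_bits is hypothetical, and the example assumes float is a 32-bit IEEE 754 type (checked with a static_assert). In C++20, std::bit_cast from <bit> covers the same use case.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical helper: copies the object representation of src into a
// new object of type To, instead of reading src through a To pointer.
template <typename To, typename From>
To copy_bits(const From& src) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To dst;  // To must also be default-constructible
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

// Assumption: float is a 32-bit IEEE 754 type on this platform.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "example assumes 32-bit IEEE 754 float");

void copy_bits_example() {
    assert(copy_bits<std::uint32_t>(1.0f) == 0x3F800000u);
    assert(copy_bits<float>(std::uint32_t{0x3F800000u}) == 1.0f);
}
```

Note that this converts a value, not a pointer: the result is a separate object, so no float is ever accessed through an int pointer or vice versa. Optimizing compilers commonly turn such a fixed-size memcpy into a plain register move.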

When we need to do this, how do we guarantee that it is not U.B. and that the result is certainly the same?

The only way to ensure the correct behavior of any program is through in-depth study of the development environment. In your case, you need a better understanding of the rules of C++ and of the tools available. Programming in C++ is quite different from programming in C, and these differences always exist for a reason. For example, the lack of type punning through unions in C++ comes from the way C++ handles memory alignment and aliasing. To remain consistent with these rules, a union may hold only one active type at a time; it cannot hold two types simultaneously.

Other features or situations are deliberately specified as UB in the language standard, both to make compiler (or STL) implementations more efficient and to make other, more complex features available.
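A small sketch of how the aliasing rules enable this kind of efficiency. The function name store_both is hypothetical. Because an int object and a float object are not allowed to alias, the compiler may assume the write to *f leaves *i untouched and return 1 without reloading it.

```cpp
#include <cassert>

// The compiler is allowed to assume that i and f point to different
// objects, since accessing an int through a float lvalue is UB.
int store_both(int* i, float* f) {
    *i = 1;
    *f = 2.0f;   // assumed not to modify *i
    return *i;   // may be compiled as "return 1"
}

void store_both_example() {
    int i = 0;
    float f = 0.0f;
    assert(store_both(&i, &f) == 1);  // well-defined: two distinct objects

    // store_both(&i, reinterpret_cast<float*>(&i));
    // would be UB: the result could differ between optimization levels.
}
```

This is the read/write reordering described in the question: the language does not ask the compiler to prove that the two pointers are distinct, it makes the overlapping case undefined instead.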

Why is this U.B., and how do I write these functions with non-U.B. procedures that do exactly what is expected of them?

template <typename T>
T* to_non_const(const T* src) {
    return const_cast<T*>(src);
}

template <typename T>
const T* to_const(T* src) {
    return src;
}

Note that this implementation can be dangerous for unsuspecting users: because it hides the use of const_cast, they may assume that the responsibilities attached to it no longer apply. Both functions are redundant, since they do nothing but mask the use of const_cast.
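When the destination type really does differ from the source type, one pointer conversion is generally accepted as safe: viewing an object as a sequence of unsigned char. The sketch below is an assumption about what a "read only" and a "read and write" variant could look like. The names readable_bytes and writable_bytes are hypothetical and do not come from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Read-only byte view: any object may be inspected through unsigned char.
template <typename T>
const unsigned char* readable_bytes(const T* p) {
    return reinterpret_cast<const unsigned char*>(p);
}

// Writable byte view, restricted to trivially copyable, non-const types.
template <typename T>
unsigned char* writable_bytes(T* p) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    static_assert(!std::is_const<T>::value, "T must not be const");
    return reinterpret_cast<unsigned char*>(p);
}

void byte_view_example() {
    const float src = 1.0f;
    float dst = 0.0f;
    const unsigned char* in = readable_bytes(&src);
    unsigned char* out = writable_bytes(&dst);
    // Copy the whole object representation byte by byte.
    for (std::size_t k = 0; k < sizeof(float); ++k) out[k] = in[k];
    assert(dst == 1.0f);
}
```

Unlike RemeanPtrAs, this only ever yields byte pointers. Reading the bytes back as another type still goes through a copy into an object of that type.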

Author: aviana, 2020-12-04 03:12:40