Converting text to binary code and back. Example: "A" <-> "01100011"

I need to convert a string, for example "А", to a string of binary digits, for example "01100011", then XOR that binary code with another one (for example, "01100011" XOR "11000010" gives "10100001"), and finally translate the result back into a readable string, for example "Т".

It has to work with Russian, English, and Ukrainian letters, as well as digits and punctuation marks. Can you tell me how to do this in Python 3?

The question is: how do I convert a string to binary code and back in Python, using any encoding algorithm?

Author: MarianD, 2017-09-07

2 answers

To turn any text into a string of "0" and "1" characters (bits):

def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int.from_bytes(text.encode(encoding, errors), 'big'))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))

And back again:

def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode(encoding, errors) or '\0'

Example:

>>> text_to_bits("мiр")
'1101000010111100011010011101000110000000'
>>> text_from_bits(_)
'мiр'

See Convert binary to ASCII and vice versa.
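
If the data is already in the form of "01" strings, as in the question, the XOR step can be done on the bit strings directly. The following is only a minimal sketch: the helper name xor_bits is made up for illustration, and it assumes both strings have the same length and contain only the characters 0 and 1.

```python
def xor_bits(a: str, b: str) -> str:
    """XOR two equal-length strings made of '0' and '1' characters."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    # int(..., 2) parses the bits as a number; the format spec pads the
    # result with leading zeros back to the original width.
    return format(int(a, 2) ^ int(b, 2), "0{}b".format(len(a)))


# The example from the question
print(xor_bits("01100011", "11000010"))  # 10100001
```

Applying the same key a second time restores the original bits, which is what makes XOR usable in both directions.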


To perform XOR on two strings, there is no need to represent the bits as ASCII strings:

from itertools import cycle

def xor(message, key):
    # cycle() repeats the key so that it covers the whole message
    return bytes(a ^ b for a, b in zip(message, cycle(key)))

It is enough to encode the text into bytes using the .encode() method:

>>> key = b'key'
>>> xor('hello world'.encode(), key)
b'\x03\x00\x15\x07\nY\x1c\n\x0b\x07\x01'
>>> xor(_, key).decode()  # and back
'hello world'

See Translate from PHP to python: XOR with a key for the string.
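
The same approach should also work for Russian and Ukrainian text, since .encode() turns any string into bytes. One caveat: the XOR result is not guaranteed to be valid UTF-8, so it is safer to keep it as bytes or as a hex string instead of calling .decode() on it. A hedged sketch follows; the key, the sample text, and the choice of hex as the printable form are assumptions made for illustration.

```python
from itertools import cycle


def xor(message: bytes, key: bytes) -> bytes:
    # Repeating-key XOR over raw bytes
    return bytes(a ^ b for a, b in zip(message, cycle(key)))


key = "ключ".encode("utf-8")      # illustrative key
plain = "Привіт, world! 123"      # Cyrillic, Latin, digits, punctuation

cipher = xor(plain.encode("utf-8"), key)
cipher_hex = cipher.hex()         # printable form for storage or transport

# Decode only after the XOR has been undone
restored = xor(bytes.fromhex(cipher_hex), key).decode("utf-8")
assert restored == plain
```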

For large strings, you can use numpy:

import numpy as np   # pip install numpy 

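def slow_xor(aa, bb):
    # Both buffers are expected to have the same length in bytes
    a = np.frombuffer(aa, dtype=np.byte)
    b = np.frombuffer(bb, dtype=np.byte)
    # tobytes() replaces tostring(), which is a deprecated alias
    return np.bitwise_xor(a, b).tobytes()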

Example:

>>> slow_xor(b'hello...', 'миръ'.encode())
b'\xb8\xd9\xbc\xd4\xbe\xae\xff\xa4'

If you want to speed up this function, see Simple Python Challenge: Fastest Bitwise XOR on Data Buffers.

 8
Author: jfs, 2017-12-17 18:15:09

A solution that supports characters from the Unicode Basic Multilingual Plane (BMP), also known as Plane 0:

s = 'отака собі тестова - String !'

def encode(s):
    return list(map(lambda x: "{0:b}".format(ord(x)).zfill(16), s))

def decode(lst):
    return ''.join(map(lambda x: chr(int(x,2)), lst))

Test:

In [10]: lst = encode(s)

In [11]: lst
Out[11]:
['0000010000111110',
 '0000010001000010',
 '0000010000110000',
 '0000010000111010',
 '0000010000110000',
 '0000000000100000',
 '0000010001000001',
 '0000010000111110',
 '0000010000110001',
 '0000010001010110',
 '0000000000100000',
 '0000010001000010',
 '0000010000110101',
 '0000010001000001',
 '0000010001000010',
 '0000010000111110',
 '0000010000110010',
 '0000010000110000',
 '0000000000100000',
 '0000000000101101',
 '0000000000100000',
 '0000000001010011',
 '0000000001110100',
 '0000000001110010',
 '0000000001101001',
 '0000000001101110',
 '0000000001100111',
 '0000000000100000',
 '0000000000100001']

In [12]: decode(lst)
Out[12]: 'отака собі тестова - String !'
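
Characters outside the BMP (emoji, for example) have code points that need more than 16 bits, so the fixed 16-bit groups above no longer line up for them. One possible workaround, shown here only as a sketch, is to go through UTF-16 so that such a character becomes two 16-bit groups (a surrogate pair). The names encode_utf16 and decode_utf16 are made up for illustration.

```python
def encode_utf16(s: str) -> list:
    """Return one 16-bit '01' group per UTF-16 code unit of s."""
    data = s.encode("utf-16-be")
    return [format(int.from_bytes(data[i:i + 2], "big"), "016b")
            for i in range(0, len(data), 2)]


def decode_utf16(groups: list) -> str:
    """Inverse of encode_utf16."""
    data = b"".join(int(g, 2).to_bytes(2, "big") for g in groups)
    return data.decode("utf-16-be")


s = "мир \U0001F600"              # three Cyrillic letters, a space, one emoji
groups = encode_utf16(s)

assert all(len(g) == 16 for g in groups)
assert len(groups) == 6           # 4 BMP characters + 2 units for the emoji
assert decode_utf16(groups) == s
```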
 5
Author: MaxU, 2017-09-08 09:49:11