How to securely delete files and folders recursively with Python?

Guys, I'm trying to create a script to securely delete my files and folders, just like shred, srm, etc. do, but I would like to do this with a Python script. I managed to find a function on the internet that writes random values over the file before deleting it. I know nothing is truly unrecoverable, but as a learning exercise I would like to implement such a script.

I have this function:

import os

def secure_delete(file_, steps=3):
    # Find out the file size (append mode positions the cursor at the end).
    with open(file_, "ba+", buffering=0) as f:
        size = f.tell()
    # Overwrite the contents with random bytes several times, then with zeros.
    with open(file_, "br+", buffering=0) as f:
        for _ in range(steps):
            f.seek(0)
            f.write(os.urandom(size))
        f.seek(0)
        f.write(b"\x00" * size)
    os.remove(file_)

Passing a file as an argument I can do this for a single file, but I would like to apply it recursively to every file in a directory, instead of opening a specific file or passing them one at a time.

Anyone have any ideas?

Author: jsbueno, 2019-11-13

4 answers

Well, building on that example, you can do it like this:


import os
import shutil
import uuid

def recursive_listing(path):
    objects = []

    # r = root, d = directories, f = files
    for r, d, f in os.walk(path):
        for file in f:
            objects.append(os.path.join(r, file))
        for dir_ in d:
            objects.append(os.path.join(r, dir_))

    return objects


def secure_delete_recursive(path, steps=5):

    objects = recursive_listing(path)

    for obj in objects:
        # For files (overwriting, renaming and deleting)
        if os.path.isfile(obj):
            try:

                with open(obj, "ba+", buffering=0) as f:
                    size = f.tell()

                with open(obj, "br+", buffering=0) as f:
                    for _ in range(steps):
                        f.seek(0)
                        f.write(os.urandom(size))
                    f.seek(0)
                    f.write(b"\x00" * size)

                # Rename to a random name so the original name is not
                # left behind in the directory entry either.
                name = str(uuid.uuid4())
                new_file_rename = os.path.join(os.path.split(obj)[0], name)
                os.rename(obj, new_file_rename)
                # Uncomment the line below to delete the files recursively.
                # os.remove(new_file_rename)

            except PermissionError as p:
                print(p)

    # Process directories deepest-first: renaming a parent directory first
    # would invalidate the stored paths of everything inside it.
    for obj in sorted(objects, key=lambda o: o.count(os.sep), reverse=True):
        # For directories (renaming and deleting)
        if os.path.isdir(obj):
            try:

                name = str(uuid.uuid4())
                new_file_rename = os.path.join(os.path.split(obj)[0], name)
                os.rename(obj, new_file_rename)
                # Uncomment the line below to delete the folders recursively.
                # shutil.rmtree(new_file_rename, ignore_errors=False, onerror=None)

            except PermissionError as p:
                print(p)


if __name__ == '__main__':
    secure_delete_recursive('/tmp')

You can also use the cryptography library with the same idea as the example above; encrypting the contents before deletion is arguably safer than just overwriting with random bytes, or you can combine the two approaches. I hope this helps.
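
A minimal sketch of that idea, assuming the third-party cryptography package is installed; the helper name encrypt_then_delete is illustrative, not part of the original answer, and the in-place-overwrite caveats discussed in the answer below still apply:

import os
from cryptography.fernet import Fernet

def encrypt_then_delete(path):
    # Throwaway key: it is never stored anywhere, so the ciphertext
    # left on disk cannot be decrypted later.
    key = Fernet.generate_key()
    f = Fernet(key)
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(f.encrypt(data))
    os.remove(path)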

note: If you don't have permission on some files or folders, this won't work for those objects.

 2
Author: WilliamCanin, 2019-11-14 08:50:41

Writing data on top of a file's data is not "safe", and the reason is that it depends on the filesystem (F.S.) layer of the operating system deciding what to do when you open a file for writing; if you look closely, none of them, for various reasons, will write the data to the same physical position on the disk.

The idea that by opening a file for reading and writing you can modify a single byte, close it, and read it again with all the original data plus that one different byte is convenient for high-level programs, but it is just an abstraction provided by the operating system.
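
That abstraction is just this, in code; whether it touches the same physical sectors is entirely up to the OS and filesystem (the file name below is a placeholder):

with open("example.bin", "r+b") as f:  # "example.bin" is a placeholder
    f.seek(42)                         # position of the byte to change
    f.write(b"\xff")                   # all other bytes appear unchanged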

In practice, because disk access evolved historically, the system can only write blocks of at least 512 bytes, and more likely 4 KB (4096 bytes), at once. What happens when you modify a single byte is:

- the lowest-level OS layers read 4 KB from disk into memory (even if the file is smaller; whatever garbage lies on the disk after the file gets read in too);
- the higher-level OS layer changes the desired byte within that block;
- the 4 KB block is written back to the disk, but not to the same position;
- if everything went well so far, the filesystem layer of the operating system updates the file's metadata to read the 4 KB from the new position, not the old one (mainly on modern filesystems there is a journaling mechanism, which precisely allows the old version to be recovered if any failure happens before the whole process finishes).
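
A small illustration of that block granularity (on Linux; the path below is a throwaway example): even a 1-byte file has at least one whole block allocated to it.

import os

with open("/tmp/one_byte.txt", "wb") as f:
    f.write(b"x")

st = os.stat("/tmp/one_byte.txt")
print(st.st_size)          # 1: the logical content is a single byte
print(st.st_blocks * 512)  # allocated space; st_blocks counts 512-byte units
os.remove("/tmp/one_byte.txt")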

Any tool that has access to the raw data on the disk partition can then read the data of "overwritten" files. It can do a good job of reconstructing those files from the pieces found; many parts may indeed be physically overwritten by the very process of rewriting the files, but that is more a matter of chance than deliberate destruction.

If you are on a Linux operating system, the raw partition data is accessible simply by opening the special devices in /dev/ as if they were normal files. (In this case, reading and writing to these device files does cause the bytes to be read and written at the same physical positions on the disk, precisely because the filesystem code the kernel uses to expose physical devices as files works that way.)
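
For instance, a hedged sketch of that raw access (Linux, requires root; /dev/sda is just an example device, and this only reads):

# Read the first 512 bytes of the raw device: the MBR/boot sector.
with open("/dev/sda", "rb") as dev:
    boot_sector = dev.read(512)

print(boot_sector[510:512].hex())  # "55aa" on MBR-partitioned disks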

So if you have a special library that understands the data structures of the particular filesystem you intend to change, it is possible, with access to the "raw" device, to overwrite the exact bytes of the files, irreversibly, as you want.

The job of doing this correctly is at least an order of magnitude more complex than just using the user-level layer of opening and writing files, as you want to do. (Some filesystems are so complex that it took dozens of the best programmers more than 10 years to get direct access to their data right in a parallel implementation; see the history of NTFS drivers on Linux, for example, or ZFS.) But if it is FAT or FAT32, still used on some USB drives, it is fairly feasible to reimplement access, and it can be a pretty fun project. (On USB drives, though, I don't know whether the low-level layer, the device firmware, remaps blocks; that is, for any access in software the data really is "overwritten", but physically it may still be there. On SSD disks this certainly happens.)
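
To give a taste of that project idea, a minimal sketch parsing two boot-sector fields from a raw FAT image; the offsets follow the published FAT spec, and "fat.img" is a placeholder path:

import struct

with open("fat.img", "rb") as f:   # "fat.img" is a placeholder image/device
    boot = f.read(512)

bytes_per_sector, = struct.unpack_from("<H", boot, 11)  # BPB offset 11, little-endian
sectors_per_cluster = boot[13]                          # BPB offset 13, one byte
print(bytes_per_sector, sectors_per_cluster)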

As for overwriting the files recursively, I believe the other answers here already cover it, so I'm not going to add yet another example of how to do that.

 3
Author: jsbueno, 2019-11-15 16:24:35

You can list the files in the current directory using os.listdir('.'), or indicate the directory you want to read.

Another way is glob.glob('*.dat'), which returns a list of files, in this example those with the ".dat" extension.

The function could take the list of files as input and process them all in a for loop, or you could write a for loop yourself and call the function once per file.

If you want to include files in subdirectories, consider using os.walk.
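
A minimal sketch of those suggestions, reusing the secure_delete function from the question ('/tmp/target' is an example path):

import glob
import os

for path in glob.glob('*.dat'):      # flat listing, by extension
    secure_delete(path)

for root, dirs, files in os.walk('/tmp/target'):  # recursive listing
    for name in files:
        secure_delete(os.path.join(root, name))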

 2
Author: Hugo Salvador, 2019-11-13 12:16:59

I have a script that deletes all files from the Windows cache folders; I think it can help you.

import os
import win32con, win32api  # from the pywin32 package

# Raw strings avoid backslash escapes ('\T' in 'C:\Windows\Temp' is a tab!).
folders = [r'C:\Windows\Prefetch', r'C:\Windows\Temp']

def clear_data(locate):
    for raiz, diretorios, arquivos in os.walk(locate):
        for arquivo in arquivos:
            try:
                print(arquivo)
                # Clear read-only/hidden attributes so the file can be removed.
                win32api.SetFileAttributes(os.path.join(raiz, arquivo), win32con.FILE_ATTRIBUTE_NORMAL)
                os.remove(os.path.join(raiz, arquivo))
            except (OSError, win32api.error):
                print(arquivo + ' Error')


# Also clear the user's temp folder (%TEMP%).
temp = os.getenv('temp')
temp = temp.replace('Roaming', r'\Local\Temp')

folders.append(temp)


for folder in folders:
    clear_data(folder)
 1
Author: Willian Jesus Da Silva, 2019-11-13 13:14:26