List size in Python and amount of RAM

When executing the following code:

n = 10 ** 9
alist = [0] * n

The computer starts to work very slowly (presumably because it runs out of RAM?). If I understand correctly, in this case the list is created and does fit into memory, but it takes up so much of it that everything starts to crawl. If the list size is 10 ** 10, on the other hand, a MemoryError is raised immediately. Does that mean Python simply cannot fit a list of that size into RAM (there physically isn't enough of it), or are there some restrictions on the size of a list?
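
For scale, the size of such a list can be estimated in advance by measuring a small one with sys.getsizeof and extrapolating (a rough sketch; getsizeof counts only the list's own pointer array, not the objects it references):

import sys

# Measure a small list and scale up linearly.
# sys.getsizeof reports only the list object itself (header + pointer array),
# not the referenced objects; here all slots share the single int 0.
small = [0] * 10 ** 6
bytes_per_slot = sys.getsizeof(small) / 10 ** 6   # ~8 bytes on 64-bit CPython

n = 10 ** 9
print("estimated size of [0] * n: %.1f GiB" % (bytes_per_slot * n / 2 ** 30))
# roughly 7.5 GiB on a 64-bit build -- enough to exhaust many machines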

I would like to understand why this happens in these two cases. I would also like to know the best way to protect myself against freezes caused by a lack of RAM (estimate in advance how much memory the list will occupy each time, write some code that checks for this case, or something else?).

Update

I needed lists of this size while finding all primes smaller than n with the sieve of Eratosthenes. I started with n = 100 and gradually increased the size of the list. Everything worked up to 10 ** 8, and then the unpleasant slowdowns started. Accordingly, I would like to know how to protect myself in such cases. Perhaps there is a recommended maximum list size, as a percentage of the amount of RAM, beyond which such freezes occur?
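
For reference, a straightforward list-based sieve looks roughly like the sketch below (not necessarily the exact code used here); the [True] * n list is what dominates the memory use:

def primes_below(n):
    # Sieve of Eratosthenes over a plain Python list of flags.
    if n < 3:
        return []
    is_prime = [True] * n                 # the n-element list is the memory hog
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            # cross out multiples of i starting from i*i
            is_prime[i * i::i] = [False] * len(range(i * i, n, i))
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_below(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]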

Author: jfs, 2015-11-17

3 answers

A [0] * n list in CPython requires approximately n * <pointer size> bytes. You can look at the implementations of list_repeat(), PyList_New(n), and PyMem_Malloc() to see the cases where MemoryError is raised immediately, without even attempting to allocate memory -- on a 32-bit platform these limits are quite low: n = 536870911 (((2**32 - 1) >> 1) // 4); on a 64-bit platform it is unlikely that these conditions will ever trigger. In other words: you can create lists as large as you have memory for -- in practice there are no restrictions.
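
A quick way to check the pointer size and the rough pre-allocation cap on a given platform (a sketch; the exact checks live in CPython's listobject.c):

import struct
import sys

ptr_size = struct.calcsize("P")       # 4 on 32-bit builds, 8 on 64-bit builds
print("pointer size:", ptr_size, "bytes")

# The early MemoryError check roughly corresponds to
# PY_SSIZE_T_MAX // sizeof(PyObject *):
print("max list length by that check:", sys.maxsize // ptr_size)
# 32-bit: 536870911; 64-bit: far larger than any realistic amount of RAM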

The protection is simple: find out how much memory you have on your computer and do not try to allocate more than this amount.
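
For example, a pre-flight check might look like this (a sketch that assumes the third-party psutil package and a 64-bit CPython, where each list slot costs 8 bytes):

import psutil  # third-party: pip install psutil

n = 10 ** 9
per_slot = 8                                   # pointer size on a 64-bit build
needed = n * per_slot
available = psutil.virtual_memory().available

if needed > 0.8 * available:                   # leave headroom for everything else
    raise MemoryError("need ~%.1f GiB, only %.1f GiB available"
                      % (needed / 2 ** 30, available / 2 ** 30))
alist = [0] * n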

For this specific task: you can reduce memory consumption by a factor of 4-8 by using array or numpy.array instead of a list. You can reduce it further by first excluding small primes (wheel optimization), see Fastest way to list all primes below N. You can also use gmpy2 (up to 16 times less memory), see Speed up bitstring/bit operations in Python?
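
A rough size comparison for n = 10**6 sieve flags (approximate numbers, assuming a 64-bit CPython):

import sys
from array import array
import numpy as np

n = 10 ** 6
flags_list = [True] * n                 # one 8-byte pointer per element
flags_arr = array('b', [1]) * n         # one byte per element
flags_np = np.ones(n, dtype=np.bool_)   # one byte per element

print(sys.getsizeof(flags_list))                 # ~8,000,056 bytes
print(flags_arr.itemsize * len(flags_arr))       # 1,000,000 bytes
print(flags_np.nbytes)                           # 1,000,000 bytes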

For n greater than 10**9 it may make sense to use the sieve of Atkin, which requires N^(1/2 + o(1)) bits of memory, instead of the sieve of Eratosthenes, which requires O(N^(1/2) * (log log N) / log N) bits.

There are algorithms without an upper bound if it is not known in advance how many primes are needed.
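
A minimal sketch of such an unbounded generator (an incremental sieve that keeps a dict of upcoming composites instead of a fixed-size list):

from itertools import islice

def gen_primes():
    # Incrementally yield primes with no precomputed upper bound.
    composites = {}              # maps an upcoming composite to one of its prime factors
    q = 2
    while True:
        if q not in composites:
            yield q                       # q is prime
            composites[q * q] = q         # schedule its first "interesting" multiple
        else:
            p = composites.pop(q)
            nxt = q + p                   # slide the multiple of p forward
            while nxt in composites:
                nxt += p
            composites[nxt] = p
        q += 1

print(list(islice(gen_primes(), 10)))     # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]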

If you need larger numbers, you can use algorithms that work on segments (they let you specify both the upper and the lower bound), for example pyprimesieve.
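
pyprimesieve itself is a C extension; to show the idea, here is a pure-Python sketch of a segmented sieve with both bounds (the function name and signature are illustrative, not the library's API):

def primes_in_range(lo, hi, segment_size=10 ** 6):
    # Primes in [lo, hi) using only O(sqrt(hi)) + O(segment_size) memory.
    limit = int(hi ** 0.5) + 1
    base = [True] * (limit + 1)           # ordinary sieve up to sqrt(hi)
    base[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if base[i]:
            base[i * i::i] = [False] * len(range(i * i, limit + 1, i))
    small_primes = [i for i, f in enumerate(base) if f]

    result = []
    lo = max(lo, 2)
    for start in range(lo, hi, segment_size):
        end = min(start + segment_size, hi)
        seg = [True] * (end - start)
        for p in small_primes:
            if p * p >= end:
                break
            first = max(p * p, (start + p - 1) // p * p)
            seg[first - start::p] = [False] * len(range(first, end, p))
        result.extend(start + i for i, f in enumerate(seg) if f)
    return result

print(primes_in_range(10 ** 6, 10 ** 6 + 100))
# [1000003, 1000033, 1000037, 1000039, 1000081, 1000099]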

 8
Author: jfs, 2017-05-23 12:39:08

For a list of characters, the size is specified more precisely in the question; in short, it also depends on the bitness of the machine. Moreover, the list requires more memory per element for its own "bookkeeping" than the stored data itself: from 40 to 79 bytes for each element, plus the length of the element. So it is better to use array or numpy arrays.
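
A small illustration of that per-element overhead for a list of strings (exact byte counts vary between CPython versions; these are typical for a 64-bit build):

import sys

# Per-element cost in a list of strings: an 8-byte pointer held by the list
# plus the string object itself (header + characters).
words = ["spam", "eggs", "a longer string"]
print(sys.getsizeof(words))          # the list: header + one pointer per item
for w in words:
    print(w, sys.getsizeof(w))       # ~49 bytes of header + 1 byte per ASCII char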

 0
Author: Vasyl Kolomiets, 2018-01-16 08:06:11

You can try using a different algorithm for finding prime numbers. For example, http://python-dao.rpisarev.org.ua/2015/08/finding-prime-numbers-in-given-interval.html

 -1
Author: user206453, 2016-03-28 19:00:12