How do I delete duplicate rows in an array?

Question

Given a NumPy array:

[[1 0 1 0 0 1]
 [1 1 1 0 0 1]
 [1 0 1 0 0 1]
 [1 0 1 0 1 1]
 [1 0 1 0 0 1]]

import numpy as np

a = [[1, 0, 1, 0, 0, 1],
     [1, 1, 1, 0, 0, 1],
     [1, 0, 1, 0, 0, 1],
     [1, 0, 1, 0, 1, 1],
     [1, 0, 1, 0, 0, 1]]

b = np.array(a)

How can I delete duplicate rows? That is, the output should be:

[[1 0 1 0 0 1]
 [1 1 1 0 0 1]
 [1 0 1 0 1 1]]

arrays python pandas numpy

Author: 0xdb, 2020-03-27

1 answer

score 4 · Accepted Answer

Use np.unique(arr, axis=0):

In [23]: np.unique(b, axis=0)
Out[23]:
array([[1, 0, 1, 0, 0, 1],
       [1, 0, 1, 0, 1, 1],
       [1, 1, 1, 0, 0, 1]])
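
For anyone running this outside IPython, here is the same call as a minimal standalone sketch. The names unique_rows and counts are only illustrative, and return_counts is an optional extra that was not part of the original answer; it reports how many times each unique row occurred. The axis argument needs NumPy 1.13 or newer.

```python
import numpy as np

b = np.array([[1, 0, 1, 0, 0, 1],
              [1, 1, 1, 0, 0, 1],
              [1, 0, 1, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 0, 1, 0, 0, 1]])

# axis=0 makes np.unique compare whole rows instead of single elements.
# The unique rows come back sorted, not in their original order.
unique_rows, counts = np.unique(b, axis=0, return_counts=True)

print(unique_rows)
print(counts)  # number of occurrences of each unique row
```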

Note that np.unique returns the rows sorted. If you need to keep the rows in the order in which they first appear in b:

u, idx = np.unique(b, axis=0, return_index=True)
res = u[idx.argsort()]

Result:

In [33]: res
Out[33]:
array([[1, 0, 1, 0, 0, 1],
       [1, 1, 1, 0, 0, 1],
       [1, 0, 1, 0, 1, 1]])
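
The order-preserving variant can also be wrapped in a small helper. This is only a sketch: the function name unique_rows_keep_order is made up for illustration, and it indexes the original array with the sorted first-occurrence indices, which should give the same result as u[idx.argsort()] above.

```python
import numpy as np

def unique_rows_keep_order(arr):
    """Return the unique rows of a 2-D array in order of first appearance."""
    arr = np.asarray(arr)
    # idx[i] is the position in arr where the i-th unique row first occurs
    _, idx = np.unique(arr, axis=0, return_index=True)
    # Sorting those positions and indexing the original restores input order
    return arr[np.sort(idx)]

b = np.array([[1, 0, 1, 0, 0, 1],
              [1, 1, 1, 0, 0, 1],
              [1, 0, 1, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 0, 1, 0, 0, 1]])

res = unique_rows_keep_order(b)
print(res)
```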