代码收藏家 (Code Collector) technical tutorial, 2022-08-02

[pandas tips] Dropping rows that have NaN in a given column

Dropping rows that have NaN in a given column

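I've been using pandas a lot lately and needed to drop every row that has a NaN in one specific column.

Data scraped by a crawler sometimes has gaps, so these empty records have to be removed. In WPS this is easy to do with a filter.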

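Table of contents

Dropping rows that have NaN in a given column

`dropna` in pandas

1. Drop every row that contains NaN (equivalent to the all-default call)

2. Drop every column that contains NaN

3. Drop columns that are entirely NaN

4. Keep only columns that have at least 4 non-NaN values

5. Drop rows that have NaN in columns 0 or 2 (string column labels must be quoted)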

My first idea was to use `loc` to select the rows where the column is NaN:

company_data = company_data.loc[lambda x: company_data['industry'] == np.nan]

It turned out this does not select anything at all:

import pandas as pd
import numpy as np

print(np.nan == np.nan)
# False

print(np.nan is np.nan)
# True

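This looks like a strange pair of results at first. The reason is that under IEEE 754 floating-point rules NaN is not equal to any value, including itself, so `== np.nan` is always False and the `loc` filter above can never match a row. `is`, on the other hand, only checks object identity, and `np.nan` is always the same object. To test for missing values, use `isna()` / `notna()` instead of `==`.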
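With `isna()` / `notna()` in place of the `==` comparison, the `loc` selection from the first attempt does work. A minimal sketch, assuming a small hypothetical `company_data` frame with an `industry` column (the names come from the attempt above; the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical scraped data: some 'industry' values are missing
company_data = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "industry": ["tech", np.nan, "retail", None],
})

# NaN never compares equal, so this mask is all False
eq_mask = company_data["industry"] == np.nan
print(eq_mask.any())

# isna() treats both np.nan and None as missing
missing = company_data.loc[company_data["industry"].isna()]
# notna() keeps the rows that do have an industry
kept = company_data.loc[company_data["industry"].notna()]
print(kept)
```

`missing` holds rows B and D, `kept` holds rows A and C.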

Filtering with `loc` turned out to be fairly clumsy for this, so next I introduce `dropna`.

`dropna` in pandas

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html?highlight=dropna#pandas.DataFrame.dropna

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters:
axis: {0 or ‘index’, 1 or ‘columns’}, default 0
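Determines whether rows or columns that contain missing values are removed.

0 or 'index': drop rows that contain missing values.

1 or 'columns': drop columns that contain missing values.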

how: {‘any’, ‘all’}, default ‘any’
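Determines whether a row or column is removed from the DataFrame when it has at least one NA, or only when all of its values are NA.

any: if any NA values are present, drop that row or column.

all: if all values are NA, drop that row or column.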

thresh: int, optional
Require that many non-NA values (keep only the rows, or the columns with axis=1, that have at least `thresh` non-NaN values).

subset: column label or sequence of labels, optional
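Labels along the other axis to consider; e.g. if you are dropping rows, this is the list of columns to check. (In other words, missing values are looked for only in these columns when deciding which rows to drop.)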

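inplace: default False, meaning the filtered result is returned as a new copy; True modifies the original data directly (and the call itself returns None).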
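A quick check of the `inplace` behaviour described above: with the default `inplace=False` a new DataFrame is returned and the original is left untouched, while `inplace=True` filters the object itself and returns `None`. The small frame here is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, np.nan]})

# Default: returns a filtered copy; df still has all 3 rows
copy = df.dropna()
print(len(copy), len(df))

# inplace=True: df itself is filtered and the call returns None
result = df.dropna(inplace=True)
print(result, len(df))
```

A common mistake is writing `df = df.dropna(inplace=True)`, which replaces `df` with `None`.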

Sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(4,6), index=['a', 'c', 'e', 'f'])
# Put some NaN values into rows 'a' and 'e'
df.iloc[0,[1,2,5]]=np.nan
df.iloc[2,[1,4]]=np.nan

The resulting data:

df
    0     1     2   3     4     5
a   0   NaN   NaN   3   4.0   NaN
c   6   7.0   8.0   9  10.0  11.0
e  12   NaN  14.0  15   NaN  17.0
f  18  19.0  20.0  21  22.0  23.0

1. Drop every row that contains NaN (equivalent to the all-default call)

a = df.dropna()
a
    0     1     2   3     4     5
c   6   7.0   8.0   9  10.0  11.0
f  18  19.0  20.0  21  22.0  23.0
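The default call keeps exactly the rows in which every value is present, so the same result can be written as a boolean mask with `notna().all(axis=1)`, the working form of the `loc` idea from the beginning. A sketch that rebuilds the sample data (as floats, an assumption made here to avoid dtype upcasting when NaN is assigned) and checks that the two agree:

```python
import numpy as np
import pandas as pd

# Same values as the sample data, built as float from the start
df = pd.DataFrame(np.arange(24).reshape(4, 6), index=["a", "c", "e", "f"], dtype=float)
df.iloc[0, [1, 2, 5]] = np.nan
df.iloc[2, [1, 4]] = np.nan

by_dropna = df.dropna()
# Keep only the rows where all values are non-missing
by_mask = df.loc[df.notna().all(axis=1)]

print(by_dropna.equals(by_mask))
print(list(by_dropna.index))
```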

2. Drop every column that contains NaN

a = df.dropna(axis=1)
a
    0   3
a   0   3
c   6   9
e  12  15
f  18  21

3. Drop columns that are entirely NaN

First, set a whole column (column 1) to NaN:

df.iloc[:,1]=np.nan
df
    0   1     2   3     4     5
a   0 NaN   NaN   3   4.0   NaN
c   6 NaN   8.0   9  10.0  11.0
e  12 NaN  14.0  15   NaN  17.0
f  18 NaN  20.0  21  22.0  23.0

Then drop the columns that are all NaN:

df=df.dropna(axis=1,how='all')
df
    0     2   3     4     5
a   0   NaN   3   4.0   NaN
c   6   8.0   9  10.0  11.0
e  12  14.0  15   NaN  17.0
f  18  20.0  21  22.0  23.0

4. Keep only columns that have at least 4 non-NaN values

a = df.dropna(axis=1, thresh=4)
a
    0   3
a   0   3
c   6   9
e  12  15
f  18  21
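`thresh` also works along the default row axis, which matches the "keep rows with at least `thresh` non-NaN values" description in the parameter list. A sketch on a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, 9.0],
})

# Non-NaN values per row: 3, 2, 1
counts = df.notna().sum(axis=1).tolist()
print(counts)

# axis=0 (default): keep rows with at least 2 non-NaN values
kept = df.dropna(thresh=2)
print(kept.index.tolist())
```

Row 2 has only one non-NaN value, so it is the only row dropped.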

5. Drop rows that have NaN in columns 0 or 2 (string column labels must be quoted)

a = df.dropna(subset=[0, 2])
a
    0     2   3     4     5
c   6   8.0   9  10.0  11.0
e  12  14.0  15   NaN  17.0
f  18  20.0  21  22.0  23.0
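This is the task from the start of the article: dropping rows whose value is missing in one specific column. With string column labels the names go in quotes inside the `subset` list. A minimal sketch reusing the hypothetical `company_data` / `industry` names from the first attempt, with made-up data:

```python
import numpy as np
import pandas as pd

company_data = pd.DataFrame({
    "name": ["A", "B", "C"],
    "industry": ["tech", np.nan, "retail"],
    "city": [np.nan, "Beijing", "Shanghai"],
})

# Only 'industry' is checked, so the NaN in 'city' (row A) does not cause a drop
company_data = company_data.dropna(subset=["industry"])
print(company_data["name"].tolist())
```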

Source: myt2000

Compiled and shared by 物联沃 (IOTWORD)