Python Cool Libraries Tour - Third-Party Library Pandas (022)

CSDN 2024-07-21 14:35:01 Views: 77

I. Usage in Detail

55. pandas.lreshape function

55-1. Syntax

# 55. pandas.lreshape function

pandas.lreshape(data, groups, dropna=True)

Reshape wide-format data to long. Generalized inverse of DataFrame.pivot.

Accepts a dictionary, groups, in which each key is a new column name and each value is a list of old column names that will be "melted" under the new column name as part of the reshape.

Parameters:

data : DataFrame
    The wide-format DataFrame.
groups : dict
    {new_name : list_of_columns}.
dropna : bool, default True
    Do not include columns whose entries are all NaN.

Returns:

DataFrame
    Reshaped DataFrame.

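55-2. Parameters

55-2-1. data (required): the pandas DataFrame to reshape.

55-2-2. groups (required): a dictionary specifying the column groups to reshape. Each key is the name of a new column and each value is the list of existing columns to stack under it, for example {'A': ['A1', 'A2'], 'B': ['B1', 'B2']}. All lists must have the same length.

55-2-3. dropna (optional, default True): whether to drop rows containing NaN during the reshape. If True, rows in which any of the new columns is NaN are dropped; if False, they are kept.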
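
The effect of dropna can be seen in a small sketch. The column names (team, score1, score2) are made up for illustration and are not part of the original example. With the default dropna=True the row whose stacked value is NaN is omitted, while dropna=False keeps it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'team' is an id column, score1/score2 get stacked.
wide = pd.DataFrame({
    "team": ["x", "y"],
    "score1": [1.0, 2.0],
    "score2": [3.0, np.nan],
})
groups = {"score": ["score1", "score2"]}

dropped = pd.lreshape(wide, groups)              # default dropna=True
kept = pd.lreshape(wide, groups, dropna=False)   # keep the NaN row

print(dropped)
print(kept)
```

Columns that are not listed in groups (here team) are repeated once for each stacked column.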

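55-3. Function

Reshapes the DataFrame according to the specified column groups, converting wide-format data to long format.

55-4. Return value

Returns a reshaped pandas DataFrame containing the data converted from wide format to long format.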

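55-5. Notes

pandas.lreshape reshapes a DataFrame from wide to long format according to the specified column groups. It takes three parameters:

55-5-1. data: the DataFrame to reshape.

55-5-2. groups: a dictionary defining the new column groups.

55-5-3. dropna: whether to drop rows containing NaN.

Understanding and using these parameters correctly lets you reshape a DataFrame flexibly, which makes the data easier to organize and analyze.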

55-6. Usage

55-6-1. Data preparation

None

55-6-2. Code example

# 55. pandas.lreshape function
import pandas as pd

# Create a sample DataFrame
data = pd.DataFrame({
    'A1': [1, 2, 3],
    'A2': [4, 5, 6],
    'B1': ['a', 'b', 'c'],
    'B2': ['d', 'e', 'f']
})

# Define the column groups
groups = {
    'A': ['A1', 'A2'],
    'B': ['B1', 'B2']
}

# Reshape with pandas.lreshape
reshaped = pd.lreshape(data, groups)
print(reshaped)

55-6-3. Output

# 55. pandas.lreshape function
#    A  B
# 0  1  a
# 1  2  b
# 2  3  c
# 3  4  d
# 4  5  e
# 5  6  f

56. pandas.wide_to_long function

56-1. Syntax

# 56. pandas.wide_to_long function

pandas.wide_to_long(df, stubnames, i, j, sep='', suffix='\\d+')

Unpivot a DataFrame from wide to long format.

Less flexible but more user-friendly than melt.

With stubnames ['A', 'B'], this function expects to find one or more groups of columns with format A-suffix1, A-suffix2, ..., B-suffix1, B-suffix2, ... You specify what you want to call this suffix in the resulting long format with j (for example j='year').

Each row of these wide variables is assumed to be uniquely identified by i (can be a single column name or a list of column names).

All remaining variables in the data frame are left intact.

Parameters:

df : DataFrame
    The wide-format DataFrame.
stubnames : str or list-like
    The stub name(s). The wide format variables are assumed to start with the stub names.
i : str or list-like
    Column(s) to use as id variable(s).
j : str
    The name of the sub-observation variable. What you wish to name your suffix in the long format.
sep : str, default ""
    A character indicating the separation of the variable names in the wide format, to be stripped from the names in the long format. For example, if your column names are A-suffix1, A-suffix2, you can strip the hyphen by specifying sep='-'.
suffix : str, default '\d+'
    A regular expression capturing the wanted suffixes. '\d+' captures numeric suffixes. Suffixes with no numbers could be specified with the negated character class '\D+'. You can also further disambiguate suffixes, for example, if your wide variables are of the form A-one, B-two, ..., and you have an unrelated column A-rating, you can ignore the last one by specifying suffix='(!?one|two)'. When all suffixes are numeric, they are cast to int64/float64.

Returns:

DataFrame
    A DataFrame that contains each stub name as a variable, with new index (i, j).

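56-2. Parameters

56-2-1. df (required): the pandas DataFrame to reshape.

56-2-2. stubnames (required): the column-name prefix, or list of prefixes, identifying the columns to convert to long format. For example, if the columns are A1970, A1980, B1970, B1980, stubnames should be ['A', 'B'].

56-2-3. i (required): the column name or list of column names that uniquely identifies each row. These columns are retained in the reshaped result as part of the index.

56-2-4. j (required): the name of the new index level that holds the suffix extracted from the wide column names, such as a year or a number. For example, 'year' can be used as j.

56-2-5. sep (optional, default ''): the separator between the stub name and the suffix in the column names. For example, if the columns are named A-1970, sep should be '-'.

56-2-6. suffix (optional, default '\\d+'): a regular expression matching the suffix that follows the stub name, that is, the time or number part of the column name. By default it matches one or more digits.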
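
The sep and suffix parameters matter when the wide columns use a separator and non-numeric suffixes. The following is a minimal sketch; the column names ht_one and ht_two and the level name age are illustrative assumptions, not taken from the example later in this section.

```python
import pandas as pd

# Hypothetical wide table: one height column per (non-numeric) age label.
wide = pd.DataFrame({
    "id": [1, 2],
    "ht_one": [2.8, 2.9],
    "ht_two": [3.4, 3.8],
})

# sep="_" strips the underscore; suffix=r"\w+" accepts word-like suffixes
# instead of the default digits-only pattern.
long_df = pd.wide_to_long(
    wide, stubnames="ht", i="id", j="age", sep="_", suffix=r"\w+"
)
print(long_df)
```

With the default suffix pattern, neither ht_one nor ht_two would match, because their suffixes contain no digits.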
 
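56-3. Function

Converts data from wide format to long format. It is particularly useful for time-series or panel data.

56-4. Return value

Returns a reshaped pandas DataFrame containing the data converted from wide format to long format, indexed by (i, j).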
 
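56-5. Notes

pandas.wide_to_long converts wide-format data to long format from a set of column-name prefixes, the identifier column(s), and a name for the time or number level. It takes the following parameters:

56-5-1. df: the DataFrame to reshape.

56-5-2. stubnames: the column-name prefix or list of prefixes.

56-5-3. i: the column name or list of column names that uniquely identifies each row.

56-5-4. j: the name of the new level that holds the time or number information.

56-5-5. sep: the separator between the prefix and the time/number part of the column names.

56-5-6. suffix: the regular expression matching the time or number part.

Understanding and using these parameters correctly lets you reshape a DataFrame flexibly for further analysis and processing.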
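
Because the result is indexed by (i, j), a common follow-up is to turn the index levels back into ordinary columns. This is a minimal sketch using the same column layout as the example in 56-6; the variable names wide, long_df and flat are illustrative.

```python
import pandas as pd

wide = pd.DataFrame({
    "id": [1, 2, 3],
    "A1970": [2.5, 1.5, 3.0],
    "A1980": [2.0, 1.0, 3.5],
    "B1970": [3.0, 2.0, 4.0],
    "B1980": [3.5, 2.5, 4.5],
})

# The result has a MultiIndex made of the i and j levels (id, year).
long_df = pd.wide_to_long(wide, stubnames=["A", "B"], i="id", j="year")

# reset_index() moves both index levels back into regular columns.
flat = long_df.reset_index()
print(flat.columns.tolist())
print(flat["year"].dtype)
```

Since every suffix here is numeric, the year values come back as integers rather than strings, in line with the cast described in the syntax section.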
 
56-6. Usage

56-6-1. Data preparation

None

56-6-2. Code example

# 56. pandas.wide_to_long function
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'id': [1, 2, 3],
    'A1970': [2.5, 1.5, 3.0],
    'A1980': [2.0, 1.0, 3.5],
    'B1970': [3.0, 2.0, 4.0],
    'B1980': [3.5, 2.5, 4.5]
})

# Reshape with pandas.wide_to_long
df_long = pd.wide_to_long(df, stubnames=['A', 'B'], i='id', j='year', sep='', suffix='\\d+')
print(df_long)

56-6-3. Output

# 56. pandas.wide_to_long function
#            A    B
# id year
# 1  1970  2.5  3.0
# 2  1970  1.5  2.0
# 3  1970  3.0  4.0
# 1  1980  2.0  3.5
# 2  1980  1.0  2.5
# 3  1980  3.5  4.5

57. pandas.isna function

57-1. Syntax

# 57. pandas.isna function

pandas.isna(obj)

Detect missing values for an array-like object.

This function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).

Parameters:

obj : scalar or array-like
    Object to check for null or missing values.

Returns:

bool or array-like of bool
    For scalar input, returns a scalar boolean. For array input, returns an array of boolean indicating whether each corresponding element is missing.

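57-2. Parameters

57-2-1. obj (required): the object to check for missing values. It can be a scalar, an array, a Series, a DataFrame, or another similar pandas object.

57-3. Function

Detects missing values. In pandas, NA values represent missing or invalid data.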
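
It can help to see which scalar values pandas.isna treats as missing and which it does not. The sketch below is illustrative only; the dictionary name checks and its labels are made up for the example.

```python
import numpy as np
import pandas as pd

# Values commonly mistaken for one another when checking for missing data.
checks = {
    "np.nan": pd.isna(np.nan),      # missing
    "None": pd.isna(None),          # missing
    "pd.NaT": pd.isna(pd.NaT),      # missing (datetime-like)
    "pd.NA": pd.isna(pd.NA),        # missing (nullable dtypes)
    "empty string": pd.isna(""),    # not missing
    "np.inf": pd.isna(np.inf),      # not missing
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

Empty strings and infinities are ordinary values as far as isna is concerned, so they need separate handling if they should count as missing in a given dataset.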
 
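57-4. Return value

Returns a boolean object with the same shape as the input obj (a single bool for scalar input). True marks a position whose value is missing; False marks a value that is present.

57-5. Notes

pandas.isna(obj) detects missing values in scalars, arrays, Series, DataFrames and other pandas objects. The boolean result can be used directly in later data-cleaning and processing steps.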
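
As a sketch of how the boolean result can feed into data cleaning, the mask can be summed to count missing values per column or used to filter rows. The data and variable names below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame in which only the middle row is complete.
df = pd.DataFrame({
    "A": [1, 2, np.nan],
    "B": [np.nan, 4, 5],
    "C": [1, 6, np.nan],
})

mask = pd.isna(df)                         # boolean DataFrame, same shape as df

missing_per_column = mask.sum()            # True counts as 1 when summed
complete_rows = df[~mask.any(axis=1)]      # rows with no missing value

print(missing_per_column)
print(complete_rows)
```

Building the mask once and reusing it keeps the counting and the filtering consistent with each other.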
 
57-6. Usage

57-6-1. Data preparation

None

57-6-2. Code example

# 57. pandas.isna function
# 57-1. Check scalar values
import pandas as pd
import numpy as np

print(pd.isna(np.nan))
print(pd.isna(3.14))
print(pd.isna(None), end='\n\n')

# 57-2. Check a Series
# Create a Series
s = pd.Series([1, 2, np.nan, 4, None])
# Check for missing values in the Series
print(pd.isna(s), end='\n\n')

# 57-3. Check a DataFrame
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [np.nan, 4, 5],
    'C': [1, np.nan, np.nan]
})
# Check for missing values in the DataFrame
print(pd.isna(df))

57-6-3. Output

# 57. pandas.isna function
# 57-1. Check scalar values
# True
# False
# True
# 57-2. Check a Series
# 0    False
# 1    False
# 2     True
# 3    False
# 4     True
# dtype: bool
# 57-3. Check a DataFrame
#        A      B      C
# 0  False   True  False
# 1  False  False   True
# 2   True  False   True

II. Recommended Reading

1. Python Foundations Tour

2. Python Functions Tour

3. Python Algorithms Tour

4. Python Magic Tour

5. Blog Homepage

 
 