ChatGPT Use Case: Automated Data Analysis

This article shows how to use ChatGPT for automated data analysis with three plotting libraries: matplotlib, seaborn, and plotly.

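Usage tips and lessons learned

From using ChatGPT every day for learning and coding, I have noticed the following patterns:

How well ChatGPT generates code depends on the difficulty of the problem and on how much similar material appeared in the data it was trained on.
How useful ChatGPT's answers are depends on the prompt template, the examples provided, and the background information.

So, to make the problem easier and to help ChatGPT understand the dataset well enough to write correct code (code that runs without modification), the prompt should describe the dataset up front, giving ChatGPT the column names and column types.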
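Writing the schema description by hand gets tedious for wide tables. As a minimal sketch, the description can be generated from the DataFrame itself. The helper name `describe_dataframe` and the three stand-in rows are assumptions made for illustration, not part of the original workflow.

```python
import pandas as pd


def describe_dataframe(df: pd.DataFrame, name: str) -> str:
    """Build a schema description (column names and dtypes) to paste into a prompt."""
    lines = [f"The table {name} is stored in Pandas, with the following structure:"]
    for column, dtype in df.dtypes.items():
        lines.append(f"- {column}, type {dtype}")
    return "\n".join(lines)


# Stand-in frame with the same column names as the examples in this article;
# in practice, pass the DataFrame that is already loaded.
iris_df = pd.DataFrame({
    "sepal length (cm)": [5.1, 7.0, 6.3],
    "sepal width (cm)": [3.5, 3.2, 3.3],
    "petal length (cm)": [1.4, 4.7, 6.0],
    "petal width (cm)": [0.2, 1.4, 2.5],
    "target": [0, 1, 2],
})

prompt = describe_dataframe(iris_df, "iris_df")
print(prompt)
```

The printed text can be pasted ahead of the actual request, so the column names in the generated code match the real data.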

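Below are sample analyses using matplotlib, seaborn, and plotly. In each case the code ChatGPT produced can be run directly.

Note: this assumes the dataset has already been loaded.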
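The article does not say how iris_df was loaded. One way that yields the same column names used in the prompts, assuming scikit-learn is installed, is its bundled iris dataset:

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects; .frame holds the four feature
# columns plus the integer "target" column in a single DataFrame.
iris_df = load_iris(as_frame=True).frame

print(iris_df.dtypes)
```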

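Scenario 1: Using matplotlib

Automated analysis

Magic prompt

The table iris_df is stored in Pandas, with the following structure:
- sepal length (cm), type float64
- sepal width (cm), type float64
- petal length (cm), type float64
- petal width (cm), type float64
- target, type int64

Write Python code that uses 3 methods from matplotlib to analyze how the distribution of each column relates to the label.

ChatGPT output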

import matplotlib.pyplot as plt

# Method 1: scatter plots

fig, axs = plt.subplots(2, 2, figsize=(10, 8))
axs[0, 0].scatter(iris_df['sepal length (cm)'], iris_df['target'])
axs[0, 0].set_xlabel('sepal length (cm)')
axs[0, 0].set_ylabel('target')
axs[0, 1].scatter(iris_df['sepal width (cm)'], iris_df['target'])
axs[0, 1].set_xlabel('sepal width (cm)')
axs[0, 1].set_ylabel('target')
axs[1, 0].scatter(iris_df['petal length (cm)'], iris_df['target'])
axs[1, 0].set_xlabel('petal length (cm)')
axs[1, 0].set_ylabel('target')
axs[1, 1].scatter(iris_df['petal width (cm)'], iris_df['target'])
axs[1, 1].set_xlabel('petal width (cm)')
axs[1, 1].set_ylabel('target')
plt.show()

# Method 2: box plots

fig, axs = plt.subplots(2, 2, figsize=(10, 8))
axs[0, 0].boxplot(iris_df['sepal length (cm)'], vert=False)
axs[0, 0].set_xlabel('sepal length (cm)')
axs[0, 1].boxplot(iris_df['sepal width (cm)'], vert=False)
axs[0, 1].set_xlabel('sepal width (cm)')
axs[1, 0].boxplot(iris_df['petal length (cm)'], vert=False)
axs[1, 0].set_xlabel('petal length (cm)')
axs[1, 1].boxplot(iris_df['petal width (cm)'], vert=False)
axs[1, 1].set_xlabel('petal width (cm)')
plt.show()

# Method 3: histograms

fig, axs = plt.subplots(2, 2, figsize=(10, 8))
axs[0, 0].hist(iris_df['sepal length (cm)'], bins=20)
axs[0, 0].set_xlabel('sepal length (cm)')
axs[0, 0].set_ylabel('count')
axs[0, 1].hist(iris_df['sepal width (cm)'], bins=20)
axs[0, 1].set_xlabel('sepal width (cm)')
axs[0, 1].set_ylabel('count')
axs[1, 0].hist(iris_df['petal length (cm)'], bins=20)
axs[1, 0].set_xlabel('petal length (cm)')
axs[1, 0].set_ylabel('count')
axs[1, 1].hist(iris_df['petal width (cm)'], bins=20)
axs[1, 1].set_xlabel('petal width (cm)')
axs[1, 1].set_ylabel('count')
plt.show()
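
Note that the box plots and histograms above show each column's overall distribution without splitting it by label. A possible follow-up, sketched here and not part of the ChatGPT output, draws one box per class for every feature. The helper name `boxplots_by_class` and the six stand-in rows are assumptions for illustration; with real data, pass the loaded iris_df instead.

```python
import matplotlib.pyplot as plt
import pandas as pd


def boxplots_by_class(df, label_col="target"):
    """Draw one subplot per feature, with one box per class label."""
    features = [c for c in df.columns if c != label_col]
    labels = sorted(df[label_col].unique())
    fig, axs = plt.subplots(
        1, len(features), figsize=(4 * len(features), 4), squeeze=False
    )
    for ax, feature in zip(axs[0], features):
        # One array of feature values per class, in label order.
        groups = [
            df.loc[df[label_col] == label, feature].to_numpy() for label in labels
        ]
        ax.boxplot(groups)
        # boxplot places the boxes at x = 1..n; relabel those ticks with the classes.
        ax.set_xticks(range(1, len(labels) + 1))
        ax.set_xticklabels([str(label) for label in labels])
        ax.set_xlabel(label_col)
        ax.set_ylabel(feature)
    fig.tight_layout()
    return fig, axs


# Stand-in rows for illustration only (two per class).
demo_df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal length (cm)": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal width (cm)": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "target": [0, 0, 1, 1, 2, 2],
})

fig, axs = boxplots_by_class(demo_df)
# plt.show()  # uncomment in an interactive session
```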

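Scenario 2: Using seaborn

Magic prompt

The table iris_df is stored in Pandas, with the following structure:
- sepal length (cm), type float64
- sepal width (cm), type float64
- petal length (cm), type float64
- petal width (cm), type float64
- target, type int64

Write Python code that uses 3 methods from seaborn to analyze how the distribution of each column relates to the label.

ChatGPT output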

import matplotlib.pyplot as plt
import seaborn as sns

# Method 1: scatter plot + bubble chart

sns.scatterplot(x='sepal length (cm)', y='target', data=iris_df, hue='target', size='petal length (cm)', sizes=(20, 200))
plt.xlabel('sepal length (cm)')
plt.ylabel('target')
plt.show()

# Method 2: violin plot

sns.violinplot(x='target', y='sepal length (cm)', data=iris_df)
plt.xlabel('target')
plt.ylabel('sepal length (cm)')
plt.show()

# Method 3: histogram + kernel density estimate

sns.histplot(data=iris_df, x="sepal length (cm)", hue="target", element="step", kde=True)
plt.xlabel('sepal length (cm)')
plt.show()

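Scenario 3: Using plotly

Magic prompt

The table iris_df is stored in Pandas, with the following structure:
- sepal length (cm), type float64
- sepal width (cm), type float64
- petal length (cm), type float64
- petal width (cm), type float64
- target, type int64

Write Python code that uses 3 methods from plotly to analyze how the distribution of each column relates to the label.

ChatGPT output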

import plotly.express as px

# Method 1: scatter plot

fig1 = px.scatter(iris_df, x="sepal length (cm)", y="target", color="target")
fig1.show()

# Method 2: histogram

fig5 = px.histogram(iris_df, x="sepal length (cm)", color="target")
fig5.show()

# Method 3: box plot

fig9 = px.box(iris_df, x="target", y="sepal length (cm)")
fig9.show()

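Other prompts

Missing value analysis

The table iris_df is stored in Pandas, with the following structure:
- sepal length (cm), type float64
- sepal width (cm), type float64
- petal length (cm), type float64
- petal width (cm), type float64
- target, type int64

Write Python code that uses plotly to run a missing value analysis on the dataset.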

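Exploratory analysis

The table iris_df is stored in Pandas, with the following structure:
- sepal length (cm), type float64
- sepal width (cm), type float64
- petal length (cm), type float64
- petal width (cm), type float64
- target, type int64

Write Python code that uses plotly to run an exploratory analysis on the dataset.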
