Basic pandas Usage (1): Defining and Using Data

Author: Tyan
Blog: noahsnail.com | CSDN | Jianshu

This post walks through some basic pandas usage: defining a Series, a date range, and a DataFrame, then inspecting and sorting a DataFrame.

```python
#!/usr/bin/env python
# _*_ coding: utf-8 _*_

import pandas as pd
import numpy as np

# Test 1
# Define a Series. The values are stored as float64 here because
# np.nan is a float, which forces the integers to be upcast.
s = pd.Series([1, 3, 5, np.nan, 44, 1])
print(s)
print(s[0])
print(s[3])
```

Test 1 result

```
0     1.0
1     3.0
2     5.0
3     NaN
4    44.0
5     1.0
dtype: float64
1.0
nan
```

```python
# Test 2
# Define a range of six consecutive dates
dates = pd.date_range('20170101', periods=6)
print(dates)
print(dates[5])
```

Test 2 result

```
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06'],
              dtype='datetime64[ns]', freq='D')
2017-01-06 00:00:00
```

```python
# Test 3
# A DataFrame is like a 2-D NumPy array with labels:
# the row index is dates, the column labels are ['a', 'b', 'c', 'd']
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=['a', 'b', 'c', 'd'])
print(df)

# Without explicit labels, rows and columns get default integer labels
df = pd.DataFrame(np.arange(12).reshape(3, 4))
print(df)

# Build a DataFrame from a dict; scalar values are broadcast
# to the length of the array
df = pd.DataFrame({'A': 1., 'B': 'Foo', 'C': np.array([3] * 4)})
print(df)
```

Test 3 result

```
                   a         b         c         d
2017-01-01  1.104994  1.328379  0.410358 -1.661059
2017-01-02 -0.642727 -0.152576  1.126191 -0.005317
2017-01-03 -0.179257  0.160972 -0.824172 -0.175027
2017-01-04  0.838328 -0.500909  0.714592  1.144800
2017-01-05  0.803691 -3.979186 -1.037603 -0.747943
2017-01-06  1.217289 -0.074413  0.504138 -0.077507
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
     A    B  C
0  1.0  Foo  3
1  1.0  Foo  3
2  1.0  Foo  3
3  1.0  Foo  3
```

```python
# Test 4
# These expressions display their values in an interactive session.

# Data type of each column
df.dtypes

# Row index
df.index

# Column labels
df.columns

# Underlying values as a NumPy array
df.values

# Summary statistics of the numeric columns
df.describe()

# Transpose
df.T

# Sort by column labels (axis=1); axis=0 would sort by the row index
df.sort_index(axis=1)

# Sort by column labels in descending order
df.sort_index(axis=1, ascending=False)

# Sort rows by the values in column C
df.sort_values(by='C')
```

Test 4 result

```
A    float64
B     object
C      int64
dtype: object
RangeIndex(start=0, stop=4, step=1)
Index([u'A', u'B', u'C'], dtype='object')
array([[1.0, 'Foo', 3],
       [1.0, 'Foo', 3],
       [1.0, 'Foo', 3],
       [1.0, 'Foo', 3]], dtype=object)
         A    C
count  4.0  4.0
mean   1.0  3.0
std    0.0  0.0
min    1.0  3.0
25%    1.0  3.0
50%    1.0  3.0
75%    1.0  3.0
max    1.0  3.0
     0    1    2    3
A    1    1    1    1
B  Foo  Foo  Foo  Foo
C    3    3    3    3
     A    B  C
0  1.0  Foo  3
1  1.0  Foo  3
2  1.0  Foo  3
3  1.0  Foo  3
   C    B    A
0  3  Foo  1.0
1  3  Foo  1.0
2  3  Foo  1.0
3  3  Foo  1.0
     A    B  C
0  1.0  Foo  3
1  1.0  Foo  3
2  1.0  Foo  3
3  1.0  Foo  3
```

References

https://www.youtube.com/user/MorvanZhou
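Two behaviours this post relies on are easy to check in isolation: why a Series containing `np.nan` comes out as `float64`, and what the `axis` argument of `sort_index` actually sorts. The sketch below is only an illustration, assuming Python 3 and a reasonably recent pandas release; the small frame `df` with shuffled labels is made up for the example and is not from the post.

```python
import numpy as np
import pandas as pd

# An all-integer Series keeps an integer dtype ...
ints = pd.Series([1, 3, 5, 44, 1])
# ... but a single np.nan forces an upcast to float64,
# because NaN is a floating-point value.
with_nan = pd.Series([1, 3, 5, np.nan, 44, 1])

# Hypothetical frame with deliberately unsorted row and column labels.
df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["y", "x"],
    columns=["c", "a", "b"],
)

by_rows = df.sort_index(axis=0)  # reorders rows by their index labels
by_cols = df.sort_index(axis=1)  # reorders columns by their labels
by_vals = df.sort_values(by="a", ascending=False)  # reorders rows by column "a"

print(ints.dtype, with_nan.dtype)
print(by_rows.index.tolist())
print(by_cols.columns.tolist())
print(by_vals.index.tolist())
```

None of these calls modifies `df` in place; each returns a new object, so the results have to be assigned (or displayed) to be of any use.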