Efficient and handy pandas functions and methods

Convert columns to the best available dtypes automatically:

DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)
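A minimal sketch of what the automatic conversion might look like. The column names and values are made up for illustration; the point is that columns holding missing values can move to pandas' nullable dtypes.

```python
import pandas as pd

# Hypothetical data: every column contains a missing value.
df = pd.DataFrame({
    "name": ["a", "b", None],
    "count": [1, 2, None],        # stored as float64 because of the missing value
    "flag": [True, False, None],  # stored as object
})

# Let pandas pick nullable dtypes that support pd.NA.
converted = df.convert_dtypes()
print(converted.dtypes)
```

Passing `convert_integer=False` (or another of the flags) should leave that family of columns alone if only some conversions are wanted.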

Cast to a specified dtype:

Series.astype(dtype, copy=True, errors='raise')
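As a small illustration (the values are invented), an explicit cast can turn digit strings into integers or mark a column as categorical:

```python
import pandas as pd

s = pd.Series(["1", "2", "3"])

as_int = s.astype("int64")     # digit strings to integers
as_cat = s.astype("category")  # repeated labels to a categorical

print(as_int.dtype, as_cat.dtype)
```

With the default `errors='raise'`, a value that cannot be cast (for example `"abc"` to `int64`) raises an exception rather than being silently skipped.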

Select all columns of the given dtypes in one call:

DataFrame.select_dtypes(include=None, exclude=None)
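A short sketch with a made-up three-column frame, splitting numeric columns from the rest:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})

numeric = df.select_dtypes(include="number")      # integer and float columns
non_numeric = df.select_dtypes(exclude="number")  # everything else

print(list(numeric.columns), list(non_numeric.columns))
```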

Convert values to datetimes:

pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
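One common pattern, sketched here with invented strings, is to give an explicit `format` and use `errors="coerce"` so that unparseable entries become `NaT` instead of raising:

```python
import pandas as pd

raw = pd.Series(["2023-01-15", "2023-02-30", "not a date"])

# February 30 does not exist, and the last entry is not a date at all;
# both become NaT under errors="coerce".
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
```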

Convert numeric strings to a numeric dtype:

pandas.to_numeric(arg, errors='raise', downcast=None)
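A brief sketch (sample values are assumptions) showing `errors="coerce"` for dirty input and `downcast` for a smaller integer dtype:

```python
import pandas as pd

raw = pd.Series(["1", "2.5", "abc"])

# "abc" cannot be parsed, so it becomes NaN instead of raising.
nums = pd.to_numeric(raw, errors="coerce")

# downcast="integer" picks the smallest signed integer dtype that fits.
small = pd.to_numeric(pd.Series(["1", "2", "3"]), downcast="integer")

print(nums.tolist(), small.dtype)
```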

Convert a datetime index to periods, which makes shifting by whole periods straightforward:

Series.to_period(freq=None, copy=True)
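A minimal sketch, assuming a Series indexed by timestamps (the dates and values are invented). `Series.to_period` converts the index; for datetime values stored in a column, `Series.dt.to_period` is the counterpart.

```python
import pandas as pd

ts = pd.Series([10, 20], index=pd.to_datetime(["2023-01-15", "2023-03-31"]))

# The DatetimeIndex becomes a monthly PeriodIndex.
monthly = ts.to_period("M")

# Adding an integer moves every period forward by that many months.
shifted = monthly.index + 1

print(monthly.index, shifted)
```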

Test whether each value is in a set of allowed values:

Series.isin(values)
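For illustration (the fruit names are made up), the boolean mask returned by `isin` can be used directly for filtering, and negated with `~` for the complement:

```python
import pandas as pd

s = pd.Series(["apple", "pear", "kiwi"])

mask = s.isin(["apple", "kiwi"])
selected = s[mask]    # values in the list
excluded = s[~mask]   # values not in the list

print(selected.tolist(), excluded.tolist())
```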

Test whether each value falls within a range:

Series.between(left, right, inclusive='both')
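A small sketch with invented numbers showing how `inclusive` controls the boundaries. String values for `inclusive` assume a reasonably recent pandas (1.3 or later).

```python
import pandas as pd

s = pd.Series([1, 5, 10])

closed = s.between(1, 10)                       # both ends included
left_only = s.between(1, 10, inclusive="left")  # 1 included, 10 excluded

print(closed.tolist(), left_only.tolist())
```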

Take the n rows with the largest or smallest values:

DataFrame.nlargest(n, columns, keep='first')
DataFrame.nsmallest(n, columns, keep='first')
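A quick sketch on a hypothetical score table. Both methods return rows already ordered, so no separate sort is needed:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [70, 95, 85, 60]})

top2 = df.nlargest(2, "score")      # highest scores first
bottom2 = df.nsmallest(2, "score")  # lowest scores first

print(top2["name"].tolist(), bottom2["name"].tolist())
```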

Bin values into groups (by explicit bin edges, or by quantiles):

pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')
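The difference between the two can be sketched as follows. The ages, bin edges, and labels are assumptions chosen for illustration: `cut` uses edges you supply, while `qcut` derives edges so that each bin holds roughly the same number of values.

```python
import pandas as pd

ages = pd.Series([5, 17, 30, 70])

# Fixed edges: (0, 18], (18, 65], (65, 120]
groups = pd.cut(ages, bins=[0, 18, 65, 120], labels=["child", "adult", "senior"])

# Two equal-sized bins split at the median.
halves = pd.qcut(ages, q=2, labels=["low", "high"])

print(groups.tolist(), halves.tolist())
```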

Compute group statistics and merge them back into the original data:

DataFrameGroupBy.transform(func, *args, engine=None, engine_kwargs=None, **kwargs)
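A minimal sketch of the pattern, assuming the statistic is computed per group through `groupby(...).transform(...)`. The team and sales data are invented. The result has the same length as the original frame, so it can be assigned as a new column without a separate merge.

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "x", "y"], "sales": [10, 30, 5]})

# Per-team total, broadcast back to every row of that team.
df["team_total"] = df.groupby("team")["sales"].transform("sum")

# Each row's share of its team total.
df["share"] = df["sales"] / df["team_total"]

print(df)
```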

Replace each value in a Series with another value:

Series.map(arg, na_action=None)
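As an example with made-up codes, a dict passed to `map` acts as a lookup table. Values with no matching key come back as missing:

```python
import pandas as pd

s = pd.Series(["M", "F", "U"])

# "U" has no entry in the mapping, so it maps to NaN.
labels = s.map({"M": "male", "F": "female"})

print(labels)
```

If unmapped values should keep their original content instead, `Series.replace` with the same dict may be a better fit.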

Apply a function to the values of a Series:

Series.apply(func, convert_dtype=True, args=(), **kwargs)
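A short sketch showing how extra positional arguments reach the function through `args`. The helper `fmt` and the sample numbers are hypothetical.

```python
import pandas as pd

def fmt(value, digits):
    """Format a number with a fixed count of decimal places (illustrative helper)."""
    return f"{value:.{digits}f}"

s = pd.Series([1.234, 5.678])

# Each element is passed as `value`; args supplies `digits`.
formatted = s.apply(fmt, args=(1,))

print(formatted.tolist())
```

For simple arithmetic, vectorized operations on the Series are usually faster than `apply`, which calls the function once per element.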

Aggregate by group:

GroupBy.agg(func, *args, **kwargs)
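Two common ways to call `agg`, sketched on invented data: a list of function names for one column, and named aggregation where each keyword becomes an output column.

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "x", "y"], "sales": [10, 30, 5]})

# Several statistics for one column.
summary = df.groupby("team")["sales"].agg(["sum", "mean"])

# Named aggregation: output_name=(input_column, function).
named = df.groupby("team").agg(total=("sales", "sum"), biggest=("sales", "max"))

print(summary)
print(named)
```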

Pivot table:

DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
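A minimal sketch with hypothetical region and product columns, summing sales and filling combinations that never occur with 0:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "a"],
    "sales": [10, 20, 30, 50],
})

table = df.pivot_table(
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    fill_value=0,  # west never sold product b
)

print(table)
```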

Cumulative sum:

Series.cumsum(axis=None, skipna=True, *args, **kwargs)
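For illustration, a running total over a few invented values, plus the grouped variant that restarts the total within each group:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
running = s.cumsum()

df = pd.DataFrame({"team": ["x", "x", "y", "y"], "sales": [1, 2, 3, 4]})
# The running total restarts for each team.
df["team_running"] = df.groupby("team")["sales"].cumsum()

print(running.tolist(), df["team_running"].tolist())
```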

Percentage change (period-over-period or year-over-year):

Series.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
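A small sketch with made-up figures. The default `periods=1` compares each value with the one just before it; a larger `periods` compares with an earlier row, so on monthly data `periods=12` would give a year-over-year change (assuming no months are missing).

```python
import pandas as pd

sales = pd.Series([100.0, 150.0, 75.0])

# Change relative to the previous row.
step = sales.pct_change()

# Change relative to the row two positions back.
lagged = sales.pct_change(periods=2)

print(step.tolist(), lagged.tolist())
```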

Count distinct values:

Series.nunique(dropna=True)
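A quick illustration with invented values of how `dropna` affects the count:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", None])

distinct = s.nunique()                     # missing values ignored
distinct_with_na = s.nunique(dropna=False) # missing counted as its own value

print(distinct, distinct_with_na)
```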

Correlation coefficient:

DataFrame.corr(method='pearson', min_periods=1)
Series.corr(other, method='pearson', min_periods=None)
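A minimal sketch on hypothetical columns: `DataFrame.corr` returns the full pairwise matrix, while `Series.corr` returns a single coefficient for two Series. Both use Pearson correlation by default.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],  # perfectly increasing with x
    "z": [4, 3, 2, 1],  # perfectly decreasing with x
})

matrix = df.corr()          # pairwise matrix over all numeric columns
xy = df["x"].corr(df["y"])  # single coefficient
xz = df["x"].corr(df["z"])

print(matrix)
print(xy, xz)
```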