[Python] Pandas - Removing Outliers from a DataFrame

๋ฐ˜์‘ํ˜•
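```python
import pandas as pd

def dr_outlier(df):
    # Apply the 1.5 x IQR rule only to numeric columns
    num = df.select_dtypes(include="number")
    quartile_1 = num.quantile(0.25)
    quartile_3 = num.quantile(0.75)
    IQR = quartile_3 - quartile_1
    # True where a value falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    condition = (num < (quartile_1 - 1.5 * IQR)) | (num > (quartile_3 + 1.5 * IQR))
    # Flag a row if any of its numeric columns is an outlier
    condition = condition.any(axis=1)
    search_df = df[condition]

    # Return (outlier rows, DataFrame with those rows removed);
    # boolean masking stays correct even if the index has duplicate labels
    return search_df, df[~condition]
```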
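
A small worked example of the 1.5 × IQR rule that `dr_outlier` applies to every numeric column. The DataFrame and its columns `x` and `y` are made up for illustration, and the block recomputes the fences inline (rather than calling `dr_outlier`) so it runs on its own and shows the actual numbers.

```python
import pandas as pd

# Hypothetical toy data: one obvious outlier (100) in column "x"
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 100],
    "y": [10, 11, 12, 13, 14, 15, 16, 17, 18],
})

# Per-column quartiles (pandas' default linear interpolation)
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Fences from the 1.5 x IQR rule, one pair per column
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# A row is an outlier if any column falls outside its fences
mask = ((df < lower) | (df > upper)).any(axis=1)

print(lower["x"], upper["x"])  # -3.0 13.0
print(df[mask])                # only the row with x == 100
print(len(df[~mask]))          # 8
```

With nine values, the 25th and 75th percentiles of `x` land exactly on its 3rd and 7th sorted values (3 and 7), so IQR = 4 and the fences are 3 - 6 = -3 and 7 + 6 = 13; only 100 falls outside them, while every `y` value stays inside its fences (6 and 22).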

 

 

<References>

https://wikidocs.net/83562
WikiDocs: a platform for creating and sharing online books (wikidocs.net)

https://ko.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/box-whisker-plots/a/identifying-outliers-iqr-rule
Identifying outliers with the 1.5xIQR rule (concept) | Box plots | Khan Academy (ko.khanacademy.org)

https://en.wikipedia.org/wiki/Interquartile_range
Interquartile range - Wikipedia (en.wikipedia.org): in descriptive statistics, the interquartile range (IQR), also called the midspread, middle 50%, or H-spread, is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, i.e. between the upper and lower quartiles.
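
Written out, the IQR definition and the 1.5xIQR rule from the references above give the two fences that `dr_outlier` compares each value against, with Q1 and Q3 the 25th and 75th percentiles of a column:

$$
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\text{lower fence} &= Q_1 - 1.5\,\mathrm{IQR} \\
\text{upper fence} &= Q_3 + 1.5\,\mathrm{IQR}
\end{aligned}
$$

A value x is treated as an outlier when x < lower fence or x > upper fence. For example, Q1 = 3 and Q3 = 7 give IQR = 4 and fences of -3 and 13.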

 

๋ฐ˜์‘ํ˜•