Commit bec620c
Jake Teo authored and committed
1 parent 7ff6952

File tree: 3 files changed (+149 -1 lines)

assumptions.rst (+17)

Tests for Assumptions
=====================

Normality
---------

.. code:: python

    import scipy.stats as stats
    stats.normaltest(df3['depth'])

    >>> NormaltestResult(statistic=33363.134206705407, pvalue=0.0)


Homogeneity of Variances
------------------------
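This section is empty in the commit. A minimal sketch of a homogeneity-of-variances check using Levene's test from ``scipy.stats`` follows; the synthetic groups are placeholders standing in for slices of the document's ``df3`` (e.g. ``df3['depth']`` split by a categorical column), not its actual data:

```python
import numpy as np
import scipy.stats as stats

# Placeholder groups; in the document's context these would be
# subsets of df3 grouped by a categorical variable.
rng = np.random.default_rng(0)
group1 = rng.normal(loc=0.0, scale=1.0, size=100)
group2 = rng.normal(loc=0.0, scale=3.0, size=100)

# Levene's test: H0 = the groups have equal variances.
# A small p-value means the equal-variance assumption is doubtful.
stat, pvalue = stats.levene(group1, group2)
print(stat, pvalue)
```

With clearly unequal spreads as above, the test rejects equal variances; a non-significant result would support using variance-pooling tests such as the standard t-test or ANOVA.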

difference.rst (+85 -1)

X, Explanatory: ``Categorical``
Y, Response: ``Categorical``
Type: ``Non-Parametric``

.. code:: python

    import scipy.stats as ss

    print('chi-square statistic, p-value, dof, expected counts')
    print(ss.chi2_contingency(ct1))

    chi-square statistic, p-value, dof, expected counts
    (1263.6306705804054, 2.554837585615145e-272, 4,
     array([[7.74251477e+03, 1.71950205e+03, 3.69930718e+02,
             4.25495413e+01, 2.50291420e+00],
            [7.72448523e+03, 1.71549795e+03, 3.69069282e+02,
             4.24504587e+01, 2.49708580e+00]]))
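``ct1`` above is a contingency table built earlier in the document. A self-contained sketch of how such a table can be formed with ``pd.crosstab`` and passed to ``chi2_contingency`` follows; the toy ``group``/``category`` data is illustrative, not the document's:

```python
import pandas as pd
import scipy.stats as ss

# Toy data standing in for the document's df3 / ct1
df = pd.DataFrame({
    'group':    ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'a'],
    'category': ['x', 'y', 'x', 'x', 'y', 'y', 'y', 'x'],
})

# Cross-tabulate counts: rows = group, columns = category
ct = pd.crosstab(df['group'], df['category'])

# chi2_contingency returns (statistic, p-value, dof, expected counts);
# dof = (n_rows - 1) * (n_cols - 1)
chi2, p, dof, expected = ss.chi2_contingency(ct)
print(dof)
```

The expected-counts array has the same shape as the crosstab, which is why the output in the text prints a 2 x 5 array for a 2 x 5 table.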


Student's T-Test
----------------
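The body of this section lies outside the diff context shown here. A minimal sketch of a two-sample Student's t-test with ``scipy.stats.ttest_ind`` follows; the synthetic samples are placeholders, not the document's ``df3`` groups:

```python
import numpy as np
import scipy.stats as stats

# Synthetic samples standing in for two groups of observations
rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=1.0, scale=1.0, size=50)

# Two-sample t-test: H0 = equal means (assumes equal variances,
# which is why the homogeneity check above matters)
tstat, pvalue = stats.ttest_ind(a, b)
print(tstat, pvalue)
```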
ANOVA
-----
Type: ``Parametric``

Analysis of Variance (ANOVA).


.. code:: python

    #### IMPORT MODULES ####
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    import statsmodels.stats.multicomp as multi



    #### FIT MODEL ####
    # formula is response ~ explanatory; C() marks a categorical variable
    # ANOVA for multiple factors
    model = smf.ols(formula='diameter ~ C(layers)', data=df3)
    results = model.fit()
    print(results.summary())


                                OLS Regression Results
    ==============================================================================
    Dep. Variable:               diameter   R-squared:                       0.219
    Model:                            OLS   Adj. R-squared:                  0.219
    Method:                 Least Squares   F-statistic:                     1383.
    Date:                Tue, 02 Aug 2016   Prob (F-statistic):               0.00
    Time:                        17:04:57   Log-Likelihood:                -60976.
    No. Observations:               19731   AIC:                         1.220e+05
    Df Residuals:                   19726   BIC:                         1.220e+05
    Df Model:                           4
    Covariance Type:            nonrobust
    ==================================================================================
                         coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ----------------------------------------------------------------------------------
    Intercept          6.7217      0.043    157.125      0.000         6.638     6.806
    C(layers)[T.2]     3.3941      0.100     33.822      0.000         3.197     3.591
    C(layers)[T.3]    12.2841      0.200     61.319      0.000        11.891    12.677
    C(layers)[T.4]    18.3139      0.579     31.649      0.000        17.180    19.448
    C(layers)[T.5]    21.8123      2.380      9.166      0.000        17.148    26.477
    ==============================================================================
    Omnibus:                    14916.319   Durbin-Watson:                   0.529
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):           577157.627
    Skew:                           3.262   Prob(JB):                         0.00
    Kurtosis:                      28.680   Cond. No.                         64.0
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.




    #### POST-HOC TEST ####
    mc = multi.MultiComparison(df3['diameter'], df3['layers'])
    result1 = mc.tukeyhsd()
    print(result1)


    Multiple Comparison of Means - Tukey HSD, FWER=0.05
    =============================================
    group1 group2 meandiff  lower    upper   reject
    ---------------------------------------------
      1      2     3.3941   3.1204   3.6679   True
      1      3    12.2841  11.7376  12.8306   True
      1      4    18.3139  16.7353  19.8925   True
      1      5    21.8123  15.3204  28.3041   True
      2      3     8.89     8.3015   9.4785   True
      2      4    14.9198  13.3262  16.5134   True
      2      5    18.4181  11.9226  24.9137   True
      3      4     6.0298   4.3675   7.6921   True
      3      5     9.5281   3.0154  16.0409   True
      4      5     3.4984  -3.1806  10.1773   False
    ---------------------------------------------

supervised.rst (+47)

An ensemble of decision trees.


Logistic Regression
**************************
Binary output.

.. code:: python

    #### IMPORT MODULES ####
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm



    #### FIT MODEL ####
    lreg = sm.Logit(df3['diameter_cut'], df3[trainC]).fit()
    print(lreg.summary())


    Optimization terminated successfully.
             Current function value: 0.518121
             Iterations 6
                           Logit Regression Results
    ==============================================================================
    Dep. Variable:           diameter_cut   No. Observations:                18067
    Model:                          Logit   Df Residuals:                    18065
    Method:                           MLE   Df Model:                            1
    Date:                Thu, 04 Aug 2016   Pseudo R-squ.:                  0.2525
    Time:                        14:13:14   Log-Likelihood:                -9360.9
    converged:                       True   LL-Null:                       -12523.
                                            LLR p-value:                     0.000
    ================================================================================
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
    --------------------------------------------------------------------------------
    depth            4.2529      0.067     63.250      0.000         4.121     4.385
    layers_YESNO    -2.1102      0.037    -57.679      0.000        -2.182    -2.039
    ================================================================================



    #### CONFIDENCE INTERVALS ####
    # exponentiate the coefficients and their confidence intervals
    # to express them as odds ratios
    params = lreg.params
    conf = lreg.conf_int()
    conf['OR'] = params
    conf.columns = ['Lower CI', 'Upper CI', 'OR']
    print(np.exp(conf))

                   Lower CI   Upper CI         OR
    depth         61.625434  80.209893  70.306255
    layers_YESNO   0.112824   0.130223   0.121212


Support Vector Machine
***********************