diff --git a/PMLProject.Rmd b/PMLProject.Rmd
new file mode 100644
index 00000000..40a0054b
--- /dev/null
+++ b/PMLProject.Rmd
@@ -0,0 +1,241 @@
+---
+title: "Practical Machine Learning Project"
+author: "Sahera Kadim"
+date: "24 July 2015"
+output: html_document
+---
+
+In this project, our goal is to use data from accelerometers on the
+belt, forearm, arm, and dumbbell of 6 participants. They were asked to
+perform barbell lifts correctly and incorrectly in 5 different ways
+(classes A to E). The data comprise a training set of 19622
+observations and a test set of 20 observations, each with 160
+variables. The goal is to predict the manner in which the participants
+did the exercise, which is recorded in the "classe" variable of the
+training set.
+
+**1. Setting the environment**
+
+```{r, setEnv, eval=FALSE}
+library(wsrf); library(C50); library(rpart); library(caret)
+library(pROC); library(adabag); library(klaR); library(lattice)
+library(ggplot2); library(parallel); library(doSNOW)
+machineCores <- detectCores()
+cl <- makeCluster(machineCores)  # registerDoSNOW() expects a cluster object
+registerDoSNOW(cl)
+setwd("~/R_work/ML")
+```
+
+**2. Read and clean the data**
+
+```{r RnC, eval=FALSE}
+train.df <- read.csv("pml-training.csv", header = TRUE)
+(tt <- table(train.df$classe))
+## Remove columns that are mostly NA
+del.col <- which(colSums(is.na(train.df)) > 10000)
+train.df <- train.df[, -del.col]
+(sum(is.na(train.df)))  # confirm no NAs remain
+dim(train.df)
+## Remove columns that are empty in the raw data
+train.df <- train.df[, (train.df[1, ] != "")]
+dim(train.df)
+## Drop identifiers that carry no sensor information
+train.df <- train.df[!names(train.df) %in% c("X", "user_name")]
+dim(train.df)
+```
+
+**3. Check the numeric predictors: look for linear combinations, make
+sure there are no near-zero-variance variables, and find highly
+correlated pairs.**
+
+```{r comb, eval=FALSE}
+comboInfo <- findLinearCombos(train.df[, c(1:2, 5:57)])
+nzv <- nearZeroVar(train.df)
+train.df$new_window <- NULL  # drop new_window (near zero variance)
+dCor <- cor(train.df[, c(1, 2, 4:56)])
+findCorrelation(dCor, cutoff = .99)
+names(train.df)[13]
+train.df$accel_belt_y <- NULL  # too highly correlated with another predictor
+dim(train.df)
+```
+
+**4. The function createDataPartition can be used to create a
+stratified random sample of the data into training and test sets**
+
+```{r stand, eval=FALSE}
+set.seed(1234)  # for reproducibility
+inTrain <- createDataPartition(y = train.df$classe, p = .75, list = FALSE)
+training <- train.df[inTrain, ]
+testing <- train.df[-inTrain, ]
+```
+
+**5. Scalable Weighted Subspace Random Forests (wsrf) used to get
+variable importance**
+
+wsrf is an R package for scalable weighted subspace random forests. The
+algorithm can classify very high-dimensional data with random forests
+built from small subspaces, using a novel variable-weighting method for
+subspace selection in place of the traditional random variable
+sampling. This approach is particularly useful when building models
+from high-dimensional data. It trains faster than the classical random
+forest and offers some extra diagnostics; I used it to rank variable
+importance.
+The first variable importance measure is computed by permuting OOB
+data: for each tree, the prediction error on the out-of-bag portion of
+the data is recorded, then recorded again after permuting each
+predictor variable. The differences between the two are averaged over
+all trees and normalized by the standard deviation of the differences.
+The second measure is the total decrease in node impurity from
+splitting on the variable, averaged over all trees; here node impurity
+is measured by the Information Gain Ratio index. (A small comparison
+with the randomForest package follows the code block below.)
+
+```{r wsrf, eval=FALSE}
+model.wsrf <- wsrf(classe ~ ., data = training, mtry = 6)
+wsrf.preds <- predict(model.wsrf, newdata = training, type = "class")
+correlation(model.wsrf)
+oob.error.rate(model.wsrf)
+var.imp <- varCounts.wsrf(model.wsrf)  # how often each variable was split on
+impvar <- sort(var.imp, decreasing = TRUE)
+impvar; length(impvar)
+imp.df <- names(impvar)[1:35]  # keep the 35 most-used predictors
+trim <- training[, imp.df]
+classe <- training$classe
+training <- data.frame(classe, trim)
+dim(training)
+names(training)[1]
+```
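+
+For comparison, the two importance measures described above are the
+ones reported by a classical random forest. The chunk below is a
+minimal sketch only, assuming the randomForest package is installed
+(the chunk name and ntree value are my own choices); note that
+randomForest measures impurity with the Gini index rather than the
+Information Gain Ratio used by wsrf.
+
+```{r rfImp, eval=FALSE}
+## Sketch: permutation (MeanDecreaseAccuracy) and impurity
+## (MeanDecreaseGini) importance from a plain random forest.
+library(randomForest)
+set.seed(1234)
+fit.rf <- randomForest(classe ~ ., data = training,
+                       ntree = 100, importance = TRUE)
+importance(fit.rf)  # one column per class plus the two overall measures
+varImpPlot(fit.rf)  # plots both measures side by side
+```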
+
+**6. Fitting models**
+
+*-First, try linear discriminant analysis, using trainControl to set up
+repeated cross-validation.*
+
+```{r lda, eval=FALSE}
+cvCtrl <- trainControl(method = "repeatedcv", repeats = 3)
+model.lda <- train(classe ~ ., data = training, method = "lda",
+                   trControl = cvCtrl)
+pred.lda <- predict(model.lda, testing)
+confusionMatrix(pred.lda, testing$classe)
+```
+
+*-Decision trees*
+
+C5.0 builds decision trees and rule-based models for pattern
+recognition. By default, C5.0 measures predictor importance by
+determining the percentage of training set samples that fall into all
+the terminal nodes after the split.
+
+```{r C5, eval=FALSE}
+model.C5Rules <- C5.0(classe ~ ., data = training, rules = TRUE)
+summary(model.C5Rules)  # show the rules
+pred.C5 <- predict(model.C5Rules, testing, type = "class")
+table(pred.C5)
+table(pred.C5, testing$classe)
+C5imp(model.C5Rules, metric = "splits")
+```
+
+*-Recursive Partitioning and Regression Trees (rpart)*, with tuning and
+the same *cross-validation* control.
+
+```{r rpart, eval=FALSE}
+tuned.rpart <- train(classe ~ ., data = training, method = "rpart",
+                     tuneLength = 30, trControl = cvCtrl)
+p.tuned.rpart <- predict(tuned.rpart, testing)
+confusionMatrix(p.tuned.rpart, testing$classe)
+```
+
+*-Tuning C5.0 to set cross-validation*, using the same train control
+used with rpart, together with a tuning *grid*. (With five classes,
+caret's default accuracy metric applies; ROC is only defined for
+two-class problems.)
+
+```{r C5tun, eval=FALSE}
+grid <- expand.grid(.model = "tree", .trials = c(1:100), .winnow = FALSE)
+tuned.C5 <- train(classe ~ ., data = training, method = "C5.0",
+                  tuneGrid = grid, trControl = cvCtrl)
+p.tuned.C5 <- predict(tuned.C5, testing)
+```
+
+*-Compare the two tuned models.*
+
+```{r comp, eval=FALSE}
+p.tuned.rpart <- predict(tuned.rpart, testing)
+p.tuned.C5 <- predict(tuned.C5, testing)
+qplot(p.tuned.rpart, p.tuned.C5, colour = classe, data = testing)
+equal.Preds <- (p.tuned.rpart == p.tuned.C5)
+sum(equal.Preds)
+confusionMatrix(p.tuned.rpart, p.tuned.C5)
+qplot(num_window, cvtd_timestamp, colour = equal.Preds, data = testing)
+```
+
+*-Boosting*
+
+With boosting we take a different approach to refitting models.
+Consider a classification task in which we start with a basic learner
+and apply it to the data of interest. The learner is then refit, but
+with more weight given to the misclassified observations, and this
+process is repeated until some stopping rule is reached. Boosting in
+general is highly resistant to overfitting; the sketch below
+illustrates the reweighting step before the adabag fit.
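+
+As a minimal sketch of that reweighting idea (binary case, in the style
+of AdaBoost.M1, on hypothetical toy data; only the rpart package is
+assumed):
+
+```{r boostSketch, eval=FALSE}
+## Sketch: each round fits a stump, then up-weights the cases it missed.
+library(rpart)
+set.seed(1)
+n <- 200
+toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
+toy$y <- factor(ifelse(toy$x1 + toy$x2 + rnorm(n, sd = 0.5) > 0, "A", "B"))
+w <- rep(1 / n, n)                      # start with uniform weights
+for (m in 1:5) {
+  stump <- rpart(y ~ x1 + x2, data = toy, weights = w,
+                 control = rpart.control(maxdepth = 1))
+  miss  <- predict(stump, toy, type = "class") != toy$y
+  err   <- sum(w[miss]) / sum(w)        # weighted error of this learner
+  alpha <- log((1 - err) / err)         # this learner's vote
+  w     <- w * exp(alpha * miss)        # up-weight misclassified cases
+  w     <- w / sum(w)                   # renormalise
+}
+```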
+
+```{r ada, eval=FALSE}
+model.ada <- boosting(classe ~ ., data = training, boos = TRUE, mfinal = 10)
+ada.pred <- predict(object = model.ada, newdata = testing)
+ada.pred$confusion
+importanceplot(model.ada)
+```
+
+**7. Test the models' accuracy (out-of-sample error) against the
+held-out validation set.**
+
+```{r acc, eval=FALSE}
+(model.lda.acc <- sum(predict(model.lda, testing) ==
+                        testing$classe) / length(testing$classe))
+(tuned.rpart.acc <- sum(predict(tuned.rpart, testing) ==
+                          testing$classe) / length(testing$classe))
+(tuned.C5.acc <- sum(predict(tuned.C5, testing) ==
+                       testing$classe) / length(testing$classe))
+(model.C5Rules.acc <- sum(predict(model.C5Rules, testing) ==
+                            testing$classe) / length(testing$classe))
+```
+
+**8. Predict the 20 values from the test set**
+
+```{r test, eval=FALSE}
+test.df <- read.csv("pml-testing.csv", header = TRUE)
+(submit <- predict(model.lda, test.df))
+(submit <- predict(tuned.rpart, test.df))
+(submit <- predict(tuned.C5, test.df))
+(submit <- predict(model.C5Rules, test.df))
+```
+
+**9. Conclusion**
+
+I tried to handle NA values with the randomForest function
+na.roughfix(); it worked fine with the wsrf model but caused problems
+with the others. I found the wsrf and C50 packages quite a bit better
+than randomForest. The C5.0 classifiers, with rules or with tuning, are
+very good and give the same test result: they have lower training and
+test error, are fast, and have a lot of useful properties.
+I spent a lot of time on ensembling and meta-classifiers, but most of
+those tools are still in an early phase. Even so, the future belongs to
+ensemble learning and to the meta-classifiers in the RWeka and mlr
+packages, which still need effort to develop.
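+
+As a footnote to the NA handling mentioned above, a minimal sketch of
+what na.roughfix() does, on a hypothetical toy data frame:
+
+```{r roughfix, eval=FALSE}
+## Sketch: na.roughfix() imputes the column median for numeric columns
+## and the most frequent level for factors.
+library(randomForest)
+toy <- data.frame(a = c(1, NA, 3, 4), b = factor(c("x", "x", "y", NA)))
+na.roughfix(toy)  # a: NA -> median(1, 3, 4) = 3 ; b: NA -> "x"
+```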