『Data Science』 R Language Study Notes: Basic Syntax

Published: 2021-01-23 23:06:14 | Category: Big Data | Source: compiled from the web

Data Types

Data Object & Vector

x <- c(0.5,0.6)        ## numeric
x <- c(TRUE,FALSE)     ## logical
x <- c(T,F)            ## logical
x <- c("a","b","c")     ## character
x <- 9:29               ## integer
x <- c(1+0i,2+4i)      ## complex

x <- vector("numeric",length = 10) ## create a numeric vector of length 10 (every element is initialized to 0).

x <- 0.6    ## assign a numeric value to "x".
class(x)    ## get the class of "x": "numeric".

x <- 1:10         ## an integer vector.
as.character(x)   ## explicitly coerce "x" to a character vector: "1" "2" ... "10".
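Coercion can also happen implicitly: when classes are mixed inside `c()`, R converts every element to one common class, and an explicit `as.*()` conversion that cannot succeed yields `NA`. A small sketch (the variable names are only illustrative):

```r
## Implicit coercion: c() converts all elements to one common class
y <- c(1.7, "a")    # numeric + character -> character
z <- c(TRUE, 2)     # logical + numeric   -> numeric
w <- c("a", TRUE)   # character + logical -> character
print(c(class(y), class(z), class(w)))

## Explicit coercion that cannot succeed produces NA (R also issues a warning)
v <- suppressWarnings(as.numeric(c("1", "a")))
print(v)
```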

List

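Lists are a special type of vector that can contain elements of different classes.

x <- list(1,"a",TRUE,1+4i)   ## a list holding a numeric, a character, a logical and a complex element.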
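To see that each list element keeps its own class, and how the two kinds of indexing differ, here is a short sketch (the element values are arbitrary):

```r
## Each element of a list keeps its own class
x <- list(1, "a", TRUE, 1 + 4i)
print(sapply(x, class))

## Single brackets return a sub-list; double brackets return the element itself
print(class(x[2]))   # "list"
print(x[[2]])        # "a"
```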

Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol).

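m <- matrix(nrow = 2,ncol = 3)       ## an empty 2 x 3 matrix (every element is NA).
n <- matrix(1:6,nrow = 2,ncol = 3)   ## a 2 x 3 matrix filled column-wise with 1:6.

dim(m)          ## get the dimensions (nrow, ncol) of the matrix: 2 3.

attributes(m)   ## get the attributes of the matrix (here only $dim).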

m <- 1:10           ## create an integer vector from 1 to 10.
dim(m) <- c(2,5)    ## add a dimension attribute to "m", turning it into a matrix with nrow = 2 and ncol = 5.
m                   ## print the value of "m".
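Both `matrix()` and assigning `dim()` lay the values out column by column. The sketch below, using arbitrary values 1:6, checks that the two routes give the same object:

```r
## A matrix is a vector plus a "dim" attribute; elements are filled column-wise
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)
print(attributes(m))  # $dim is the integer vector c(2L, 3L)

## Setting dim() on a plain vector gives the same column-wise layout
v <- 1:6
dim(v) <- c(2, 3)
print(identical(m, v))
```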

x <- 1:3
y <- 10:12
cbind(x,y)     ## bind "x" and "y" as columns: a matrix with 3 rows and 2 columns.
rbind(x,y)     ## bind "x" and "y" as rows: a matrix with 2 rows and 3 columns.
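A quick check of the shapes, plus the fact that `cbind()` and `rbind()` name the columns or rows after the arguments, so one result is the transpose of the other:

```r
x <- 1:3
y <- 10:12
cb <- cbind(x, y)   # columns named after the arguments
rb <- rbind(x, y)   # rows named after the arguments
print(dim(cb))      # 3 2
print(dim(rb))      # 2 3
print(colnames(cb)) # "x" "y"
print(identical(t(cb), rb))
```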

Factors

Factors are used to represent categorical data. One can think of a factor as an integer vector where each integer has a label.

x <- factor(c("yes","yes","no","no"))  ## create a factor from a character vector.
x                                       ## print the factor.
table(x)                                ## count how many times each level occurs.
unclass(x)                              ## strip the class to show the underlying integer codes and the "levels" attribute.

x <- factor(c("yes","yes","no","no"),levels = c("yes","no"))  ## set the order of the levels explicitly (by default it is alphabetical, so "no" would come first).
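The `levels` argument changes which integer code each label gets. A sketch comparing the default (alphabetical) ordering with an explicit one, on the same data as above:

```r
x <- factor(c("yes", "yes", "no", "no"))
print(levels(x))      # "no" "yes": alphabetical by default
print(as.integer(x))  # 2 2 1 1: codes index into levels(x)

x2 <- factor(c("yes", "yes", "no", "no"), levels = c("yes", "no"))
print(levels(x2))     # "yes" "no"
print(as.integer(x2)) # 1 1 2 2
```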

Missing Values

Missing values are denoted by NA, or by NaN for undefined mathematical operations.

is.na()     ## tests objects for NA (NaN included).
is.nan()    ## tests objects for NaN only.

x <- c(1,2,NaN,NA,4)    ## create a vector to test is.na() and is.nan().
is.na(x)                ## TRUE for both the NaN and the NA element.
is.nan(x)               ## TRUE only for the NaN element.

NA values have a class also, so there are integer NA, character NA, etc. A NaN value is also NA, but the converse is not true.

A complete console session:

> x <- c(1,2,NA,10,3)
> is.na(x)
[1] FALSE FALSE  TRUE FALSE FALSE
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE
> x <- c(1,2,NaN,NA,4)
> is.na(x)
[1] FALSE FALSE  TRUE  TRUE FALSE
> is.nan(x)
[1] FALSE FALSE  TRUE FALSE FALSE
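The claims that NA has a class and that NaN counts as NA (but not the reverse) can be checked directly with R's typed NA constants:

```r
## NA constants exist for each atomic class
print(class(NA))            # "logical"
print(class(NA_integer_))   # "integer"
print(class(NA_real_))      # "numeric"
print(class(NA_character_)) # "character"

## NaN counts as NA, but NA is not NaN
print(is.na(NaN))  # TRUE
print(is.nan(NA))  # FALSE

## An NA inside a vector takes on the vector's class
x <- c("a", NA)
print(is.na(x))    # FALSE TRUE
print(class(x))    # "character"
```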

Data Frames

Data frames are used to store tabular data.

They are represented as a special type of list where every element of the list has to have the same length.
Each element of the list can be thought of as a column, and the length of each element of the list is the number of rows.
Unlike matrices, data frames can store different classes of objects in each column (just like lists); in a matrix, every element must have the same class.
Data frames also have a special attribute called row.names.
Data frames are usually created by calling read.table() or read.csv().
They can be converted to a matrix by calling data.matrix().

> x <- data.frame(foo = 1:4,bar = c(T,T,F,F))  ## create a data frame with two columns and four rows.
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
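The properties listed above (per-column classes, the row.names attribute, conversion with `data.matrix()`) can be inspected on the same data frame. Note that `data.matrix()` turns logical columns into 0/1 integers:

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
print(c(nrow(x), ncol(x)))   # 4 2
print(sapply(x, class))      # each column keeps its own class
print(row.names(x))          # "1" "2" "3" "4"

## data.matrix() converts every column to numbers; logicals become 0/1
m <- data.matrix(x)
print(m)
```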

Names

R objects can also have names, which is very useful for writing readable code and self-describing objects.

> x <- 4:6                              ## Create an integer vector 'x' with three elements.
> names(x) <- c("foo","bar","norf")   ## Assign names to vector 'x'.
> x                                     ## Print the value of 'x'.
 foo  bar norf 
   4    5    6
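Names are not limited to vectors. A sketch showing a named list and a matrix with row and column names set through `dimnames()` (the names are arbitrary):

```r
## Lists can be named at creation time
x <- list(a = 1, b = 2, c = 3)
print(names(x))   # "a" "b" "c"
print(x$b)        # 2

## Matrices carry row and column names via dimnames()
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
print(m)
print(m["b", "c"])  # 2
```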

Data Reading

Reading Data

read.table, read.csv, for reading tabular data; they return a data.frame object.
readLines, for reading lines of a text file.
source, for reading in R code files (inverse of dump).
dget, for reading in R code files (inverse of dput).
load, for reading in saved workspaces.
unserialize, for reading single R objects in binary form.
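Each reader in this list has a writing counterpart. The sketch below round-trips data through four of the pairs using temporary files; the file contents and object names are made up for the illustration:

```r
## Temporary files keep the sketch self-contained
tmp_txt <- tempfile(fileext = ".txt")
tmp_r   <- tempfile(fileext = ".R")
tmp_rda <- tempfile(fileext = ".RData")

## readLines() reads back what writeLines() wrote
writeLines(c("first line", "second line"), tmp_txt)
lines <- readLines(tmp_txt)
print(lines)

## source() re-runs the R code that dump() wrote
z <- c(2.5, 4)
dump("z", file = tmp_r)
rm(z)
source(tmp_r)
print(z)

## load() restores the objects that save() wrote, under their original names
obj <- list(a = 1:3, b = "x")
saved <- obj
save(obj, file = tmp_rda)
rm(obj)
loaded_names <- load(tmp_rda)   # character vector of restored object names
print(loaded_names)

## unserialize() reverses serialize(); connection = NULL yields a raw vector
bytes <- serialize(saved, connection = NULL)
restored <- unserialize(bytes)
print(identical(restored, saved))
```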

`read.table`

Description: Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.

Main Arguments:

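file, the name of a file, or a connection.
header, a logical indicating whether the file has a header line.
sep, a string indicating how the columns are separated, e.g. ",".
colClasses, a character vector indicating the class of each column in the dataset.
nrows, the number of rows in the dataset (the maximum number of rows to read).
comment.char, a character string indicating the comment character.
skip, the number of lines to skip from the beginning.
stringsAsFactors, should character variables be coded as factors?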
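Several of these arguments can be tried without a file by passing the data through the `text` argument of `read.table`. The input below is hypothetical: one preamble line, a header, and three data rows, of which only the first two are read:

```r
## Hypothetical input: one preamble line, a header, then three data rows
txt <- "exported by some tool
name;age
ann;31
bob;45
cy;27"

df <- read.table(text = txt,
                 header = TRUE,                           # first line read is the header
                 sep = ";",                               # columns are separated by ";"
                 skip = 1,                                # skip the preamble line
                 nrows = 2,                               # read at most 2 data rows
                 colClasses = c("character", "integer"),  # one class per column
                 comment.char = "")                       # no comment lines in this input
print(df)
str(df)
```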

Usages:

read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"), row.names, col.names, as.is = !stringsAsFactors, na.strings = "NA", colClasses = NA, nrows = -1, skip = 0, check.names = TRUE, fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, stringsAsFactors = default.stringsAsFactors(), fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

read.csv(file, header = TRUE, sep = ",", quote = "\"", fill = TRUE, comment.char = "", ...)

read.csv2(file, sep = ";", dec = ",", ...)

read.delim(file, sep = "\t", ...)

read.delim2(file, ...)
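The variants differ mainly in their defaults. For example, `read.csv2` expects ";" between fields and "," as the decimal mark; the sketch reads the same (made-up) numbers written in both conventions:

```r
## read.csv2(): ";" separates fields and "," is the decimal mark
eu <- read.csv2(text = "price;weight\n1,5;2,25\n3,0;0,75")
print(eu)

## The same values written with read.csv() conventions
us <- read.csv(text = "price,weight\n1.5,2.25\n3.0,0.75")
print(all.equal(eu, us))
```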

Writing Data

Description: write.table prints its required argument x (after converting it to a data frame if it is not one nor a matrix) to a file or connection.

Main Functions:

write.table
writeLines
dump
dput
save
serialize

Usages:

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", eol = "\n", na = "NA", row.names = TRUE, col.names = TRUE, qmethod = c("escape", "double"), fileEncoding = "")

write.csv(...)
write.csv2(...)
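A minimal write-then-read round trip, assuming a temporary file is acceptable in place of a real output path. Passing `row.names = FALSE` keeps `write.csv` from adding an extra unnamed index column:

```r
df <- data.frame(id = 1:3, score = c(90.5, 72, 88))
path <- tempfile(fileext = ".csv")

## row.names = FALSE avoids writing an extra unnamed index column
write.csv(df, path, row.names = FALSE)
print(readLines(path))   # header line plus one line per row

back <- read.csv(path)
print(all.equal(df, back))
```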

Reading Large Tables

Read the help page for read.table, which contains many hints.
Make a rough calculation of the memory required to store your dataset. If the dataset is larger than the amount of RAM on your computer, you can probably stop right here.
Set comment.char = "" if there are no commented lines in your file.
Use the colClasses argument. Specifying this option instead of using the default can make read.table run MUCH faster, often twice as fast. In order to use this option, you have to know the class of each column in your data frame. If all of the columns are "numeric", for example, then you can just set colClasses = "numeric". A quick and dirty way to figure out the classes of each column is the following:

> initial <- read.table("db.txt",nrows = 100,sep = "\t")
> classes <- sapply(initial,class)
> tabAll <- read.table("db.txt",colClasses = classes,sep = "\t")

Set nrows. This doesn't make R run faster but it helps with memory usage. A mild overestimate is okay. You can use the Unix tool wc to calculate the number of lines in a file.
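The tips above can be combined into one workflow. This is a sketch under stated assumptions: the dataset dimensions in step 1 are hypothetical, the generated temporary file stands in for "db.txt", and `readLines` is used to count lines only because the demo file is tiny (for a genuinely large file, `wc -l` avoids loading it).

```r
## 1. Rough memory estimate: rows x columns x 8 bytes per numeric value
##    (hypothetical dimensions; a real data frame adds some overhead)
rows <- 1.5e6
cols <- 120
print(rows * cols * 8 / 2^30)   # about 1.34 GiB

## 2. Build a small tab-separated file to stand in for "db.txt"
path <- tempfile(fileext = ".txt")
big <- data.frame(id = 1:1000,
                  value = seq(0.5, 500, by = 0.5),
                  flag = rep(c(TRUE, FALSE), 500))
write.table(big, path, sep = "\t", row.names = FALSE)

## 3. Infer the column classes from the first 100 rows
initial <- read.table(path, header = TRUE, sep = "\t", nrows = 100)
classes <- sapply(initial, class)
print(classes)

## 4. Read everything with known classes, a row count, and no comment scanning
n_lines <- length(readLines(path)) - 1   # data rows, excluding the header line
tabAll <- read.table(path, header = TRUE, sep = "\t",
                     colClasses = classes, nrows = n_lines, comment.char = "")
str(tabAll)
```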

Reading Data Formats

`dput` and `dget`

> y <- data.frame(a = 1,b = "a") ## Create a `data.frame` object for `dput`
> dput(y)                         ## `dput` the object created before

structure(list(a = 1,b = structure(1L,.Label = "a",class = "factor")),.Names = c("a","b"),row.names = c(NA,-1L),class = "data.frame")

> dput(y,file = 'y.R')           ## write the deparsed object to a file named 'y.R'
> new.y <- dget('y.R')            ## read the object back from 'y.R'
> new.y                           ## print the restored data frame

  a b
1 1 a
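The same round trip can be run without leaving files in the working directory by using a temporary path. The exact text that `dput` writes depends on the R version (since R 4.0.0, `data.frame()` no longer turns strings into factors by default, so column `b` is stored as character rather than as the factor shown above):

```r
y <- data.frame(a = 1, b = "a")
path <- tempfile(fileext = ".R")

dput(y, file = path)     # writes R code that rebuilds "y"
print(readLines(path))   # the file is plain, human-readable text

new.y <- dget(path)      # parse and evaluate that code again
print(all.equal(y, new.y))
```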

`dump`
