
『Data Science』R Language Study Notes: Basic Syntax

Published: 2021-01-23 23:06:14 | Category: Big Data | Source: compiled from the web

Multiple objects can be deparsed using the dump function and read back in using source.

> x <- "foo"                          ## create the first data object
> y <- data.frame(a = 1,b = "a")      ## create the second data object
> dump(c("x","y"),file = "data.R")    ## write both objects to a file called 'data.R'
> rm(x,y)                             ## remove both objects from the workspace
> source("data.R")                    ## read the dumped file 'data.R' back in
> y                                   ## print the restored object 'y'
  a b
1 1 a
> x                                   ## print the restored object 'x'
[1] "foo"

Connections: Interfaces to the Outside World

Data are read in using connection interfaces. Connections can be made to files (most common) or to other more exotic things.

  • file: opens a connection to a file
  • gzfile: opens a connection to a file compressed with gzip
  • bzfile: opens a connection to a file compressed with bzip2
  • url: opens a connection to a webpage

> con <- file('db.txt','r')    ## open 'db.txt' for reading
> readLines(con)               ## read every line of the file
> close(con)                   ## release the connection when done
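
As a rough sketch of the same pattern with a compressed file, the snippet below writes a few lines through a gzfile connection and reads two of them back. The temporary file path and its contents are made up for illustration, and closing each connection explicitly is a common habit rather than something these notes prescribe.

```r
## Hypothetical temporary file; any writable path would do
path <- tempfile(fileext = ".gz")

## Write three lines through a gzip-compressed connection
con <- gzfile(path, "w")
writeLines(c("first", "second", "third"), con)
close(con)

## Reopen the file for reading and fetch only the first two lines
con <- gzfile(path, "r")
first_two <- readLines(con, n = 2)
close(con)

print(first_two)
```

The n argument of readLines limits how many lines are read, which helps when only a peek at a large file is needed.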

Subsetting

  • [ always returns an object of the same class as the original; it can be used to select more than one element (there is one exception).
  • [[ is used to extract elements of a list or a data frame; it can only extract a single element, and the class of the returned object will not necessarily be a list or data frame.
  • $ is used to extract elements of a list or data frame by name; its semantics are similar to that of [[.
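
The difference between the three operators shows up most clearly in the class of what they return. The following is a small sketch on a made-up list; the variable names are only for illustration.

```r
x <- list(foo = 1:4, bar = 0.6)

single <- x["foo"]      # `[` keeps the container: still a list
double <- x[["foo"]]    # `[[` extracts the element itself
dollar <- x$foo         # `$` behaves like `[[` with a literal name

print(class(single))               # "list"
print(class(double))               # "integer"
print(identical(double, dollar))   # TRUE
```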

Basic

> x <- c("a","b","c","d","e")
> x[1]
[1] "a"
> x[2]
[1] "b"
> x[1:3]
[1] "a" "b" "c"
> x[x > "a"]
[1] "b" "c" "d" "e"
> u  <- x>"a"
> u
[1] FALSE  TRUE  TRUE  TRUE  TRUE
> x[u]
[1] "b" "c" "d" "e"

Lists

> x <- list(foo = 1:4,bar = 0.6)

> x[1]
$foo
[1] 1 2 3 4
> x[[1]]
[1] 1 2 3 4
> x[[2]]
[1] 0.6

> x$bar
[1] 0.6
> x$foo
[1] 1 2 3 4

> x[["bar"]]
[1] 0.6
> x["bar"]
$bar
[1] 0.6
> x <- list(foo = 1:4,bar = 0.6,baz = "hello")

> x[c(1,3)]
$foo
[1] 1 2 3 4
$baz
[1] "hello"

> name <- "foo"
> x[[name]]
[1] 1 2 3 4
> x$name          ## `$` looks for an element literally called `name`; the list `x` has none, so the result is NULL
NULL
> x$foo
[1] 1 2 3 4

Matrices

Matrices can be subsetted in the usual way with (i,j) type indices.

> x <- matrix(1:6,2,3)

> x[1,2]
[1] 3

> x[1,]
[1] 1 3 5

> x[,2]
[1] 3 4

> x[1,2,drop = FALSE]
     [,1]
[1,]    3

> x[1,,drop = FALSE]
     [,1] [,2] [,3]
[1,]    1    3    5
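
What drop = FALSE changes is easiest to check through dim(). The sketch below assumes a 2 x 3 matrix built from 1:6 and compares a row taken with and without the argument.

```r
x <- matrix(1:6, nrow = 2, ncol = 3)

row_vec <- x[1, ]                 # dimensions dropped: plain integer vector
row_mat <- x[1, , drop = FALSE]   # dimensions kept: 1 x 3 matrix

print(dim(row_vec))   # NULL
print(dim(row_mat))   # 1 3
```

Keeping the dimensions matters when later code expects a matrix, for example in a function that calls ncol() on its input.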

Partial Matching

Partial matching of names is allowed with [[ and $.

> x <- list(aardvark = 1:5)

> x$a
[1] 1 2 3 4 5

> x[["a"]]
NULL

> x[["a",exact = FALSE]]
[1] 1 2 3 4 5

Removing NA Values

> x <- c(1,2,NA,4,NA,5)

> bad <- is.na(x)

> x[!bad]
[1] 1 2 4 5

Use the built-in function complete.cases() to get a logical vector indicating which cases are complete, i.e., have no missing values.

> x <- c(1,2,NA,4,NA,5)
> y <- c("a","b",NA,"d",NA,"f")

> good <- complete.cases(x,y)

> good
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE

> x[good]
[1] 1 2 4 5
> y[good]
[1] "a" "b" "d" "f"

From data frame

> airquality[1:6,]                    ## show the first six rows of the data frame
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5   ## this row contains NA values
6    28      NA 14.9   66     5   6   ## this row contains an NA value

> good <- complete.cases(airquality)  ## rows 5 and 6 contain NA values, so they are filtered out below

> airquality[good,][1:6,]
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4   
7    23     299  8.6   65     5   7
8    19      99 13.8   59     5   8
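
Because the logical vector returned by complete.cases() counts TRUE as 1, it can also be summed to see how many rows survive the filter. This is a minimal sketch on the built-in airquality data set; the variable names are illustrative.

```r
good <- complete.cases(airquality)

n_total    <- nrow(airquality)       # all rows
n_complete <- sum(good)              # TRUE counts as 1
n_dropped  <- n_total - n_complete   # rows with at least one NA

cat(n_complete, "of", n_total, "rows are complete\n")
```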

Vectorized Operations

  • Many operations in R are vectorized, making code more efficient, concise, and easier to read.

> x <- 1:4; y <- 6:9

> x + y
[1]  7  9 11 13

> x > 2
[1] FALSE FALSE  TRUE  TRUE

> y >= 2
[1] TRUE TRUE TRUE TRUE

> y == 8
[1] FALSE FALSE  TRUE FALSE

> x * y
[1]  6 14 24 36

> x / y
[1] 0.1666667 0.2857143 0.3750000 0.4444444
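
To make the benefit concrete, the sketch below computes the same element-wise sum twice, once with an explicit loop and once with the vectorized operator. The loop version is written only for comparison.

```r
x <- 1:4; y <- 6:9

## Explicit loop: add the vectors element by element
loop_sum <- numeric(length(x))
for (i in seq_along(x)) {
    loop_sum[i] <- x[i] + y[i]
}

## Vectorized form: one expression, same values
vec_sum <- x + y

print(all(loop_sum == vec_sum))   # TRUE
```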

Logic Control

if-else

> x <- 5                ## the condition must be a single TRUE or FALSE, so x is a single value here
> if (x > 3) {
+     y <- 10
+ } else {
+     y <- 0
+ }
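
Since if is an expression in R and returns a value, the same logic can also be written with the assignment outside the branches. The value of x below is an arbitrary choice for illustration.

```r
x <- 5   # a single value; `if` expects a length-one condition

## `if` returns the value of the branch it runs,
## so the assignment can be hoisted out
y <- if (x > 3) {
    10
} else {
    0
}

print(y)   # 10
```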

For

> x <- c("a","b","c","d")
> for (i in 1:4) {
+     print(x[i])
+ }
[1] "a"
[1] "b"
[1] "c"
[1] "d"

> for(i in seq_along(x)) {
+     print(x[i])
+ }
[1] "a"
[1] "b"
[1] "c"
[1] "d"

> for(letter in x){
+     print(letter)
+ }
[1] "a"
[1] "b"
[1] "c"
[1] "d"

> for(i in 1:4) print(x[i])
[1] "a"
[1] "b"
[1] "c"
[1] "d"
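
One reason to prefer seq_along() over a hard-coded range such as 1:4 or 1:length(x) is its behaviour on empty vectors. The sketch below, using a made-up empty character vector, counts how often each style of loop would run.

```r
empty <- character(0)

## 1:length(empty) is 1:0, i.e. the vector c(1, 0)
runs_colon <- 0
for (i in 1:length(empty)) {
    runs_colon <- runs_colon + 1
}

## seq_along(empty) is integer(0), so the body never runs
runs_seq <- 0
for (i in seq_along(empty)) {
    runs_seq <- runs_seq + 1
}

print(runs_colon)   # 2
print(runs_seq)     # 0
```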

While

> count <- 0
> while(count < 10) {
+     print(count)
+     count <- count + 1
+ }
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9

> z <- 5
> while(z >=3 && z <= 10) {
+     print(z)
+     coin <- rbinom(1,1,0.5)
+     
+     if(coin == 1) {
+         z <- z + 1
+     } else {
+         z <- z - 1
+     }
+ }
[1] 5
[1] 4
[1] 3
[1] 4
[1] 5
[1] 4
[1] 5
[1] 4
[1] 3

Repeat

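> x0 <- 1
> tol <- 1e-8
> repeat {
+     x1 <- computeEstimate()   ## placeholder; computeEstimate() is not defined in these notes
+     if(abs(x1 - x0) < tol) {
+         break                 ## leave the loop once the estimates agree
+     } else {
+         x0 <- x1
+     }
+ }

Next

> for(i in 1:100) {
+     if(i <= 20) {
+         next        ## skip the rest of this iteration
+     }
+     ## iterations 21 to 100 continue here
+ }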
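
The repeat loop above cannot run as written because computeEstimate() is never defined. As a runnable sketch, the version below substitutes a hypothetical update rule (one Newton step towards the square root of 2) and passes the previous estimate in as an argument; both choices are assumptions made for illustration, not part of the original notes.

```r
## Hypothetical update rule standing in for computeEstimate():
## one Newton step towards sqrt(2)
computeEstimate <- function(x) {
    (x + 2 / x) / 2
}

x0 <- 1
tol <- 1e-8
iterations <- 0

repeat {
    x1 <- computeEstimate(x0)
    iterations <- iterations + 1
    if (abs(x1 - x0) < tol) {
        break               # successive estimates agree: stop
    } else {
        x0 <- x1
    }
}

print(x1)
```

A repeat loop has no built-in stopping condition, so in practice it is often safer to add an iteration cap alongside the tolerance check.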

Function

> add2 <- function(x,y) {
+   x + y
+ }

> add2(2,3)
[1] 5

> above <- function(x,n = 10) {
+   use <- x > n                ## logical vector: TRUE where x exceeds n
+   x[use]
+ }

> x <- 1:20
> above(x,10)
 [1] 11 12 13 14 15 16 17 18 19 20

> columnmean <- function(y,removeNA = TRUE) {
+   nc <- ncol(y)
+   means <- numeric(nc)
+   for(i in seq_len(nc)) {
+     means[i] <- mean(y[,i],na.rm = removeNA)
+   }
+   means                       ## return result
+ }

> columnmean(airquality)        ## compute the mean of each column of `airquality`
[1]  42.129310 185.931507   9.957516  77.882353   6.993464  15.803922
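
Base R also ships colMeans(), which covers the same ground as a hand-written column loop. The sketch below repeats the loop inline and compares it with colMeans(); it is meant as a cross-check, not as a replacement for the exercise of writing the function.

```r
## Column means computed with an explicit loop
loop_means <- numeric(ncol(airquality))
for (i in seq_len(ncol(airquality))) {
    loop_means[i] <- mean(airquality[, i], na.rm = TRUE)
}

## The same quantity from the built-in helper (returns a named vector)
builtin_means <- colMeans(airquality, na.rm = TRUE)

## Compare values only, ignoring the column names
print(all.equal(loop_means, unname(builtin_means)))
```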

The ... Argument

... is often used when extending another function and you don't want to copy the entire argument list of the original function.

myplot <- function(x,y,type = "l",...) {
  plot(x,y,type = type,...)     ## change the default type to a line plot; pass everything else on to plot()
}
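
The same forwarding pattern can be tried without a graphics device. In the sketch below, mymean is a hypothetical wrapper around mean() that changes one default and hands every other argument through via the ... argument.

```r
## Hypothetical wrapper around mean(): changes the default of na.rm
## and passes any further arguments through unchanged via `...`
mymean <- function(x, na.rm = TRUE, ...) {
    mean(x, na.rm = na.rm, ...)
}

v <- c(1, 2, 3, NA, 100)

plain   <- mymean(v)                # NA removed by the new default
trimmed <- mymean(v, trim = 0.25)   # `trim` is forwarded to mean()

print(plain)     # 26.5
print(trimmed)
```

The wrapper never names trim itself, yet callers can still use it, which is the point of not copying the original argument list.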
