Rcpp Study Record

Record my study in Rcpp
Author

Creo Hsia

Published

October 10, 2024

1 Rcpp

1.1 Format for defining a function in Rcpp

The following code shows the basic format for defining an Rcpp function.

#include<Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
RETURN_TYPE FUNCTION_NAME(ARGUMENT_TYPE ARGUMENT){

    //do something

    return RETURN_VALUE;
}
  • #include<Rcpp.h>: This line lets us use the classes and functions defined by the Rcpp package.

  • using namespace Rcpp;: This line lets us use them without a prefix. Otherwise, we must qualify each name with the Rcpp namespace, e.g. Rcpp::NumericVector.

  • // [[Rcpp::export]]: The function defined just below this line will be accessible from R.

  • RETURN_TYPE FUNCTION_NAME(ARGUMENT_TYPE ARGUMENT){}: We need to specify the return type of the function and the type of each argument.

1.2 Compiling the code

The function Rcpp::sourceCpp() compiles your source code and loads the defined functions into R. The code below defines a function that calculates the sum of a vector.

//sum.cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double rcpp_sum(NumericVector v){
    double sum = 0;
    for(int i=0; i<v.length(); ++i){
        sum += v[i];
    }
    return(sum);
}

Now we can load it into R by

library(Rcpp)
sourceCpp('sum.cpp')

1.3 Executing the function

We can call our Rcpp functions like ordinary R functions.

rcpp_sum(1:10)
#> [1] 55

1.4 Embedding Rcpp code in our R code

We can write Rcpp code in our R code in 3 ways.

1.4.1 sourceCpp()

Save the Rcpp code as a string object in R and compile it with sourceCpp().

src <-
"#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double rcpp_sum(NumericVector v){
  double sum = 0;
  for(int i=0; i<v.length(); ++i){
    sum += v[i];
  }
  return(sum);
}"

sourceCpp(code = src)
rcpp_sum(1:10)
#> [1] 55

1.4.2 cppFunction()

We can omit #include <Rcpp.h> and using namespace Rcpp; when we use cppFunction().

src <-"double rcpp_sum(NumericVector v){
    double sum = 0;
    for(int i=0; i<v.length(); ++i){
      sum += v[i];
    }
    return(sum);
  }
  "
Rcpp::cppFunction(src)
rcpp_sum(1:10)
#> [1] 55

1.4.3 evalCpp()

You can evaluate a single C++ expression by using evalCpp().

# Showing maximum value of double.
evalCpp('std::numeric_limits<double>::max()')
#> [1] 1.797693e+308

1.5 C++ 11

C++11 is the C++ standard established in 2011. It added many new features that make C++ easier to use, even for beginners.

Important

The code examples in this document are written with C++11 enabled.

1.5.1 Enabling C++11

To enable C++11, add the following line to your Rcpp code.

// [[Rcpp::plugins("cpp11")]]

1.6 Printing Messages

1.6.1 Rcout, Rcerr

// [[Rcpp::export]]
void rcpp_rcout(NumericVector v){
  // printing value of vector
  Rcout << "The value of v : " << v << "\n";

  // printing error message
  Rcerr << "Error message\n";
}

The signature void rcpp_rcout(NumericVector v) declares a function named rcpp_rcout that takes a NumericVector as an argument and returns nothing (void).

Rcout is an Rcpp-specific object similar to the standard C++ std::cout, but it is used for printing messages to the R console. It uses << to concatenate output.

Rcerr is similar to the standard C++ std::cerr, which is used for error messages and diagnostic output. Use Rcerr to send error or warning messages to R. Output sent through Rcerr might be displayed differently (for example in red text, depending on the R environment) to indicate an issue or warning.

1.6.2 Rprintf(), REprintf()

These functions print formatted output: Rprintf() writes to the R console and REprintf() writes to R's error stream. They let you format the output using placeholders (e.g., %d for integers, %f for floats).

Syntax:

Rprintf( format, variables)

The first argument is a format string, and subsequent arguments are the values to be inserted into the formatted string. Some format specifiers are listed below:

specifier explanation
%i printing signed integer (int)
%u printing unsigned integer (unsigned int)
%f printing floating point number (double)
%e printing floating point number (double) in exponential style
%s printing C string (char*)

Additionally, Rprintf() and REprintf() can only print data types that exist in standard C++, so you cannot pass data types defined by the Rcpp package (such as NumericVector) to Rprintf() directly. To print the elements of an Rcpp vector using Rprintf(), pass each element to it separately (see below).

// [[Rcpp::export]]
void rcpp_rprintf(NumericVector v){
    // printing values of all the elements of Rcpp vector  
    for(int i=0; i<v.length(); ++i){
        Rprintf("the value of v[%i] : %f \n", i, v[i]);
    }
}
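Rprintf() follows the conventions of C's printf(), so the specifiers in the table can be tried out without R. The sketch below is plain C++ with no Rcpp headers. It uses std::snprintf as a stand-in for Rprintf(), and the helper names format_element and format_exponential are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one vector element with the same format
// string as the Rprintf() call above, but returns the text instead of
// printing it.
std::string format_element(int i, double x) {
    char buf[128];
    // %i expects an int; %f expects a double (6 decimal places by default).
    std::snprintf(buf, sizeof(buf), "the value of v[%i] : %f \n", i, x);
    return std::string(buf);
}

// Hypothetical helper: the same kind of value in exponential style (%e).
std::string format_exponential(double x) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%e", x);
    return std::string(buf);
}
```

Passing a value whose type does not match its specifier (for example a double for %i) is undefined behaviour in C and C++, which is one reason to convert elements explicitly before printing them.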

1.7 Data Types

All the basic data types and data structures provided by R are available in Rcpp. By using these data types, you can directly access the objects that exist in R.

Vector and matrix

The following seven data types are often used in R: logical, integer, numeric, complex, character, Date, and POSIXct.

Rcpp has vector types and matrix types corresponding to those of R.

The table below presents the correspondence of data types between R/Rcpp/C++.

Value     R vector   Rcpp vector                     Rcpp matrix                     Rcpp scalar  C++ scalar
Logical   logical    LogicalVector                   LogicalMatrix                   -            bool
Integer   integer    IntegerVector                   IntegerMatrix                   -            int
Real      numeric    NumericVector                   NumericMatrix                   -            double
Complex   complex    ComplexVector                   ComplexMatrix                   Rcomplex     complex
String    character  CharacterVector (StringVector)  CharacterMatrix (StringMatrix)  String       string
Date      Date       DateVector                      -                               Date         -
Datetime  POSIXct    DatetimeVector                  -                               Datetime     time_t

1.7.1 data.frame, list, S3, S4

Besides Vector and Matrix, R has several other data structures, such as data.frame, list, S3 class and S4 class. You can handle all of these data structures in Rcpp.

R Rcpp
data.frame DataFrame
list List
S3 class List
S4 class S4

In Rcpp, Vector, DataFrame, and List are all implemented as kinds of vectors. Namely, a Vector is a vector whose elements are scalar values, a DataFrame is a vector whose elements are Vectors, and a List is a vector whose elements can be of any data type. Thus, Vector, DataFrame, and List have many member functions in common.

1.8 Vector

1.8.1 Creating vector object

We can create vector objects in several ways.

// Create a Vector object equivalent to
// v <- rep(0, 3)
NumericVector v (3);

// v <- rep(1, 3)
NumericVector v (3,1);

// v <- c(1,2,3) 
// C++11 Initializer list
NumericVector v = {1,2,3}; 

// v <- c(1,2,3)
NumericVector v = NumericVector::create(1,2,3);

// v <- c(x=1, y=2, z=3)
NumericVector v =
  NumericVector::create(Named("x",1), Named("y")=2 , _["z"]=3);

1.8.2 Accessing vector elements

This Rcpp function demonstrates various ways of accessing and modifying elements of a NumericVector in R using different types of indices (numerical, integer, character, and logical).

// [[Rcpp::export]]
void rcpp_vector_access() {
  // Creating vector
  NumericVector v  {10, 20, 30, 40, 50};

A NumericVector v is created with five numeric elements: {10, 20, 30, 40, 50}.


  // Setting element names
  v.names() = CharacterVector({"A", "B", "C", "D", "E"});

This sets the names for the elements in the NumericVector v. After this, the vector looks like:

A  B  C  D  E 
10 20 30 40 50
  // Preparing vector for access
  NumericVector numeric = {1, 3};
  IntegerVector integer = {1, 3};
  CharacterVector character = {"B", "D"};
  LogicalVector logical = {false, true, false, true, false};

These vectors (numeric, integer, character, and logical) are used as indices below. All four select the second and fourth elements of v.


  // Getting values of vector elements
  double x1 = v[0];             // Accesses the first element (10)
  double x2 = v["A"];           // Accesses the element with name "A" (also 10)
  NumericVector res1 = v[numeric];    // Gets the 2nd and 4th elements, i.e. indices 1 and 3 (20, 40)
  NumericVector res2 = v[integer];    // Same as res1 (20, 40)
  NumericVector res3 = v[character];  // Gets elements named "B" and "D" (20, 40)
  NumericVector res4 = v[logical];    // Gets elements where the index is true, the 2nd and 4th (20, 40)
  • v[0]: Accesses the first element by numerical index (indices are zero-based in C++).
  • v["A"]: Accesses the element named "A", which is the first element (10).
  • v[numeric], v[integer], v[character], v[logical]: Access multiple elements at once using index vectors of different types (numeric, integer, character, logical). All four select the same elements, 20 and 40.
Important

When accessing elements of a container such as NumericVector in Rcpp, you typically need to declare the type of the variable that will hold the result.


  // Assigning values to vector elements
  v[0]   = 100;                 // Replaces the first element with 100
  v["A"] = 100;                 // Replaces the element named "A" with 100
  NumericVector v2 {100, 200};   // A new vector {100, 200}
  v[numeric]   = v2;            // Replaces the 2nd and 4th elements (indices 1 and 3) with 100, 200
  v[integer]   = v2;            // Same as above (2nd and 4th elements)
  v[character] = v2;            // Replaces elements named "B" and "D" with 100, 200
  v[logical]   = v2;            // Replaces elements at logical `true` positions (2nd and 4th) with 100, 200
}
  • v[0] = 100: Changes the first element to 100.
  • v["A"] = 100: Changes the element named “A” (which is the same as the first element) to 100.
  • v[numeric] = v2, v[integer] = v2, v[character] = v2, v[logical] = v2: These lines replace the selected elements (based on various indexing methods) with the values from the new vector v2 ({100, 200}).
Important

When modifying elements of a vector or container in Rcpp, you must ensure that the new elements are of the same type as the original container.

1.8.3 Methods

Methods are functions that are attached to an individual object. You can call a method f() of object v in the form v.f().

NumericVector v = {1,2,3,4,5};

// Calling member function
int n = v.length(); // 5

The vector object in Rcpp has the methods listed below.

Method Description
size() Returns the number of elements of this vector object.
names() Returns the element names of this vector object as CharacterVector.
offset(name), findName(name) Returns the numerical index of the element specified by the character string name.
offset(i) Returns the numerical index of the element specified by the numerical index i after bounds check.
fill(x) Fills all the elements of this vector object with scalar value x.
sort() Sorts this vector object in ascending order and returns it.
assign(first_it, last_it) Assigns values specified by the iterator first_it and last_it to this vector object.
push_back(x) Appends a scalar value x to the end of this vector object.
push_back(x, name) Appends a scalar value x to the end of this vector object and sets the name of the element as name.
push_front(x) Inserts a scalar value x at the front of this vector object.
push_front(x, name) Inserts a scalar value x at the front of this vector object and sets the name of the element as name.
begin() Returns an iterator pointing to the first element of this vector object.
end() Returns an iterator pointing to the end (one past the last element) of this vector object.
cbegin() Returns a const iterator pointing to the first element of this vector.
cend() Returns a const iterator pointing to the end (one past the last element) of this vector.
insert(i, x) Inserts a scalar value x at the position specified by the numerical index i.
insert(it, x) Inserts a scalar value x at the position pointed to by the iterator it.
erase(i) Erases the element at the position specified by the numerical index i.
erase(it) Erases the element at the position pointed to by the iterator it.
erase(first_i, last_i) Erases elements from the position specified by numerical index first_i to last_i - 1.
erase(first_it, last_it) Erases elements from the position specified by the iterators first_it to last_it - 1.
containsElementNamed(name) Returns true if this vector contains an element with the name specified by the character string name.

1.8.4 Static methods

A static method is a function that belongs to a class rather than to an instance of the class. This means that you don’t need to create an object (or instance) of the class to call the function; you can call it directly using the class name.

  • Static member functions are called using the class name followed by the :: operator and the function name.
  • Example in Rcpp: NumericVector::create(). Here, create() is a static member function of the class NumericVector, which means you can call it directly using the class name (NumericVector), without creating a NumericVector object first.
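The difference between a static member function and an ordinary method can be seen in plain C++ as well. The sketch below defines a hypothetical Point class (not part of Rcpp) whose create() plays the same role as NumericVector::create().

```cpp
// Hypothetical class used only to illustrate static member functions.
class Point {
public:
    double x;
    double y;

    // Static member function: called as Point::create(...).
    // No Point object has to exist before the call.
    static Point create(double x_value, double y_value) {
        Point p;
        p.x = x_value;
        p.y = y_value;
        return p;
    }

    // Ordinary (non-static) method: called on an object, e.g. p.norm2().
    double norm2() const { return x * x + y * y; }
};
```

With this class, Point p = Point::create(3.0, 4.0); builds an object through the class name, while p.norm2() needs the object p that was just created.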

1.9 Matrix

Matrix objects can be created in several ways.

// Create a Matrix object equivalent to
// m <- matrix(0, nrow=2, ncol=2)
NumericMatrix m1( 2 );

// m <- matrix(0, nrow=2, ncol=3)
NumericMatrix m2( 2 , 3 );

// m <- matrix(v, nrow=2, ncol=3)
NumericMatrix m3( 2 , 3 , v.begin() );

In addition, a matrix object in R is actually a vector whose numbers of rows and columns are stored in the attribute dim.

Thus, if you create a vector with attribute dim in Rcpp and return it to R, it will be treated as a matrix.

mat <- '#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector rcpp_matrix(){
    // Creating a vector object
    NumericVector v = {1,2,3,4};

    // Set the number of rows and columns to attribute dim of the vector object.
    v.attr("dim") = Dimension(2, 2);

    // Return the vector to R
    return v;
}'
sourceCpp(code=mat)
rcpp_matrix()
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4

Even if you set a value to the attribute dim of a Vector object, the type of the object remains a Vector type in Rcpp code. Thus, if you want to convert it to a Matrix type in Rcpp, you need to use the as<T>() function.

// Set number of rows and columns to attribute dim
v.attr("dim") = Dimension(2, 2);

// Converting to Rcpp Matrix type
NumericMatrix m = as<NumericMatrix>(v);

1.9.1 Accessing Matrix elements

By using the () operator, you can get and assign the values of elements of a Matrix object by specifying the row number and column number.

// Creating a 5x5 numerical matrix
NumericMatrix m( 5, 5 );

// Retrieving the element of row 0 and column 2
double x = m( 0 , 2 );
  • This line retrieves the element at row 0, column 2 of the matrix m and stores it in the variable x.
  • The parentheses m(0, 2) are used for element access in the matrix. Because Rcpp indices are zero-based, this corresponds to m[1, 3] in R.
NumericVector v = m( 0 , _ );
  • This line copies all the values in row 0 (i.e., the entire first row) of the matrix m into the vector v.
  • The underscore _ is a placeholder that represents “all elements” in the corresponding dimension.
NumericVector v = m( _ , 2 );
  • This line copies all the values in column 2 (i.e., the third column) of the matrix m into the vector v.
NumericMatrix m2 = m( Range(0,1), Range(2,3) );
  • This line copies a submatrix of m consisting of the values in rows 0 and 1 (first and second rows) and columns 2 and 3 (third and fourth columns) into a new matrix m2. Range(0,1) specifies that rows 0 and 1 should be selected, and Range(2,3) specifies that columns 2 and 3 should be selected.
m[5]; // This points to the same element as m(0,1)

This line demonstrates linear indexing into the matrix. While NumericMatrix is a two-dimensional structure, it can also be treated like a 1D array when accessed with square brackets [].
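R stores matrix elements column by column (column-major order), which is why index 5 of a 5x5 matrix is the first element of the second column, m(0,1). The arithmetic can be sketched in plain C++ without Rcpp; linear_index and element_at are hypothetical helper names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: position of element (row, col) in the underlying
// vector of a column-major matrix with nrow rows (all indices zero-based).
std::size_t linear_index(std::size_t row, std::size_t col, std::size_t nrow) {
    return row + col * nrow;
}

// Hypothetical helper: reads element (row, col) from column-major storage.
double element_at(const std::vector<double>& data, std::size_t nrow,
                  std::size_t row, std::size_t col) {
    return data[linear_index(row, col, nrow)];
}
```

For a 5x5 matrix, linear_index(0, 1, 5) is 5. For the 2x2 matrix built from the values 1, 2, 3, 4 with two rows, element_at(data, 2, 0, 1) is 3, the first element of the second column.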

1.9.2 Accessing as reference to row, column and sub matrix

Rcpp also provides types that hold “references” to specific parts of a matrix.

  1. Referencing a Column:
NumericMatrix::Column col = m( _ , 1 );
  • m(_, 1): This accesses column 1 of the matrix m (zero-based indexing, so it refers to the second column).
  • NumericMatrix::Column: This type is a reference to a specific column of the matrix. It doesn’t create a copy of the column but rather provides direct access to the elements in column 1.
  2. Referencing a Row:
NumericMatrix::Row row = m( 1 , _ );
  • m(1,_): This accesses row 1 of the matrix m (zero-based indexing, so it refers to the second row).
  • NumericMatrix::Row: This type is a reference to a specific row of the matrix.
  3. Referencing a Submatrix:
NumericMatrix::Sub sub = m( Range(0,1) , Range(2,3) );
  • Range(0,1): This specifies the range of rows to select (rows 0 and 1, the first two rows).
  • Range(2,3): This specifies the range of columns to select (columns 2 and 3, the third and fourth columns).
  • NumericMatrix::Sub: This type is a reference to a submatrix of m. It references the elements in the submatrix defined by the row and column ranges.

1.9.3 Methods

Since a Matrix is actually a Vector, it has basically the same member functions as Vector. Thus, only the member functions unique to Matrix are presented below.

Method Description
nrow() or rows() Returns the number of rows.
ncol() or cols() Returns the number of columns.
row(i) Returns a reference Vector::Row to the i-th row.
column(i) Returns a reference Vector::Column to the i-th column.
fill_diag(x) Fills the diagonal elements with the scalar value x.
offset(i, j) Returns the numerical index in the original vector of the matrix corresponding to the element at row i and column j.

1.9.4 Static member functions

Matrix basically has the same static member functions as Vector. The static member function unique to Matrix is shown below.

  • Matrix::diag( size, x ): Returns a diagonal matrix whose number of rows and columns equals size and whose diagonal elements have the value x.

cppFunction("
NumericMatrix create_diag_matrix(int size, double x) {
    // Create a diagonal matrix of the given size with x on the diagonal
    NumericMatrix mat = NumericMatrix::diag(size, x);
    return mat;
}")
create_diag_matrix(5,2)
Important

x is a scalar value that will be placed along the diagonal of the matrix.

1.10 Vector operations

1.10.1 Arithmetic operations

By using the + - * / operators, you can perform elementwise arithmetic operations between vectors of the same length.

NumericVector x ;
NumericVector y ;

Vector and vector operation

// Vector and vector operation
NumericVector res = x + y ;
NumericVector res = x - y ;
NumericVector res = x * y ;
NumericVector res = x / y ;

Vector and scalar operation

// Vector and scalar operation
NumericVector res = x   + 2.0 ;
NumericVector res = 2.0 - x;
NumericVector res = y   * 2.0 ;
NumericVector res = 2.0 / y;

Expression and expression operation

NumericVector res = x * y + y / 2.0 ;
NumericVector res = x * ( y - 2.0 ) ;
NumericVector res = x / ( y * y ) ;

The - operator inverts the sign.

NumericVector res = -x ;

1.10.2 Comparison operations

Comparison of vectors using the == != < > >= <= operators produces logical vectors. You can also access vector elements using logical vectors.

Comparison of vector and vector

LogicalVector res = x < y ;
LogicalVector res = x > y ;
LogicalVector res = x <= y ;
LogicalVector res = x >= y ;
LogicalVector res = x == y ;
LogicalVector res = x != y ;

Comparison of vector and scalar

LogicalVector res = x < 2 ;
LogicalVector res = 2 > x;
LogicalVector res = y <= 2 ;
LogicalVector res = 2 != y;

Comparison of expression and expression

LogicalVector res = ( x + y ) < ( x*x ) ;
LogicalVector res = ( x + y ) >= ( x*x ) ;
LogicalVector res = ( x + y ) == ( x*x ) ;

!(...): The logical NOT operator ! negates the result of the comparison. In other words, it turns TRUE into FALSE and FALSE into TRUE.

LogicalVector res = !(x < y);

Accessing the elements of the vector using logical vectors.

NumericVector res = x[x < 2];
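The effect of x[x < 2] is to keep, in their original order, only the elements for which the comparison is true. A minimal plain C++ sketch of the same idea, using std::vector and std::copy_if instead of Rcpp types (keep_less_than is a hypothetical name):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: keeps the elements of x that are smaller than
// limit, preserving their order (the effect of x[x < limit]).
std::vector<double> keep_less_than(const std::vector<double>& x, double limit) {
    std::vector<double> res;
    std::copy_if(x.begin(), x.end(), std::back_inserter(res),
                 [limit](double value) { return value < limit; });
    return res;
}
```

This sketch fuses the comparison and the selection into one pass, whereas the Rcpp expression first describes a logical vector and then uses it as an index. Missing values are not modelled here.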

1.11 Logical operations

1.11.1 Logical Vector

Although in C++ the boolean type (bool) has only two possible values, true (1) and false (0), R’s logical vectors have a third possible value: NA (missing or undefined). Because C++ bool can’t represent this third state, Rcpp uses integers to represent the elements of R’s LogicalVector.

In Rcpp, elements of a LogicalVector are stored as integers to accommodate the extra NA value. Specifically, these values are represented as:

  • TRUE: 1 (same as C++ true)
  • FALSE: 0 (same as C++ false)
  • NA: NA_LOGICAL, a special constant defined as the minimum value of an integer, -2147483648 (the smallest value of a 32-bit signed integer).
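To make the integer encoding concrete, here is a small plain C++ sketch (no Rcpp headers) of a three-valued AND that follows R's rules for TRUE, FALSE and NA. The constants kTrue, kFalse and kNA and the function logical_and are hypothetical stand-ins, and the sketch assumes a 32-bit int.

```cpp
#include <climits>

// Hypothetical stand-ins for R's logical values.
const int kTrue  = 1;
const int kFalse = 0;
const int kNA    = INT_MIN;  // -2147483648 when int is 32 bits wide

// Hypothetical helper: AND with R's three-valued semantics.
// FALSE & anything is FALSE; otherwise any NA makes the result NA.
int logical_and(int a, int b) {
    if (a == kFalse || b == kFalse) return kFalse;
    if (a == kNA || b == kNA) return kNA;
    return kTrue;
}
```

Note that FALSE & NA is FALSE because the result cannot depend on the missing value, while TRUE & NA stays NA. A plain bool has no room for that third outcome.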

1.11.2 Logical operations

Use the operators & (logical AND), | (logical OR), and ! (logical negation) to perform elementwise logical operations on LogicalVector objects.

LogicalVector v1 = {1,1,0,0};
LogicalVector v2 = {1,0,1,0};

LogicalVector res1 = v1 & v2;
LogicalVector res2 = v1 | v2;
LogicalVector res3 = !(v1 | v2);

Rcout << res1 << "\n"; // 1 0 0 0
Rcout << res2 << "\n"; // 1 1 1 0
Rcout << res3 << "\n"; // 0 0 0 1

1.11.3 Function that receives LogicalVector

1.11.3.1 all() and any()

Examples of functions that receive LogicalVector are all(), any() and ifelse().

  • all(v) returns TRUE when all elements of v are TRUE, and any(v) returns TRUE if any of v’s elements are TRUE.

In Rcpp, the return type of both all() and any() is not a simple bool, but a more complex type called SingleLogicalResult.

This type can represent not only TRUE or FALSE, but also NA (the third possible logical value in R). As a result, the return value of all() or any() cannot be directly used in a conditional statement like an if statement in C++.

To convert the SingleLogicalResult from all() or any() into a bool, Rcpp provides helper functions:

  • is_true(): Returns true if the result is TRUE.
  • is_false(): Returns true if the result is FALSE.
  • is_na(): Returns true if the result is NA.
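Why a plain bool is not enough can be seen in a small model of all() written in standard C++. The sketch below mirrors the semantics of R's all() on integer-encoded logical values. It is not Rcpp's implementation, and the names all3, is_true3, is_false3, is_na3 and NA_VALUE are hypothetical.

```cpp
#include <climits>
#include <vector>

const int NA_VALUE = INT_MIN;  // hypothetical stand-in for NA_LOGICAL

// Hypothetical model of all(): FALSE (0) if any element is FALSE,
// otherwise NA if any element is NA, otherwise TRUE (1).
int all3(const std::vector<int>& v) {
    bool seen_na = false;
    for (int value : v) {
        if (value == 0) return 0;
        if (value == NA_VALUE) seen_na = true;
    }
    return seen_na ? NA_VALUE : 1;
}

// Hypothetical counterparts of is_true(), is_false() and is_na():
// each one collapses the three-valued result into a plain bool.
bool is_true3(int r)  { return r == 1; }
bool is_false3(int r) { return r == 0; }
bool is_na3(int r)    { return r == NA_VALUE; }
```

An if statement then tests one specific outcome, for example if (is_true3(all3(v))) { ... }, so the NA case is handled deliberately instead of being silently treated as true or false.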

1.11.3.2 ifelse()

ifelse(v, x1, x2) receives the logical vector v and returns, for each element, the corresponding element of x1 when the element of v is TRUE and the corresponding element of x2 when it is FALSE.

x1 and x2 must either be scalars or vectors. If they are vectors, their length must match the length of v. This ensures that there is a corresponding element in x1 or x2 for each element in v.

NumericVector v1;
NumericVector v2;
//Number of elements of vector
int n = v1.length();

When both x1 and x2 are scalars

IntegerVector res1     = ifelse( v1>v2, 1, 0);
NumericVector res2     = ifelse( v1>v2, 1.0, 0.0);
Important

Since ifelse() does not work with a scalar character string, we need to use a string vector whose values of elements are all the same.

CharacterVector chr_v1 = rep(CharacterVector("T"), n);
CharacterVector chr_v2 = rep(CharacterVector("F"), n);
CharacterVector res3   = ifelse( v1>v2, chr_v1, chr_v2);

When x1 is a vector and x2 is a scalar

IntegerVector res4 = ifelse(v1 > v2, int_v1, 0);
NumericVector res5 = ifelse(v1 > v2, num_v1, 0.0);
CharacterVector res6 = ifelse(v1 > v2, chr_v1, Rf_mkChar("F")); // Note
  • For integer and numeric vectors, the scalar values (0 and 0.0) are recycled as needed
  • For character vectors, you cannot directly use a scalar string like "F"; instead, you use Rf_mkChar("F"), which creates an internal SEXP (R object) representing the string “F”.

1.12 Data frame

1.12.1 Creating DataFrame

In Rcpp, DataFrame is implemented as a kind of vector. In other words, a Vector is a vector whose elements are scalar values, while a DataFrame is a vector whose elements are Vectors of the same length.

DataFrame::create() is used to create a DataFrame object.

// Creating DataFrame df from Vector v1, v2
DataFrame df = DataFrame::create(v1, v2);

Also, use Named() or _[] if you want to specify column names when creating DataFrame object.

// When giving names to columns
DataFrame df = DataFrame::create( Named("V1") = v1 , _["V2"] = v2 );
Warning

When you create a DataFrame with DataFrame::create(), the values of the original Vector are not duplicated in the columns of the DataFrame; instead, the columns are “references” to the original Vector. Therefore, changing the values of the original Vector changes the values of the columns. To avoid this, use clone().

df <- '#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
DataFrame rcpp_df(){
  // Creating vector v
  NumericVector v = {1,2};
  // Creating DataFrame df
  DataFrame df = DataFrame::create( Named("V1") = v,         // simple assign
                                    Named("V2") = clone(v)); // using clone()
  // Changing vector v
  v = v * 2;
  return df;
}'
sourceCpp(code=df)
rcpp_df()
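The difference between sharing the data and cloning it can be modelled in plain C++. In the sketch below a std::shared_ptr plays the role of an Rcpp object that refers to R data. This is only an analogy under that assumption, not how Rcpp is implemented, and Column, deep_copy and double_in_place are hypothetical names.

```cpp
#include <memory>
#include <vector>

// Hypothetical handle type: several handles can refer to the same data.
typedef std::shared_ptr<std::vector<double> > Column;

// Hypothetical helper playing the role of clone(): makes a deep copy.
Column deep_copy(const Column& v) {
    return std::make_shared<std::vector<double> >(*v);
}

// Doubles every element in place, similar in effect to v = v * 2 above.
void double_in_place(const Column& v) {
    for (double& value : *v) value *= 2.0;
}
```

If shared is another handle to the same data as v (Column shared = v;) and copied is deep_copy(v), then after double_in_place(v) the values seen through shared have doubled while those in copied are unchanged. That is the behaviour the warning describes for columns created with and without clone().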

1.12.2 Accessing DataFrame elements

When you access a specific column of a DataFrame in Rcpp, that column is temporarily assigned to a Vector object. The Vector object allows you to manipulate or access the values of that column as if it were a separate vector.

Warning

As with DataFrame creation, assigning a DataFrame column to a Vector does not copy the column values to the Vector object; the Vector becomes a “reference” to the column. Therefore, when you change the values of the Vector object, the content of the column also changes.

1.12.2.1 Accessing by Numeric, String, or Logical Vectors

You can specify which column of the DataFrame you want to access using different types of vectors:

  • Numeric Vector (Column Number): You can specify a column by its index (0-based). Example: df[0] returns the first column.

  • String Vector (Column Name): You can access a column by its name using a string, which is more intuitive and readable when dealing with named columns.

  • Logical Vector: Each element of the logical vector indicates whether a column should be included (TRUE) or excluded (FALSE). The length of the logical vector must match the number of columns in the data frame.

1.12.3 Member functions

Method Description
nrows() Returns the number of rows.
ncol() Returns the number of columns.
length() Returns the number of columns.
size() Returns the number of columns.
names() Returns the column names as a CharacterVector.
offset(name) or findName(name) Returns the numerical index of the column with the name specified by the string name.
fill(v) Fills all the columns of this DataFrame with the Vector v.
assign(first_it, last_it) Assigns columns in the range specified by the iterators first_it and last_it to this DataFrame.
push_back(v) Adds Vector v to the end of the DataFrame.
push_back(v, name) Appends a Vector v to the end of the DataFrame and specifies the name of the added column with the string name.
push_front(v) Inserts a Vector v at the beginning of this DataFrame.
push_front(v, name) Inserts a Vector v at the beginning of this DataFrame and specifies the name of the added column with the string name.
begin() Returns an iterator pointing to the first column of this DataFrame.
end() Returns an iterator pointing to the end of this DataFrame.
insert(it, v) Adds Vector v to this DataFrame at the position pointed by the iterator it and returns an iterator to that element.
erase(i) Deletes the i-th column of this DataFrame and returns an iterator to the column just after the erased column.
erase(it) Deletes the column specified by the iterator it and returns an iterator to the column just after the erased column.
erase(first_i, last_i) Deletes columns from first_i to last_i - 1 and returns an iterator to the column just after the erased columns.
erase(first_it, last_it) Deletes the range of columns specified by the iterators first_it to last_it - 1 and returns an iterator to the column just after the erased columns.
containsElementNamed(name) Returns true if this DataFrame has a column with the name specified by the string name.
inherits(str) Returns true if the attribute “class” of this object contains the string str.

1.13 List

In Rcpp, List is implemented as a kind of vector. In other words, a Vector is a vector whose elements are scalar values, while a List is a vector whose elements can be of any data type.

1.13.1 Creating List object

To create a List object we use the List::create() function. Also, to specify the element name when creating List, use Named() function or _[].

// Create list L from vector v1, v2
List L = List::create(v1, v2);

// When giving names to elements
List L = List::create(Named("name1") = v1 , _["name2"] = v2);
Warning

When you create a List with List::create(), the values of the original Vector are not duplicated; instead, the elements of the List are “references” to the original Vector. Therefore, changing the values of the original Vector changes the values in the list. To avoid this, use clone().

1.13.2 Accessing List elements

When accessing a specific element of List, we assign it to the other object and access it via that object.

The elements of List can be specified by numerical index, element names and logical vector.

1.13.3 Member functions

List has the same member functions as Vector.

1.14 S3 and S4 classes

An S3 object is actually a list whose attribute class is set to the name of its class.

We use an example to demonstrate how to use it in Rcpp

double rcpp_rmse(List lm_model) {
    // Since S3 is a list, data type of the argument is specified as List.

S3 objects in R, such as objects created by lm(), are typically lists with an additional class attribute. Since an S3 object is fundamentally a list, in Rcpp, we can use the List data type to receive it.

if (! lm_model.inherits("lm")) stop("Input must be a lm() model object.");

The inherits("lm") function checks whether the input object (the list) belongs to the lm class. This ensures that the input object is an lm object.

    // Extracting residuals (i.e. actual - prediction) from the S3 object
    NumericVector resid  = lm_model["residuals"];
  • In an lm object, residuals are stored in the “residuals” component.
  • Since the S3 object is a list, we can access the "residuals" component using list-like indexing (lm_model["residuals"]), which extracts the residuals as a NumericVector in Rcpp.
    // Number of elements of the residual vector
    R_xlen_t n = resid.length();
  • The length() function is used to get the number of elements in the resid vector (i.e., the number of residuals).
  • R_xlen_t is a type that represents the length of vectors in R. It is typically used in Rcpp when dealing with vector lengths.

R_xlen_t is large enough to handle very long vectors, which is crucial when you’re working with vectors whose length might exceed the range of a standard integer (which is 2^31 - 1 in R, roughly 2 billion).

    // The sum of squares of the residual vector
    double rmse(0.0);
    for(double& r : resid){
        rmse += r*r;
    }

    // Divide the residual sum of squares by the number of elements and take the square root
    return(sqrt((1.0/n)*rmse));
}
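The arithmetic at the end of rcpp_rmse() is the usual root mean squared error, sqrt((1/n) * sum of r_i^2). It can be checked on its own with a plain C++ sketch that takes the residuals as a std::vector instead of extracting them from an lm object. The function name rmse and the empty-input check are additions for illustration.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Root mean squared error of a vector of residuals:
// sqrt( (1/n) * sum(r_i^2) ), the same formula as in rcpp_rmse().
double rmse(const std::vector<double>& resid) {
    // Added for this sketch: avoid dividing by zero on empty input.
    if (resid.empty()) throw std::invalid_argument("resid must not be empty");
    double sum_sq = 0.0;
    for (double r : resid) sum_sq += r * r;
    return std::sqrt(sum_sq / static_cast<double>(resid.size()));
}
```

For the residuals 1 and -1 the sum of squares is 2, the mean is 1, and the result is 1.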

1.14.1 S4 class

To be updated.

1.15 String

String is a scalar type corresponding to the element of CharacterVector. String can also handle NA values (NA_STRING) which are not supported by the C character string char* and the C++ string std::string.

1.15.1 Creating String object

There are three main ways to create a String object. Each approach reflects different sources for creating the String object:

  1. Creating a String from a C/C++ String (Character Pointer or std::string):
// Create String from C-style string
String s1 = "Hello, world!";

// Create String from std::string
std::string cpp_str = "C++ string";
String s2 = cpp_str;
  • s1 is created from a C-style string ("Hello, world!").
  • s2 is created from a C++ string (std::string).
  2. Creating a String from Another String Object:
  • You can create a new String object by copying an existing String object in Rcpp.
String s1 = "Original string";

// Create a new String from another String object
String s2 = s1;
  3. Creating a String from an Element of a CharacterVector:
CharacterVector cv = CharacterVector::create("one", "two", "three");

// Create a String from the second element of the CharacterVector
String s = cv[1];  // Note: 0-based indexing, so this accesses "two"

1.15.2 Operators

In Rcpp, the String class supports the += operator, which allows you to append another string (or string-like object) to an existing String object.

// Creating String object
String s("A");

// Appending a string
s += "B";

Rcout << s << "\n"; 
// "AB"

1.15.3 Member functions

| Method | Description |
|---|---|
| replace_first(str, new_str) | Replace the first substring that matches the string str with the string new_str. |
| replace_last(str, new_str) | Replace the last substring that matches the string str with the string new_str. |
| replace_all(str, new_str) | Replace all substrings that match the string str with the string new_str. |
| push_back(str) | Append the string str to the end of this String object (same as the += operator). |
| push_front(str) | Prepend the string str to the beginning of this String object. |
| set_na() | Set this String object to NA. |
| get_cstring() | Return the contents of this String object as a C character string constant (const char*). |
| get_encoding() | Return the character encoding, represented as a cetype_t value. |
| set_encoding(enc) | Set the character encoding to the cetype_t value enc. |

1.16 Date and DateVector

1.16.1 Creating Date objects

The following code illustrates different ways to create a Date object in Rcpp, representing a specific date.

Date d;  // "1970-01-01"
  • Date d; creates a Date object d that represents the epoch date “1970-01-01”, which is considered the default starting point for dates in many computing systems, including R.
Date d(1);  // "1970-01-01" + 1 day
  • Date d(1); creates a Date object d that represents one day after the epoch date.
Date d(1.1);  // "1970-01-01" + ceil(1.1) day
  • Date d(1.1); creates a Date object d that represents 1.1 days after “1970-01-01”. The number is rounded up (using ceil()), so it is treated as 2 days after “1970-01-01”.
Date d("2000-01-01", "%Y-%m-%d");  // Date specified by a string with a format

This creates a Date object d from the string "2000-01-01" with the format "%Y-%m-%d".

Creating a Date from month, day, and year (mm/dd/yyyy):

Date d(1, 2, 2000);  // 2000-01-02 (mon, day, year)

1.16.2 Operators

Date has operators +, -, <, >, >=, <=, ==, !=.

By using these operators, you can add days to a date (+), compute the difference between dates (-), and compare dates (<, <=, >, >=, ==, !=).

1.16.3 Member functions

| Method | Description |
|---|---|
| format() | Returns the date as a std::string using the same specification as base R. The default format is YYYY-MM-DD. |
| getDay() | Returns the day of the month. |
| getMonth() | Returns the month of the date. |
| getYear() | Returns the year of the date. |
| getWeekday() | Returns the day of the week as an int (1:Sun, 2:Mon, 3:Tue, 4:Wed, 5:Thu, 6:Fri, 7:Sat). |
| getYearday() | Returns the day of the year, with January 1st as 1 and December 31st as 365 (366 in a leap year). |
| is_na() | Returns true if this object is NA. |

1.16.4 DateVector subsetting

In Rcpp, both DateVector and DateTimeVector are internally stored as numeric types (specifically, doubles). This design simplifies certain internal calculations but can be confusing when working with individual elements of these vectors.

This behavior is important because when you subset a DateVector using the [] operator, you extract a double, which represents the date as the number of days since the epoch date (1970-01-01).

To work with individual Date or DateTime objects from a DateVector or DateTimeVector, you need to explicitly cast or convert the extracted element back into a Date or DateTime object.

Example:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void print_year_of_date(DateVector dates) {
    for (R_xlen_t i = 0; i < dates.size(); ++i) {
        // dates[i] is a double (days since 1970-01-01); convert it to a Date
        Date single_date = dates[i];
        int year = single_date.getYear();  // Now you can call getYear()
        Rcpp::Rcout << "Year: " << year << std::endl;
    }
}
  • dates[i] returns a double by default.
  • Date single_date = dates[i];: We explicitly convert the double value to a Date object, allowing us to use the getYear() method.
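
To make the "days since the epoch" representation concrete, here is a minimal standalone sketch that uses only the C++ standard library (no Rcpp, and not Rcpp's own implementation). The helper names epoch_days_to_tm, year_from_epoch_days and weekday_from_epoch_days are hypothetical, and the -1 error value is this sketch's own convention. It turns the kind of double that dates[i] yields into a calendar year and a weekday numbered the same way as getWeekday() (1 for Sunday to 7 for Saturday).

```cpp
#include <cmath>
#include <ctime>

// Hypothetical helper: break a day count since 1970-01-01 into UTC calendar fields.
// Returns false if the conversion fails.
bool epoch_days_to_tm(double days, std::tm& out) {
    // Whole days since the epoch, converted to seconds since the epoch.
    std::time_t seconds = static_cast<std::time_t>(std::floor(days)) * 86400;
    const std::tm* utc = std::gmtime(&seconds);
    if (utc == nullptr) return false;
    out = *utc;
    return true;
}

// Calendar year of the date, or -1 on failure (the -1 is this sketch's convention).
int year_from_epoch_days(double days) {
    std::tm fields{};
    return epoch_days_to_tm(days, fields) ? fields.tm_year + 1900 : -1;
}

// Day of the week numbered 1 (Sunday) to 7 (Saturday), or -1 on failure.
int weekday_from_epoch_days(double days) {
    std::tm fields{};
    return epoch_days_to_tm(days, fields) ? fields.tm_wday + 1 : -1;
}
```

Day 0 is 1970-01-01, a Thursday, so year_from_epoch_days(0) is 1970 and weekday_from_epoch_days(0) is 5. Day 10957 is 2000-01-01, a Saturday, so the same calls give 2000 and 7.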

1.17 RObject

The RObject type in Rcpp is a flexible and general-purpose type that can represent any kind of R object.

Here is an example that demonstrates how RObject can be used in Rcpp to accept and handle different types of R objects.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void handle_any_type(RObject obj) {
    // Check if the object is a numeric (double) vector
    if (is<NumericVector>(obj)) {
        NumericVector num_vec = as<NumericVector>(obj);
        Rcpp::Rcout << "Numeric Vector: " << num_vec << std::endl;
    }
    // Check if the object is a character vector
    else if (is<CharacterVector>(obj)) {
        CharacterVector char_vec = as<CharacterVector>(obj);
        Rcpp::Rcout << "Character Vector: " << char_vec << std::endl;
    }
    // Check if the object is a list; report its length
    else if (is<List>(obj)) {
        List lst = as<List>(obj);
        Rcpp::Rcout << "List of length " << lst.size() << std::endl;
    }
    // Handle unknown types
    else {
        Rcpp::Rcout << "Unknown type!" << std::endl;
    }
}

1.17.1 Conversion using as<>()

  • as<>() is a template function in Rcpp that allows you to convert an RObject to a more specific type when you know the type of the object or have determined it dynamically.
NumericVector num_vec = as<NumericVector>(obj);  // Convert RObject to NumericVector

This converts the RObject into a NumericVector when you are sure that it contains a numeric vector.

1.18 Cautions in handling Rcpp objects

1.18.1 Assigning between vectors

In Rcpp, when you assign an object (like a vector, list, or matrix) v1 to another object v2 using the = operator (e.g., v2 = v1;), no deep copy is made. Instead, v2 becomes an alias to v1, meaning that both v1 and v2 point to the same underlying data in memory.

If you want v2 to be a completely independent copy of v1, so that changes to v1 do not affect v2, you need to perform a deep copy. In Rcpp, you can use the clone() function to create a deep copy.
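
In Rcpp itself the two cases are written NumericVector v2 = v1; (alias) and NumericVector v2 = clone(v1); (deep copy). The difference can also be modelled without R. The sketch below is a toy model in standard C++ only, not Rcpp's implementation: Handle is a hypothetical stand-in for an Rcpp vector, and deep_copy is a hypothetical counterpart of clone().

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for an Rcpp vector: a light handle whose copies share one buffer.
struct Handle {
    std::shared_ptr<std::vector<double>> data;

    explicit Handle(std::vector<double> values)
        : data(std::make_shared<std::vector<double>>(std::move(values))) {}

    double& operator[](std::size_t i) { return (*data)[i]; }
};

// Toy counterpart of clone(): allocate a new buffer and copy the values into it.
Handle deep_copy(const Handle& h) { return Handle(*h.data); }

// Value seen through a shallow copy after the original is modified.
double read_through_alias() {
    Handle v1(std::vector<double>{1.0, 2.0, 3.0});
    Handle v2 = v1;             // copies the handle only; both share one buffer
    v1[0] = 100.0;
    return v2[0];               // the change to v1 is visible through v2
}

// Value seen through a deep copy after the original is modified.
double read_through_deep_copy() {
    Handle v1(std::vector<double>{1.0, 2.0, 3.0});
    Handle v2 = deep_copy(v1);  // v2 owns its own buffer
    v1[0] = 100.0;
    return v2[0];               // v2 still holds the original value
}
```

Here read_through_alias() returns 100.0 while read_through_deep_copy() returns 1.0, which mirrors the difference between v2 = v1 and v2 = clone(v1).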

1.18.2 Data type of numerical index

Use R_xlen_t as the data type for element indices and element counts so that your Rcpp code supports long vectors, for example R_xlen_t n = v.size(); rather than int n = v.size();.
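
The standalone sketch below (standard C++ only, no Rcpp) shows why a plain int is too narrow. It assumes a 64-bit platform with a 32-bit int, where R defines R_xlen_t as an alias for std::ptrdiff_t. The helper names fits_in_int and fits_in_xlen are hypothetical.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: can a vector length n be stored in a plain int?
bool fits_in_int(long long n) {
    return n >= 0 && n <= std::numeric_limits<int>::max();
}

// Hypothetical helper: can n be stored in std::ptrdiff_t,
// the type behind R_xlen_t on 64-bit builds of R (an assumption of this sketch)?
bool fits_in_xlen(long long n) {
    return n >= 0 && n <= std::numeric_limits<std::ptrdiff_t>::max();
}
```

A length of 3,000,000,000 elements fails fits_in_int but passes fits_in_xlen, so a loop counter declared as int could not index every element of such a vector.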

1.18.3 Return type of operator []

In Rcpp, accessing an element of a vector with [] or () returns the vector's proxy type (Vector::Proxy), which is not always the element's native type. For NumericVector and IntegerVector the proxy is simply a reference to the element (e.g., double&), but for CharacterVector and List it is a separate proxy object. This proxy acts as an intermediary that lets you modify the element in place or retrieve its value, but it is not itself a String or an RObject, so using v[i] directly where the native type is expected can cause compile errors.

To resolve this, you can either:
  • Assign v[i] to a new object of the expected type.
  • Convert the proxy to the native type using as<T>().
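
The standard library has a well-known container with the same behavior, std::vector<bool>, whose operator[] also returns a proxy object. The sketch below uses it as an analogy only (no Rcpp involved, and the function names are hypothetical) to show the two points made above: the proxy writes through to the container, and assigning it to a variable of the expected type yields an ordinary value.

```cpp
#include <type_traits>
#include <vector>

// True when indexing yields a proxy class instead of a plain bool.
bool index_yields_proxy() {
    std::vector<bool> flags{true, false};
    auto element = flags[0];  // deduced as std::vector<bool>::reference
    return !std::is_same<decltype(element), bool>::value;
}

// Writing through the proxy changes the element stored in the container.
bool write_through_proxy() {
    std::vector<bool> flags{true, false};
    auto element = flags[0];
    element = false;          // modifies flags[0], not a local copy
    return flags[0];
}

// Assigning to a variable of the expected type converts the proxy to a value.
bool read_as_value() {
    std::vector<bool> flags{true, false};
    bool value = flags[0];    // implicit conversion from proxy to bool
    flags[0] = false;         // later writes no longer affect `value`
    return value;
}
```

In Rcpp the analogous fix for a CharacterVector cv is to write String s = cv[i]; instead of relying on auto or passing cv[i] straight through.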

1.19 R-like functions

Rcpp provides many functions that behave like their R counterparts. The full list is too long to reproduce here.

If you know for certain that your vector does not contain any NA values, you can optimize your code by using the noNA() function. noNA() marks the vector as guaranteed to be free of NA values, which allows Rcpp to skip NA checks and perform calculations more efficiently.

1.20 Probability distribution

In Rcpp, probability distribution functions exist in two different namespaces:

  1. Rcpp:: namespace:
  • Functions in this namespace return vectors.

  • These functions are designed to be similar to their counterparts in base R. You can pass a vector of values to these functions and they will return a vector of results.

  2. R:: namespace:
  • Functions in this namespace return scalar values (a single value).
  • If you only need a single value from the distribution function, using the R:: version of the function can be more efficient because it avoids the overhead of vectorization.
sort_cpp <- '#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector sort_numeric_vector(NumericVector x) {
    std::sort(x.begin(), x.end());
    return x;
}'
sourceCpp(code=sort_cpp)