1 Rcpp
1.1 Format for defining a function in Rcpp
The following code shows the basic format for defining an Rcpp function.
#include<Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
RETURN_TYPE FUNCTION_NAME(ARGUMENT_TYPE ARGUMENT){
    //do something
    return RETURN_VALUE;
}
- #include<Rcpp.h>: This line enables us to use the classes and functions defined by the Rcpp package.
- using namespace Rcpp;: This line lets us use those classes and functions without a prefix. Otherwise we need to state explicitly that they come from the Rcpp package, e.g. Rcpp::NumericVector.
- // [[Rcpp::export]]: The function defined just below this line will be accessible from R.
- RETURN_TYPE FUNCTION_NAME(ARGUMENT_TYPE ARGUMENT){}: We need to specify the data types of the function's return value and of its arguments.
1.2 Compiling the code
The function Rcpp::sourceCpp() compiles your source code and loads the defined function into R. The code below defines a function that calculates the sum of a vector.
//sum.cpp
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double rcpp_sum(NumericVector v){
    double sum = 0;
    for(int i=0; i<v.length(); ++i){
        sum += v[i];
    }
    return(sum);
}
Now we can load it into R by
library(Rcpp)
sourceCpp('sum.cpp')
1.3 Executing the function
We can call our Rcpp functions like ordinary R functions.
rcpp_sum(1:10)
#> [1] 55
1.4 Embedding Rcpp code in our R code
We can write Rcpp code in our R code in 3 ways.
1.4.1 sourceCpp()
Save the Rcpp code as a string object in R and compile it with sourceCpp().
src <- "#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double rcpp_sum(NumericVector v){
double sum = 0;
for(int i=0; i<v.length(); ++i){
sum += v[i];
}
return(sum);
}"
sourceCpp(code = src)
rcpp_sum(1:10)
#> [1] 55
1.4.2 cppFunction()
We can omit #include <Rcpp.h> and using namespace Rcpp; when we use cppFunction().
src <- "double rcpp_sum(NumericVector v){
double sum = 0;
for(int i=0; i<v.length(); ++i){
sum += v[i];
}
return(sum);
}
"
Rcpp::cppFunction(src)
rcpp_sum(1:10)
#> [1] 55
1.4.3 evalCpp()
You can evaluate a single C++ expression by using evalCpp().
# Showing maximum value of double.
evalCpp('std::numeric_limits<double>::max()')
#> [1] 1.797693e+308
1.5 C++ 11
C++11 is the version of the C++ standard published in 2011. It added many new features that make C++ easier for beginners.
The code examples in this document are written with C++11 enabled.
1.5.1 Enabling C++11
To enable C++11, add the following description to your Rcpp code.
// [[Rcpp::plugins("cpp11")]]
1.5.2 Recommended C++11 features
1.5.2.1 Initializer list
// Initialize Vector
// The next three are the same as c (1, 2, 3).
NumericVector v1 = NumericVector::create(1.0, 2.0, 3.0);
NumericVector v2 = {1.0, 2.0, 3.0};
NumericVector v3 {1.0, 2.0, 3.0}; // You can omit "=".
1.5.2.2 decltype
By using decltype, you can declare a variable of the same type as an existing variable.
int i;
decltype(i) x; // variable "x" will be int
1.5.2.3 Range-based for-loop
We can write a for statement in the same style as R.
IntegerVector v {1,2,3};
int sum=0;
for(auto& x : v) {
    sum += x;
}
auto& x: The auto& declaration means that each element of the vector v is bound to x by reference. Using & gives direct access to each element, which would also allow modifying it; here it mainly avoids copying each element.
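To make the difference between auto& and plain auto concrete, here is a minimal sketch in standard C++. It uses std::vector<int> instead of an Rcpp vector so that it compiles without R, and the function names (sum_elements, double_in_place, double_copies) are made up for this illustration.

```cpp
#include <vector>

// Sum the elements; x is a read-only reference, so no element is copied.
int sum_elements(const std::vector<int>& v) {
    int sum = 0;
    for (const auto& x : v) {
        sum += x;
    }
    return sum;
}

// Double every element in place; auto& makes x an alias of the element.
void double_in_place(std::vector<int>& v) {
    for (auto& x : v) {
        x *= 2;
    }
}

// With plain auto, x is a copy, so the vector itself is left unchanged.
void double_copies(std::vector<int>& v) {
    for (auto x : v) {
        x *= 2;  // modifies only the local copy
    }
}
```

Calling double_copies() on {1, 2, 3} leaves the vector as it was, while double_in_place() turns it into {2, 4, 6}.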
1.5.2.4 Lambda expression
A lambda expression is a function object created as an unnamed function, typically so that it can be passed to another function.
Lambda expressions are written in the form [](){}.
[] specifies how the lambda accesses variables from the surrounding scope. The options are:
- []: Capture nothing. The lambda function cannot access any local variables from the surrounding scope.
- [=]: Capture all local variables by value. This means a copy of each variable is made, and the lambda works with the copy.
- [&]: Capture all local variables by reference. The lambda can modify the original variables because it accesses them directly.
- [x, &y]: Capture specific variables in different ways. In this example, the variable x is captured by value (copied), and y is captured by reference (accessed directly).
The return type of this function object is automatically deduced from the value returned in {}. If you want to define the return type explicitly, write it like []()->int{}.
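The capture modes can be compared in a short standard C++ sketch. It relies only on the standard library (std::transform and std::for_each instead of Rcpp's sapply), and the function names scale, count_above and sum_with_offset are invented for this example.

```cpp
#include <algorithm>
#include <vector>

// Multiply each element by a; the lambda captures a by value ([=]).
std::vector<double> scale(const std::vector<double>& v, double a) {
    std::vector<double> res(v.size());
    std::transform(v.begin(), v.end(), res.begin(),
                   [=](double x) { return a * x; });
    return res;
}

// Count elements above a threshold; the lambda captures by reference ([&])
// so that it can update the counter n in the enclosing scope.
int count_above(const std::vector<double>& v, double threshold) {
    int n = 0;
    std::for_each(v.begin(), v.end(), [&](double x) {
        if (x > threshold) ++n;
    });
    return n;
}

// Mixed capture: offset by value, total by reference, explicit return type.
double sum_with_offset(const std::vector<double>& v, double offset) {
    double total = 0.0;
    auto add = [offset, &total](double x) -> double {
        total += x + offset;
        return total;
    };
    for (double x : v) add(x);
    return total;
}
```

A by-value capture is the safer default when the lambda may outlive the captured variables; a by-reference capture is needed whenever the lambda has to write back to the surrounding scope.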
Examples
R example
v <- c(1,2,3,4,5)
A <- 2.0
sapply(v, function(x){A*x})
#> [1] 2 4 6 8 10
Rcpp example
We save it as lambda.cpp.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins("cpp11")]]
// [[Rcpp::export]]
NumericVector rcpp_lambda_1(){
    NumericVector v = {1,2,3,4,5};
    double A = 2.0;
    NumericVector res =
        sapply(v, [&](double x){return A*x;});
    return res;
}
sourceCpp('lambda.cpp')
rcpp_lambda_1()
#> [1] 2 4 6 8 10
1.6 Printing Messages
1.6.1 Rcout, Rcerr
// [[Rcpp::export]]
void rcpp_rcout(NumericVector v){
    // printing value of vector
    Rcout << "The value of v : " << v << "\n";

    // printing error message
    Rcerr << "Error message\n";
}
The function signature declares a function named rcpp_rcout that takes a NumericVector as an argument and returns nothing (void).
Rcout is an Rcpp-specific object similar to the standard C++ std::cout, but it prints messages to the R console. The << operator chains the pieces of output.
Rcerr is similar to the standard C++ std::cerr, which is used for error messages and diagnostic output. Output sent through Rcerr might be displayed differently (for example in red text, depending on the R environment) to indicate an issue or warning.
1.6.2 Rprintf(), REprintf()
These functions print formatted output to the R console. They let you format the output using placeholders (e.g., %d for integers, %f for floating point numbers).
Syntax:
Rprintf( format, variables)
The first argument is a format string, and subsequent arguments are the values to be inserted into it. Some format specifiers are listed below:
specifier | explanation |
---|---|
%i | printing signed integer (int) |
%u | printing unsigned integer (unsigned int) |
%f | printing floating point number (double) |
%e | printing floating point number (double) in exponential style |
%s | printing C string (char*) |
Additionally, Rprintf() and REprintf() can only print data types that exist in the standard C++ language, so you cannot pass data types defined by the Rcpp package (such as NumericVector) to Rprintf() directly. If you want to print the values of the elements of an Rcpp vector using Rprintf(), you have to pass each element to it separately (see below).
// [[Rcpp::export]]
void rcpp_rprintf(NumericVector v){
// printing values of all the elements of Rcpp vector
    for(int i=0; i<v.length(); ++i){
        Rprintf("the value of v[%i] : %f \n", i, v[i]);
    }
}
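The format specifiers follow the C printf conventions, so their effect can be tried out without R. The sketch below uses std::snprintf from the standard library in place of Rprintf(); the helper names format_element and format_mixed are hypothetical and exist only for this illustration.

```cpp
#include <cstdio>
#include <string>

// Format one element with %i (int) and %f (double), using the same
// format string as the Rprintf() call above, minus the trailing newline.
std::string format_element(int i, double value) {
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "the value of v[%i] : %f", i, value);
    return std::string(buffer);
}

// %s takes a C string, %u an unsigned int, %e a double in exponential style.
std::string format_mixed(const char* label, unsigned int n, double value) {
    char buffer[96];
    std::snprintf(buffer, sizeof(buffer), "%s: n=%u, x=%e", label, n, value);
    return std::string(buffer);
}
```

Writing into a fixed-size buffer with std::snprintf keeps the sketch safe against overflow; inside an Rcpp function the same format string would be passed to Rprintf() directly.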
1.7 Data Types
All the basic data types and data structures provided by R are available in Rcpp. By using these data types, you can directly access the objects that exist in R.
Vector and matrix
The following seven data types are often used in R.
logical
integer
numeric
complex
character
Date
POSIXct
There are vector and matrix types in Rcpp corresponding to those of R.
The table below presents the correspondence of data types between R/Rcpp/C++.
Value | R vector | Rcpp vector | Rcpp matrix | Rcpp scalar | C++ scalar |
---|---|---|---|---|---|
Logical | logical | LogicalVector | LogicalMatrix | - | bool |
Integer | integer | IntegerVector | IntegerMatrix | - | int |
Real | numeric | NumericVector | NumericMatrix | - | double |
Complex | complex | ComplexVector | ComplexMatrix | Rcomplex | complex |
String | character | CharacterVector (StringVector) | CharacterMatrix (StringMatrix) | String | string |
Date | Date | DateVector | - | Date | - |
Datetime | POSIXct | DatetimeVector | - | Datetime | time_t |
1.7.1 data.frame, list, S3, S4
Other than Vector and Matrix, there are several data structures in R such as data.frame, list, S3 class and S4 class. You can handle all of these data structures in Rcpp.
R | Rcpp |
---|---|
data.frame | DataFrame |
list | List |
S3 class | List |
S4 class | S4 |
In Rcpp, Vector, DataFrame and List are all implemented as kinds of vectors. Namely, a Vector is a vector whose elements are scalar values, a DataFrame is a vector whose elements are Vectors, and a List is a vector whose elements can be of any data type. Thus, Vector, DataFrame and List have many member functions in common.
1.8 Vector
1.8.1 Creating vector object
We can create vector objects in several ways.
// Create a Vector object equivalent to
// v <- rep(0, 3)
NumericVector v (3);

// v <- rep(1, 3)
NumericVector v (3,1);

// v <- c(1,2,3)
// C++11 Initializer list
NumericVector v = {1,2,3};

// v <- c(1,2,3)
NumericVector v = NumericVector::create(1,2,3);

// v <- c(x=1, y=2, z=3)
NumericVector v =
    NumericVector::create(Named("x",1), Named("y")=2 , _["z"]=3);
1.8.2 Accessing vector elements
This Rcpp function demonstrates various ways of accessing and modifying elements of a NumericVector in R using different types of indices (numerical, integer, character, and logical).
// [[Rcpp::export]]
void rcpp_vector_access() {
// Creating vector
    NumericVector v {10, 20, 30, 40, 50};
A NumericVector v is created with five numeric elements: {10, 20, 30, 40, 50}.
// Setting element names
    v.names() = CharacterVector({"A", "B", "C", "D", "E"});
This sets the names for the elements in the NumericVector v. After this, the vector looks like:
 A  B  C  D  E
10 20 30 40 50
// Preparing vector for access
    NumericVector numeric = {1, 3};
    IntegerVector integer = {1, 3};
    CharacterVector character = {"B", "D"};
    LogicalVector logical = {false, true, false, true, false};
These vectors (numeric, integer, character, and logical) are created for indexing.
// Getting values of vector elements
    double x1 = v[0]; // Accesses the first element (10)
    double x2 = v["A"]; // Accesses the element with name "A" (also 10)
    NumericVector res1 = v[numeric]; // Gets elements at indices 1 and 3, i.e. the 2nd and 4th (20, 40)
    NumericVector res2 = v[integer]; // Same as res1 (20, 40)
    NumericVector res3 = v[character]; // Gets elements named "B" and "D" (20, 40)
    NumericVector res4 = v[logical]; // Gets elements where logical is true, the 2nd and 4th (20, 40)
- v[0]: Accesses the first element using numeric indexing (zero-based indexing in C++).
- v[“A”]: Accesses the element with the name “A”, which corresponds to the first element (10).
- v[numeric], v[integer], v[character], v[logical]: Accesses multiple elements at once using vectors of different types (numeric, integer, character, logical). All of these access the same elements, 20 and 40, but using different methods of indexing.
When accessing elements of a container like NumericVector in Rcpp, you typically need to declare the type of the variable that will hold the result first.
// Assigning values to vector elements
    v[0] = 100; // Replaces the first element with 100
    v["A"] = 100; // Replaces the element named "A" with 100
    NumericVector v2 {100, 200}; // A new vector {100, 200}
    v[numeric] = v2; // Replaces elements at indices 1 and 3 (2nd and 4th elements) with 100, 200
    v[integer] = v2; // Same as above (2nd and 4th elements)
    v[character] = v2; // Replaces elements named "B" and "D" with 100, 200
    v[logical] = v2; // Replaces elements at logical `true` positions (2nd and 4th) with 100, 200
}
- v[0] = 100: Changes the first element to 100.
- v["A"] = 100: Changes the element named “A” (which is the same as the first element) to 100.
- v[numeric] = v2, v[integer] = v2, v[character] = v2, v[logical] = v2: These lines replace the selected elements (based on various indexing methods) with the values from the new vector v2 ({100, 200}).
When modifying elements of a vector or container in Rcpp, you should make sure that the new values have the same type as the elements of the original container.
1.8.3 Methods
Methods are functions that are attached to an individual object. You can call a method f() of an object v in the form v.f().
NumericVector v = {1,2,3,4,5};

// Calling member function
int n = v.length(); // 5
The vector object in Rcpp has the methods listed below.
Method | Description |
---|---|
size() | Returns the number of elements of this vector object. |
names() | Returns the element names of this vector object as CharacterVector. |
offset(name), findName(name) | Returns the numerical index of the element specified by the character string name. |
offset(i) | Returns the numerical index of the element specified by the numerical index i after bounds check. |
fill(x) | Fills all the elements of this vector object with scalar value x. |
sort() | Returns a vector that sorts this vector object in ascending order. |
assign(first_it, last_it) | Assigns values specified by the iterators first_it and last_it to this vector object. |
push_back(x) | Appends a scalar value x to the end of this vector object. |
push_back(x, name) | Appends a scalar value x to the end of this vector object and sets the name of the element as name. |
push_front(x) | Appends a scalar value x to the front of this vector object. |
push_front(x, name) | Appends a scalar value x to the front of this vector object and sets the name of the element as name. |
begin() | Returns an iterator pointing to the first element of this vector object. |
end() | Returns an iterator pointing to the end (one past the last element) of this vector object. |
cbegin() | Returns a const iterator pointing to the first element of this vector. |
cend() | Returns a const iterator pointing to the end (one past the last element) of this vector. |
insert(i, x) | Inserts a scalar value x at the position specified by the numerical index i. |
insert(it, x) | Inserts a scalar value x at the position pointed to by the iterator it. |
erase(i) | Erases the element at the position specified by the numerical index i. |
erase(it) | Erases the element at the position pointed to by the iterator it. |
erase(first_i, last_i) | Erases elements from the position specified by numerical index first_i to last_i. |
erase(first_it, last_it) | Erases elements from the position specified by the iterators first_it to last_it. |
containsElementNamed(name) | Returns true if this vector contains an element with the name specified by the character string name. |
1.8.4 Static methods
A static method is a function that belongs to a class rather than to an instance of the class. This means that you don’t need to create an object (or instance) of the class to call the function; you can call it directly using the class name.
- Static member functions are called using the class name followed by the :: operator and the function name.
- Example in Rcpp: NumericVector::create(). Here, create() is a static member function of the class NumericVector, which means you can call it directly using the class name (NumericVector), without creating a NumericVector object first.
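The distinction between a static member function and an ordinary one can be sketched with a small class in plain C++. The class Sequence below is hypothetical and is not part of Rcpp; it only mimics the way NumericVector::create() is called.

```cpp
#include <cstddef>
#include <vector>

// A hypothetical class, named Sequence only for this illustration.
class Sequence {
public:
    // Static member function: it belongs to the class, so it is called as
    // Sequence::create(...) without an existing Sequence object.
    static Sequence create(double a, double b, double c) {
        Sequence s;
        s.values_ = {a, b, c};
        return s;
    }

    // Ordinary member function: it needs an object and is called as s.size().
    std::size_t size() const { return values_.size(); }

private:
    std::vector<double> values_;
};
```

With this class, Sequence::create(1.0, 2.0, 3.0) returns an object whose size() is 3, in the same way that NumericVector::create(1, 2, 3) returns a vector of length 3.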
1.9 Matrix
Matrix objects can be created in several ways.
// Create a Matrix object equivalent to
// m <- matrix(0, nrow=2, ncol=2)
NumericMatrix m1( 2 );

// m <- matrix(0, nrow=2, ncol=3)
NumericMatrix m2( 2 , 3 );

// m <- matrix(v, nrow=2, ncol=3)
NumericMatrix m3( 2 , 3 , v.begin() );
In addition, a matrix object in R is actually a vector whose numbers of rows and columns are stored in the attribute dim.
Thus, if you create a vector with the attribute dim in Rcpp and return it to R, it will be treated as a matrix.
mat <- '#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector rcpp_matrix(){
// Creating a vector object
NumericVector v = {1,2,3,4};
// Set the number of rows and columns to attribute dim of the vector object.
v.attr("dim") = Dimension(2, 2);
// Return the vector to R
return v;
}'
sourceCpp(code=mat)
rcpp_matrix()
#> [,1] [,2]
#> [1,] 1 3
#> [2,] 2 4
Even if you set a value to the attribute dim of a Vector object, the type of the object remains a Vector type in Rcpp code. Thus, if you want to convert it to a Matrix type in Rcpp, you need to use the as<T>() function.
// Set number of rows and columns to attribute dim
v.attr("dim") = Dimension(2, 2);
// Converting to Rcpp Matrix type
NumericMatrix m = as<NumericMatrix>(v);
1.9.1 Accessing Matrix elements
By using the () operator, you can get and assign the values of the elements of a Matrix object by specifying their row number and column number.
// Creating a 5x5 numerical matrix
NumericMatrix m( 5, 5 );

// Retrieving the element of row 0 and column 2
double x = m( 0 , 2 );
- This line retrieves the element at row 0, column 2 of the matrix m and stores it in the variable x.
- The parentheses m(0, 2) are used for element access in the matrix, similar to m[1, 3] in R (R indices start at 1, Rcpp indices at 0).
NumericVector v = m( 0 , _ );
- This line copies all the values in row 0 (i.e., the entire first row) of the matrix m into the vector v.
- The underscore _ is a placeholder that represents “all elements” in the corresponding dimension.
NumericVector v = m( _ , 2 );
- This line copies all the values in column 2 (i.e., the third column) of the matrix m into the vector v.
NumericMatrix m2 = m( Range(0,1), Range(2,3) );
- This line copies a submatrix of m consisting of the values in rows 0 and 1 (first and second rows) and columns 2 and 3 (third and fourth columns) into a new matrix m2.
Range(0,1) specifies that rows 0 and 1 should be selected, and Range(2,3) specifies that columns 2 and 3 should be selected.
m[5]; // This points to the same element as m(0,1)
This line demonstrates linear indexing into the matrix. While NumericMatrix is a two-dimensional structure, it can also be treated like a 1D array when accessed with square brackets []. Because the elements are stored column by column, index 5 in a 5x5 matrix is the first element of the second column.
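The relation between m[5] and m(0,1) follows from the column-major layout that R uses for matrices. The sketch below spells out the index arithmetic in plain C++; the helper names linear_index, row_of and col_of are made up for this example and are not Rcpp functions.

```cpp
#include <cassert>
#include <cstddef>

// Linear index of element (i, j) in a column-major matrix with nrow rows:
// each complete column before column j contributes nrow elements.
std::size_t linear_index(std::size_t i, std::size_t j, std::size_t nrow) {
    assert(i < nrow);
    return i + j * nrow;
}

// Inverse mapping: recover the row and the column from a linear index k.
std::size_t row_of(std::size_t k, std::size_t nrow) {
    assert(nrow > 0);
    return k % nrow;
}

std::size_t col_of(std::size_t k, std::size_t nrow) {
    assert(nrow > 0);
    return k / nrow;
}
```

For the 5x5 matrix above, linear_index(0, 1, 5) is 5, which is why m[5] and m(0,1) refer to the same element.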
1.9.2 Accessing as reference to row, column and sub matrix
Rcpp also provides types that hold “references” to specific parts of a matrix.
- Referencing a Column:
NumericMatrix::Column col = m( _ , 1 );
- m(_, 1): This accesses column 1 of the matrix m (zero-based indexing, so it refers to the second column).
- NumericMatrix::Column: This type is a reference to a specific column of the matrix. It doesn’t create a copy of the column but rather provides direct access to the elements in column 1.
- Referencing a Row:
NumericMatrix::Row row = m( 1 , _ );
- m(1,_): This accesses row 1 of the matrix m (the second row).
- NumericMatrix::Row: This type is a reference to a specific row of the matrix.
- Referencing a Submatrix
NumericMatrix::Sub sub = m( Range(0,1) , Range(2,3) );
- Range(0,1): This specifies the range of rows to select (rows 0 and 1, the first two rows).
- Range(2,3): This specifies the range of columns to select (columns 2 and 3, the third and fourth columns).
- NumericMatrix::Sub: This type is a reference to a submatrix of m. It references the elements in the submatrix defined by the row and column ranges.
1.9.3 Methods
Since a Matrix is actually a Vector, Matrix basically has the same member functions as Vector. Thus, only the member functions unique to Matrix are presented below.
Method | Description |
---|---|
nrow() or rows() | Returns the number of rows. |
ncol() or cols() | Returns the number of columns. |
row(i) | Returns a reference Matrix::Row to the i-th row. |
column(i) | Returns a reference Matrix::Column to the i-th column. |
fill_diag(x) | Fills the diagonal elements with the scalar value x. |
offset(i, j) | Returns the numerical index in the original vector of the matrix corresponding to the element at row i and column j. |
1.9.4 Static member functions
Matrix basically has the same static member functions as Vector. The static member function unique to Matrix is shown below.
Matrix::diag( size, x ): Returns a diagonal matrix whose number of rows and columns equals size and whose diagonal elements have the value x.
cppFunction("
NumericMatrix create_diag_matrix(int size, double x) {
// Create a diagonal matrix of the given size with x on the diagonal
NumericMatrix mat = NumericMatrix::diag(size, x);
return mat;
}")
create_diag_matrix(5,2)
x is a scalar value that will be placed along the diagonal of the matrix.
1.10 Vector operations
1.10.1 Arithmetic operations
By using the + - * / operators you can perform elementwise arithmetic operations between vectors of the same length.
NumericVector x ;
NumericVector y ;
Vector and vector operation
NumericVector res = x + y ;
NumericVector res = x - y ;
NumericVector res = x * y ;
NumericVector res = x / y ;
Vector and scalar operation
NumericVector res = x + 2.0 ;
NumericVector res = 2.0 - x;
NumericVector res = y * 2.0 ;
NumericVector res = 2.0 / y;
Expression and expression operation
NumericVector res = x * y + y / 2.0 ;
NumericVector res = x * ( y - 2.0 ) ;
NumericVector res = x / ( y * y ) ;
The - operator inverts the sign.
NumericVector res = -x ;
1.10.2 Comparison operations
Comparison of vectors using the == != < > >= <= operators produces logical vectors. You can also access vector elements using logical vectors.
Comparison of vector and vector
LogicalVector res = x < y ;
LogicalVector res = x > y ;
LogicalVector res = x <= y ;
LogicalVector res = x >= y ;
LogicalVector res = x == y ;
LogicalVector res = x != y ;
Comparison of vector and scalar
LogicalVector res = x < 2 ;
LogicalVector res = 2 > x;
LogicalVector res = y <= 2 ;
LogicalVector res = 2 != y;
Comparison of expression and expression
LogicalVector res = ( x + y ) < ( x*x ) ;
LogicalVector res = ( x + y ) >= ( x*x ) ;
LogicalVector res = ( x + y ) == ( x*x ) ;
!(...): The logical NOT operator ! negates the result of the comparison. In other words, it turns TRUE into FALSE and FALSE into TRUE.
LogicalVector res = !(x < y);
Accessing the elements of the vector using logical vectors.
NumericVector res = x[x < 2];
1.11 Logical operations
1.11.1 Logical Vector
Although in C++ the boolean type (bool) has only two possible values, true (1) and false (0), R’s logical vectors have a third possible value: NA (missing or undefined). Because a C++ bool cannot represent this third state, Rcpp stores the elements of a LogicalVector as integers. Specifically, the values are represented as:
- TRUE: 1 (same as C++ true)
- FALSE: 0 (same as C++ false)
- NA: NA_LOGICAL, a special constant defined as the minimum value of an integer: -2147483648 (the smallest value of a 32-bit signed integer).
1.11.2 Logical operations
Use the operators & (logical AND), | (logical OR) and ! (logical negation) to perform elementwise logical operations on LogicalVector.
LogicalVector v1 = {1,1,0,0};
LogicalVector v2 = {1,0,1,0};

LogicalVector res1 = v1 & v2;
LogicalVector res2 = v1 | v2;
LogicalVector res3 = !(v1 | v2);

Rcout << res1 << "\n"; // 1 0 0 0
Rcout << res2 << "\n"; // 1 1 1 0
Rcout << res3 << "\n"; // 0 0 0 1
1.11.3 Function that receives LogicalVector
1.11.3.1 all() and any()
Examples of functions that receive LogicalVector are all(), any() and ifelse().
all(v) returns TRUE when all elements of v are TRUE, and any(v) returns TRUE if any of v’s elements are TRUE.
In Rcpp, the return type of both all() and any() is not a simple bool, but a more complex type called SingleLogicalResult.
This type can represent not only TRUE or FALSE, but also NA (the third possible logical value in R). As a result, the return value of all() or any() cannot be directly used in a conditional statement like an if statement in C++.
To convert the SingleLogicalResult from all() or any() into a bool, Rcpp provides helper functions:
- is_true(): Returns true if the result is TRUE.
- is_false(): Returns true if the result is FALSE.
- is_na(): Returns true if the result is NA.
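Why a plain bool is not enough can be illustrated with a plain C++ sketch of a three-valued all(). This is not Rcpp's implementation: it assumes R's rule that all() is FALSE if any element is FALSE, otherwise NA if any element is NA, otherwise TRUE, and it stores logical values as int with NA as INT_MIN, as described above. The names kTrue, kFalse, kNA, all3, is_true3, is_false3 and is_na3 are invented for this example.

```cpp
#include <climits>
#include <vector>

// Stand-ins for R's logical values, stored as int.
const int kTrue = 1;
const int kFalse = 0;
const int kNA = INT_MIN;  // same value as NA_LOGICAL on a 32-bit int

// Three-valued all(): FALSE if any element is FALSE,
// otherwise NA if any element is NA, otherwise TRUE.
int all3(const std::vector<int>& v) {
    bool seen_na = false;
    for (int x : v) {
        if (x == kFalse) return kFalse;
        if (x == kNA) seen_na = true;
    }
    return seen_na ? kNA : kTrue;
}

// Analogues of is_true(), is_false() and is_na(): each maps the
// three-valued result onto a plain bool that an if statement can use.
bool is_true3(int r)  { return r != kFalse && r != kNA; }
bool is_false3(int r) { return r == kFalse; }
bool is_na3(int r)    { return r == kNA; }
```

In real Rcpp code the corresponding pattern is to wrap the call, for example if (is_true(all(v))) { ... }, so that the NA case is handled explicitly rather than silently treated as true or false.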
1.11.3.2 ifelse()
ifelse (v, x1, x2) receives the logical vector v, and returns the corresponding element of x1 when the element of v is TRUE and the corresponding element of x2 when it is FALSE.
x1 and x2 must either be scalars or vectors. If they are vectors, their length must match the length of v. This ensures that there is a corresponding element in x1 or x2 for each element in v.
NumericVector v1;
NumericVector v2;
//Number of elements of vector
int n = v1.length();
When both x1 and x2 are scalars
IntegerVector res1 = ifelse( v1>v2, 1, 0);
NumericVector res2 = ifelse( v1>v2, 1.0, 0.0);
Since ifelse() does not work with a scalar character string, we need to use string vectors whose elements all have the same value.
CharacterVector chr_v1 = rep(CharacterVector("T"), n);
CharacterVector chr_v2 = rep(CharacterVector("F"), n);
CharacterVector res3 = ifelse( v1>v2, chr_v1, chr_v2);
When x1 is a vector and x2 is a scalar
IntegerVector res4 = ifelse(v1 > v2, int_v1, 0);
NumericVector res5 = ifelse(v1 > v2, num_v1, 0.0);
CharacterVector res6 = ifelse(v1 > v2, chr_v1, Rf_mkChar("F")); // Note
- For integer and numeric vectors, the scalar values (0 and 0.0) are recycled as needed.
- For character vectors, you cannot directly use a scalar string like "F"; instead, you use Rf_mkChar("F"), which creates an internal SEXP (R object) representing the string “F”.
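The length rule and the recycling of a scalar can be sketched in plain C++. The function select_elements below is a hypothetical stand-in for ifelse(), restricted to double values and ignoring NA; it is meant to show the elementwise logic, not how Rcpp implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Vector/vector case: yes and no must have the same length as test.
std::vector<double> select_elements(const std::vector<bool>& test,
                                    const std::vector<double>& yes,
                                    const std::vector<double>& no) {
    assert(yes.size() == test.size() && no.size() == test.size());
    std::vector<double> out(test.size());
    for (std::size_t i = 0; i < test.size(); ++i) {
        out[i] = test[i] ? yes[i] : no[i];
    }
    return out;
}

// Vector/scalar case: the scalar no is reused ("recycled") for every element.
std::vector<double> select_elements(const std::vector<bool>& test,
                                    const std::vector<double>& yes,
                                    double no) {
    return select_elements(test, yes, std::vector<double>(test.size(), no));
}
```

The scalar overload simply expands the scalar into a vector of the right length, which is also the idea behind building chr_v1 and chr_v2 with rep() in the character case.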
1.12 Data frame
1.12.1 Creating DataFrame
In Rcpp, DataFrame is implemented as a kind of vector. In other words, a Vector is a vector whose elements are scalar values, while a DataFrame is a vector whose elements are Vectors of the same length.
DataFrame::create() is used to create a DataFrame object.
// Creating DataFrame df from Vector v1, v2
DataFrame df = DataFrame::create(v1, v2);
Also, use Named() or _[] if you want to specify column names when creating a DataFrame object.
// When giving names to columns
DataFrame df = DataFrame::create( Named("V1") = v1 , _["V2"] = v2 );
When you create a DataFrame with DataFrame::create(), the values of the original Vector are not duplicated in the columns of the DataFrame; instead, the columns are “references” to the original Vector. Therefore, changing the values of the original Vector changes the values of the columns. To avoid this, use clone().
df <- '#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
DataFrame rcpp_df(){
// Creating vector v
NumericVector v = {1,2};
// Creating DataFrame df
DataFrame df = DataFrame::create( Named("V1") = v, // simple assign
Named("V2") = clone(v)); // using clone()
// Changing vector v
v = v * 2;
return df;
}'
sourceCpp(code=df)
rcpp_df()
1.12.2 Accessing DataFrame elements
When you access a specific column of a DataFrame in Rcpp, that column is temporarily assigned to a Vector object. The Vector object allows you to manipulate or access the values of that column as if it were a separate vector.
As with DataFrame creation, assigning a DataFrame column to a Vector object does not copy the column values; the Vector is a “reference” to the column. Therefore, when you change the values of the Vector object, the content of the column also changes.
1.12.2.1 Accessing by Numeric, String, or Logical Vectors
You can specify which column of the DataFrame you want to access using different types of vectors:
- Numeric Vector (Column Number): You can specify a column by its index (0-based). Example: df[0] would return the first column.
- String Vector (Column Name): You can access a column by its name using a string, which is more intuitive and readable when dealing with named columns.
- Logical Vector: Each element of the logical vector corresponds to whether a column should be included (TRUE) or excluded (FALSE). The length of the logical vector must match the number of columns in the data frame.
1.12.3 Member functions
Method | Description |
---|---|
nrows() | Returns the number of rows. |
ncol() | Returns the number of columns. |
length() | Returns the number of columns. |
size() | Returns the number of columns. |
names() | Returns the column names as a CharacterVector. |
offset(name) or findName(name) | Returns the numerical index of the column with the name specified by the string name. |
fill(v) | Fills all the columns of this DataFrame with the Vector v. |
assign(first_it, last_it) | Assigns columns in the range specified by the iterators first_it and last_it to this DataFrame. |
push_back(v) | Adds Vector v to the end of the DataFrame. |
push_back(v, name) | Appends a Vector v to the end of the DataFrame and specifies the name of the added column with the string name. |
push_front(v) | Adds a Vector v at the beginning of this DataFrame. |
push_front(v, name) | Adds a Vector v at the beginning of this DataFrame and specifies the name of the added column with the string name. |
begin() | Returns an iterator pointing to the first column of this DataFrame. |
end() | Returns an iterator pointing to the end of this DataFrame. |
insert(it, v) | Adds Vector v to this DataFrame at the position pointed to by the iterator it and returns an iterator to that element. |
erase(i) | Deletes the i-th column of this DataFrame and returns an iterator to the column just after the erased column. |
erase(it) | Deletes the column specified by the iterator it and returns an iterator to the column just after the erased column. |
erase(first_i, last_i) | Deletes columns from first_i to last_i - 1 and returns an iterator to the column just after the erased columns. |
erase(first_it, last_it) | Deletes the range of columns specified by the iterators first_it to last_it - 1 and returns an iterator to the column just after the erased columns. |
containsElementNamed(name) | Returns true if this DataFrame has a column with the name specified by the string name. |
inherits(str) | Returns true if the attribute “class” of this object contains the string str. |
1.13 List
In Rcpp, List is implemented as a kind of vector. In other words, a Vector is a vector whose elements are scalar values, while a List is a vector whose elements can be of any data type.
1.13.1 Creating List object
To create a List object we use the List::create() function. Also, to specify the element names when creating a List, use the Named() function or _[].
// Create list L from vector v1, v2
List L = List::create(v1, v2);

// When giving names to elements
List L = List::create(Named("name1") = v1 , _["name2"] = v2);
When you create a List with List::create(), the values of the original Vector are not duplicated; instead, the elements are “references” to the original Vector. Therefore, changing the values of the original Vector changes the values in the list. To avoid this, use clone().
1.13.2 Accessing List elements
When accessing a specific element of List, we assign it to the other object and access it via that object.
The elements of List can be specified by numerical index, element names and logical vector.
1.13.3 Member functions
List has the same member functions as Vector.
1.14 S3, S4 class
The S3 class is actually a list whose attribute class has its own value.
We use an example to demonstrate how to use it in Rcpp.
double rcpp_rmse(List lm_model) {
    // Since S3 is a list, data type of the argument is specified as List.
S3 objects in R, such as objects created by lm(), are typically lists with an additional class attribute. Since an S3 object is fundamentally a list, in Rcpp we can use the List data type to receive it.
if (! lm_model.inherits("lm")) stop("Input must be a lm() model object.");
The inherits("lm") function checks whether the input object (the list) belongs to the lm class. This ensures that the input object is an lm object.
    // Extracting residuals (i.e. actual - prediction) from the S3 object
    NumericVector resid = lm_model["residuals"];
- In an lm object, residuals are stored in the “residuals” component.
- Since the S3 object is a list, we can access the “residuals” component using list-like indexing (lm_model[“residuals”]), which extracts the residuals as a NumericVector in Rcpp.
    // Number of elements of the residual vector
    R_xlen_t n = resid.length();
- The length() function is used to get the number of elements in the resid vector (i.e., the number of residuals).
- R_xlen_t is a type that represents the length of vectors in R. It is typically used in Rcpp when dealing with vector lengths.
- R_xlen_t is large enough to handle very long vectors, which matters when a vector's length might exceed the range of a standard integer (whose maximum is 2^31 - 1 in R, roughly 2 billion).
    // The sum of squares of the residual vector
    double rmse(0.0);
    for(double& r : resid){
        rmse += r*r;
    }
// Divide the residual sum of squares by the number of elements and take the square root
return(sqrt((1.0/n)*rmse));
}
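The arithmetic in rcpp_rmse() can be checked outside R. The following is a minimal standalone C++ sketch (no Rcpp headers); the function name rmse_of and the use of std::vector<double> in place of NumericVector are assumptions made purely for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical standalone helper: root mean squared error of a residual vector.
// Mirrors the loop in rcpp_rmse(), with std::vector<double> standing in for NumericVector.
double rmse_of(const std::vector<double>& resid) {
    const std::size_t n = resid.size();
    double sum_sq = 0.0;               // running sum of squared residuals
    for (double r : resid) {
        sum_sq += r * r;
    }
    return std::sqrt(sum_sq / n);      // sqrt(mean(resid^2)); NaN for an empty vector
}
```

For the residuals {3, 4} the sum of squares is 25, so the function returns sqrt(25 / 2).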
1.14.1 S4
class
This section has not been written yet.
1.15 String
String
is a scalar type corresponding to the element of CharacterVector
. String can also handle NA
values (NA_STRING
) which are not supported by the C character string char*
and the C++ string std::string
.
1.15.1 Creating String object
There are three main ways to create a String object. Each approach reflects different sources for creating the String object:
- Creating a String from a C/C++ string (character pointer or std::string):
// Create String from C-style string
String s1 = "Hello, world!";

// Create String from std::string
std::string cpp_str = "C++ string";
String s2 = cpp_str;
- s1 is created from a C-style string ("Hello, world!").
- s2 is created from a C++ string (std::string).
- Creating a String from Another String Object:
- You can create a new String object by copying an existing String object in Rcpp.
String s1 = "Original string";

// Create a new String from another String object
String s2 = s1;
- Creating a String from an Element of a
CharacterVector
:
CharacterVector cv = CharacterVector::create("one", "two", "three");

// Create a String from the second element of the CharacterVector
String s = cv[1]; // Note: 0-based indexing, so this accesses "two"
1.15.2 Operators
In Rcpp, the String class supports the +=
operator, which allows you to append another string (or string-like object) to an existing String object.
// Creating String object
String s("A");

// Combining a string
s += "B";

Rcout << s << "\n"; // "AB"
1.15.3 Member functions
Method | Description |
---|---|
replace_first(str, new_str) | Replace the first substring that matches the string str with the string new_str. |
replace_last(str, new_str) | Replace the last substring that matches the string str with the string new_str. |
replace_all(str, new_str) | Replace all substrings that match the string str with the string new_str. |
push_back(str) | Append the string str to the end of this String object (same as the += operator). |
push_front(str) | Prepend the string str to the beginning of this String object. |
set_na() | Set this String object to NA. |
get_cstring() | Convert the string of this String object into a C character string constant (const char*) and return it. |
get_encoding() | Return the character encoding. The encoding is represented by cetype_t. |
set_encoding(enc) | Set the character encoding to enc, a value of type cetype_t. |
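The replace_*() member functions behave like substring search-and-replace. As a rough illustration only, the sketch below reproduces the idea on std::string in standalone C++. The helpers replace_first_in and replace_all_in are hypothetical and not part of Rcpp, and the edge-case behavior of Rcpp's String may differ.

```cpp
#include <string>

// Hypothetical helper: replace the first occurrence of `from` in `s` with `to`.
std::string replace_first_in(std::string s, const std::string& from, const std::string& to) {
    const std::string::size_type pos = s.find(from);
    if (pos != std::string::npos && !from.empty()) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Hypothetical helper: replace every non-overlapping occurrence of `from` with `to`.
std::string replace_all_in(std::string s, const std::string& from, const std::string& to) {
    if (from.empty()) return s;                 // avoid an infinite loop on empty patterns
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();                       // continue after the inserted text
    }
    return s;
}
```

Both helpers take the string by value and return the modified copy, whereas the String member functions modify the object they are called on.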
1.16 Date and DateVector
1.16.1 Creating Date objects
The following code illustrates different ways to create a Date object in Rcpp, representing a specific date.
Date d; // "1970-01-01"
Date d; creates a Date object d that represents the epoch date “1970-01-01”, which is the default starting point for dates in many computing systems, including R.
Date d(1); // "1970-01-01" + 1 day
Date d(1); creates a Date object d that represents one day after the epoch date.
Date d(1.1); // "1970-01-01" + ceil(1.1) day
Date d(1.1); creates a Date object d from the value 1.1 days after “1970-01-01”. The number is rounded up (using ceil()), so it is treated as 2 days after “1970-01-01”.
Date d("2000-01-01", "%Y-%m-%d"); // Date specified by a string with a format
This creates a Date object d from the string "2000-01-01" with the format "%Y-%m-%d".
Creating a Date from month, day, and year (mm/dd/yyyy):
Date d(1, 2, 2000); // 2000-01-02 (mon, day, year)
1.16.2 Operators
Date has the operators +, -, <, >, >=, <=, ==, and !=.
By using these operators, you can add days (+), calculate the difference between dates (-), and compare dates (<, <=, >, >=, ==, !=).
Member functions
Method | Description |
---|---|
format() | Returns the date as a std::string using the same specification as base R. The default format is YYYY-MM-DD. |
getDay() | Returns the day of the date. |
getMonth() | Returns the month of the date. |
getYear() | Returns the year of the date. |
getWeekday() | Returns the day of the week as an int (1:Sun, 2:Mon, 3:Tue, 4:Wed, 5:Thu, 6:Fri, 7:Sat). |
getYearday() | Returns the day of the year (with January 1st as 1 and December 31st as 365, or 366 in a leap year). |
is_na() | Returns true if this object is NA. |
1.16.3 DateVector subsetting
In Rcpp, both DateVector and DatetimeVector are stored internally as numeric vectors (specifically, doubles). This design simplifies certain internal calculations but can be confusing when working with individual elements of these vectors.
In particular, when you subset a DateVector using the [] operator, you extract a double, which represents the date as the number of days since the epoch date (1970-01-01).
To work with individual Date or Datetime objects from a DateVector or DatetimeVector, you need to explicitly convert the extracted element back into a Date or Datetime object.
Example:
// [[Rcpp::export]]
void print_year_of_date(DateVector dates) {
  for (int i = 0; i < dates.size(); ++i) {
    // Convert the extracted double to a Date object
    Date single_date = dates[i]; // Convert double to Date
    int year = single_date.getYear(); // Now you can call getYear()
    Rcpp::Rcout << "Year: " << year << std::endl;
  }
}
- dates[i] returns a double by default.
- Date single_date = dates[i];: We explicitly convert the double value to a Date object, allowing us to use the getYear() method.
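To make the "days since the epoch" representation concrete, here is a standalone C++ sketch (no Rcpp) that turns such a double into a calendar year using only the standard library. year_from_days is a hypothetical helper; it assumes a non-NA input and a platform where time_t counts seconds since 1970-01-01 UTC, as on POSIX systems.

```cpp
#include <cmath>
#include <ctime>

// Hypothetical helper: calendar year (UTC) for a date stored as days since 1970-01-01.
int year_from_days(double days_since_epoch) {
    // Whole days -> seconds since the epoch (86400 seconds per day).
    const std::time_t secs =
        static_cast<std::time_t>(std::floor(days_since_epoch)) * 86400;
    const std::tm* utc = std::gmtime(&secs);   // broken-down UTC time
    return utc ? utc->tm_year + 1900 : -1;     // tm_year counts years since 1900
}
```

For example, the value 10957 corresponds to 2000-01-01, since 30 years with 7 leap days separate it from the epoch.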
1.17 RObject
The RObject
type in Rcpp is a flexible and general-purpose type that can represent any kind of R object.
Here is an example that demonstrates how RObject can be used in Rcpp to accept and handle different types of R objects.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void handle_any_type(RObject obj) {
  // Check if the object is a NumericVector
  // (is<T>() is called as a free function template on the object)
  if (is<NumericVector>(obj)) {
    NumericVector num_vec = as<NumericVector>(obj);
    Rcpp::Rcout << "Numeric Vector: " << num_vec << std::endl;
  }
  // Check if the object is a CharacterVector
  else if (is<CharacterVector>(obj)) {
    CharacterVector char_vec = as<CharacterVector>(obj);
    Rcpp::Rcout << "Character Vector: " << char_vec << std::endl;
  }
  // Check if the object is a List (only its length is printed here)
  else if (is<List>(obj)) {
    List lst = as<List>(obj);
    Rcpp::Rcout << "List of length " << lst.size() << std::endl;
  }
  // Handle unknown types
  else {
    Rcpp::Rcout << "Unknown type!" << std::endl;
  }
}
1.17.1 Conversion using as<>()
as<>() is a template function in Rcpp that allows you to convert an RObject to a more specific type when you know the type of the object or have determined it dynamically.
NumericVector num_vec = as<NumericVector>(obj); // Convert RObject to NumericVector
This converts the RObject into a NumericVector when you are sure that it contains a numeric vector.
1.18 Cautions in handling Rcpp objects
1.18.1 Assigning between vectors
In Rcpp, when you assign an object (such as a vector, list, or matrix) v1 to another object v2 using the = operator (e.g., v2 = v1;), no deep copy is made. Instead, v2 becomes an alias of v1, meaning that both v1 and v2 point to the same underlying data in memory.
If you want v2 to be a completely independent copy of v1, so that changes to v1 do not affect v2, you need to perform a deep copy. In Rcpp, you can use the clone() function to create a deep copy (e.g., v2 = clone(v1);).
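The aliasing behavior is analogous to copying a handle rather than the data it points to. The sketch below imitates this in standalone C++ with std::shared_ptr. The Handle type and its clone() member are hypothetical stand-ins for an Rcpp vector and Rcpp's clone(); they are not Rcpp's actual implementation.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Rcpp vector: a handle that shares its data when copied.
struct Handle {
    std::shared_ptr<std::vector<double>> data;

    explicit Handle(std::vector<double> values)
        : data(std::make_shared<std::vector<double>>(std::move(values))) {}

    // Element access goes through the shared data.
    double& operator[](std::size_t i) { return (*data)[i]; }

    // Deep copy: allocate new storage holding the same values.
    Handle clone() const { return Handle(*data); }
};
```

With Handle v2 = v1; and Handle v3 = v1.clone();, a later assignment v1[0] = 100.0; is visible through v2[0] but not through v3[0].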
1.18.2 Data type of numerical index
Use R_xlen_t as the data type for numerical indices and element counts so that your Rcpp code supports long vectors.
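The reason int is not enough can be seen from the ranges of the types involved. The standalone C++ sketch below uses std::ptrdiff_t as a stand-in for R_xlen_t, which is an assumption about 64-bit builds of R with long vector support; fits_in_int is a hypothetical helper.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: can a vector length be stored in an int without overflow?
// std::ptrdiff_t stands in for R_xlen_t (an assumption about 64-bit builds of R).
bool fits_in_int(std::ptrdiff_t length) {
    return length >= 0 && length <= std::numeric_limits<int>::max();
}
```

A length of 2^31 - 1 still fits in a 32-bit int, while 2^31 does not, so a loop counter declared as int cannot index every element of a long vector.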
1.18.3 Return type of operator []
In Rcpp, when you access an element of a vector with [] or (), you do not directly get the element as its native type (e.g., double, int, or String). Instead, you get a Vector::Proxy object. This Proxy object acts as an intermediary that lets you modify the vector element directly or retrieve its value, but it is not the same type as the element itself. As a result, passing v[i] directly to a function or template that expects the native type can fail to compile.
To resolve this, either:
- Assign v[i] to a new object of the expected type.
- Convert the Proxy to the native type using as<T>().
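Proxy objects are not specific to Rcpp. The standard library's std::vector<bool> also returns a proxy from operator[], which makes it a convenient standalone C++ illustration of the same idea. It is an analogy, not Rcpp's Vector::Proxy itself, and flip_first is a hypothetical helper.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// operator[] on a std::vector<bool> yields a proxy class, not bool&.
static_assert(
    !std::is_same<decltype(std::declval<std::vector<bool>&>()[0]), bool&>::value,
    "std::vector<bool>::operator[] returns a proxy");

// Hypothetical helper: writes through the proxy, then reads the element back as a native bool.
// Assumes v is not empty.
bool flip_first(std::vector<bool>& v) {
    auto proxy = v[0];      // proxy object that refers to element 0 of v
    proxy = !proxy;         // assignment through the proxy modifies v itself
    bool value = v[0];      // assigning to a bool converts the proxy to the native type
    return value;
}
```

The line bool value = v[0]; corresponds to the first remedy: assigning the proxy to an object of the expected type forces the conversion.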
1.19 R-like functions
Here is a list of Rcpp functions similar to R functions.
If you know for certain that your vector does not contain any NA values, you can optimize your code by using the noNA()
function. noNA()
marks the vector as guaranteed to be free of NA
values, which allows Rcpp to skip NA
checks and perform calculations more efficiently.
The full list is too long to reproduce here.
1.20 Probability distribution
In Rcpp, probability distribution functions exist in two different namespaces:
Rcpp:: namespace:
- Functions in this namespace return vectors.
- These functions are designed to be similar to their counterparts in base R. You can pass a vector of values to these functions and they will return a vector of results.
R::
namespace:
- Functions in this namespace return scalar values (a single value).
- If you only need a single value from the distribution function, using the R:: version of the function can be more efficient because it avoids the overhead of vectorization.
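The scalar/vector split can be pictured as one scalar function plus a vectorized wrapper that applies it element-wise. The standalone C++ sketch below illustrates this pattern for the normal density. normal_pdf and normal_pdf_vec are hypothetical names, and this is not how R or Rcpp implement their distribution functions.

```cpp
#include <cmath>
#include <vector>

// Hypothetical scalar version: density of N(mean, sd^2) at a single point x.
double normal_pdf(double x, double mean = 0.0, double sd = 1.0) {
    const double two_pi = 6.283185307179586;
    const double z = (x - mean) / sd;
    return std::exp(-0.5 * z * z) / (sd * std::sqrt(two_pi));
}

// Hypothetical vectorized wrapper: applies the scalar version to each element.
std::vector<double> normal_pdf_vec(const std::vector<double>& xs,
                                   double mean = 0.0, double sd = 1.0) {
    std::vector<double> out;
    out.reserve(xs.size());
    for (double x : xs) {
        out.push_back(normal_pdf(x, mean, sd));
    }
    return out;
}
```

Calling the scalar function directly avoids allocating a result vector, which is the kind of overhead the R:: versions let you skip when only one value is needed.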
sort_cpp <- '#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector sort_numeric_vector(NumericVector x) {
std::sort(x.begin(), x.end());
return x;
}'
sourceCpp(code=sort_cpp)