Title: | Semi-Automatic Grading of R and Rmd Scripts |
---|---|
Description: | A customisable set of tools for assessing and grading R or R-markdown scripts from students. It allows for checking correctness of code output, runtime statistics and static code analysis. The latter feature is made possible by representing R expressions using a tree structure. |
Authors: | Vik Gopal [aut, cre], Samuel Seah [aut], Viknesh Jeya Kumar [aut], Gabriel Ang [aut], Ruofan Liu [ctb], National University of Singapore [cph] |
Maintainer: | Vik Gopal |
License: | MIT + file LICENSE |
Version: | 0.0.10 |
Built: | 2025-01-10 05:20:53 UTC |
Source: | https://github.com/cran/autoharp |
Converts a list that represents a tree into a binary matrix.
adj_list_2_matrix(adj_list)
adj_list |
The adjacency list of the tree. |
Remember that the list has to be for a tree, not a general graph. Please see other help pages for more specifications.
This is a low-level function, used within the S4 class TreeHarp. It is not generally meant for use by the user.
It works by filling in the upper triangle of the matrix and then reflecting it.
A symmetric matrix of 1's and 0's, with 1 in entry (i,j) representing an edge between the two vertices.
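The fill-then-reflect step can be sketched in base R as follows. This is a hypothetical helper (adj_to_mat_sketch), not the package's implementation. It assumes, as in the examples elsewhere in this manual, that element i of the adjacency list holds the integer ids of node i's children, and that every child has a larger id than its parent, so filling rows only touches the upper triangle.

```r
# Hypothetical helper: adjacency list (children ids per node) -> symmetric 0/1 matrix
adj_to_mat_sketch <- function(adj_list) {
  n <- length(adj_list)
  mat <- matrix(0L, nrow = n, ncol = n,
                dimnames = list(names(adj_list), names(adj_list)))
  # Fill row i with its children; with parent id < child id this is the upper triangle
  for (i in seq_len(n)) {
    kids <- adj_list[[i]]
    if (length(kids) > 0) mat[i, kids] <- 1L
  }
  # Reflect the upper triangle into the lower one
  mat + t(mat)
}

th3 <- list(a = c(2L, 3L, 4L), b = NULL, c = c(5L, 6L), d = 7L,
            e = NULL, f = NULL, g = NULL)
m <- adj_to_mat_sketch(th3)
m
```

A tree with n nodes has n - 1 edges, so the result should contain 2(n - 1) ones.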
Converts a TreeHarp object to an adjacency matrix.
from |
A TreeHarp object. |
A matrix.
The autoharp package provides functions for running and analysing R scripts and Rmd submissions from students.
The user manuals can be found at https://singator.github.io/autoharp-docs/
Given node names, this function retrieves the smallest tree containing at most those nodes.
carve_mst(th, node_names)
th |
An object of class TreeHarp. |
node_names |
A character vector of node names. Nodes outside this set will not be returned in the tree. It must include the root node name. |
The function starts from each specified node and works its way up to the root. If a branch contains nodes outside the list, it is shortened.
In the end, the returned tree will try to contain all the named nodes, but if that is not possible, some will be dropped to ensure that a tree, not a disconnected graph, is returned.
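A minimal sketch of the walk towards the root, assuming the tree is given as a parent vector and that nodes are matched by label. The helper keep_connected_sketch and the hand numbering of x <- f(y, g(5)) are hypothetical (TreeHarp's own numbering may differ). The sketch reproduces the first example of this section, where "5" is dropped because its branch passes through "g", which is not in the list.

```r
# Hypothetical helper: keep a named node only if every node on its path to the
# root is also named. parent[i] is the parent id of node i (NA for the root).
keep_connected_sketch <- function(parent, node_labels, wanted) {
  in_set <- node_labels %in% wanted
  keep <- logical(length(parent))
  for (i in seq_along(parent)) {
    if (!in_set[i]) next
    j <- i
    ok <- TRUE
    while (!is.na(parent[j])) {                # walk up towards the root
      j <- parent[j]
      if (!in_set[j]) { ok <- FALSE; break }   # branch leaves the named set
    }
    keep[i] <- ok
  }
  keep
}

# Hand-numbered tree for x <- f(y, g(5)): 1 "<-", 2 "x", 3 "f", 4 "y", 5 "g", 6 "5"
parent <- c(NA, 1L, 1L, 3L, 3L, 5L)
node_labels <- c("<-", "x", "f", "y", "g", "5")
kept <- keep_connected_sketch(parent, node_labels, c("<-", "x", "f", "5"))
node_labels[kept]
```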
An object of class TreeHarp.
ex1 <- quote(x <- f(y, g(5)))
th1 <- TreeHarp(ex1, TRUE)
carve_mst(th1, c("<-", "x", "f", "5")) ## note: 5 is dropped.
carve_mst(th1, c("<-", "x", "f", "y"))
carve_mst(th1, c("<-", "f", "g"))
This function keeps only the indicated nodes, returning a new sub-tree.
carve_subtree(obj, char_arr)
obj |
An object of class TreeHarp. |
char_arr |
A vector of 1's and 0's indicating which nodes to keep. The vector should have length equal to the number of nodes in obj. |
This returns an error if the selected nodes do not define a valid tree.
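One way to test whether a 0/1 selection defines a single sub-tree is to check that exactly one selected node lacks a selected parent. The sketch below is a hypothetical helper working on a parent vector; it is not necessarily the check carve_subtree performs.

```r
# Hypothetical check: does a 0/1 selection define a single connected sub-tree?
# parent[i] is the parent id of node i (NA for the root).
is_valid_selection_sketch <- function(parent, char_arr) {
  kept <- which(char_arr == 1)
  if (length(kept) == 0) return(FALSE)
  # A kept node is a local root if its parent is absent or not kept
  local_roots <- kept[is.na(parent[kept]) | !(parent[kept] %in% kept)]
  length(local_roots) == 1
}

# Parents for th3 <- list(a = c(2,3,4), b = NULL, c = c(5,6), d = 7, e, f, g = NULL)
parent <- c(NA, 1L, 1L, 1L, 3L, 3L, 4L)
is_valid_selection_sketch(parent, c(1, 0, 0, 0, 0, 0, 0))  # root only
is_valid_selection_sketch(parent, c(1, 0, 0, 0, 1, 0, 0))  # node 5 without its parent
```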
An object of class TreeHarp.
th3 <- list(a= c(2L,3L,4L), b=NULL, c=c(5L, 6L), d=7L, e=NULL, f=NULL, g=NULL)
carve_subtree(TreeHarp(th3), c(1,0,0,0,0,0,0))
st <- subtree_at(TreeHarp(th3), 4)
plot(st)
This function runs unit tests on a student's Rmd file.
check_correctness(e_stud, e_soln, test_fname)
e_stud |
The environment containing the output objects from running the student's Rmd file. |
e_soln |
The environment containing the objects from the solution template. It will probably contain objects with the suffix "_soln". These will be tested against the versions generated by the student. |
test_fname |
The R script containing the test chunks. |
Prior to calling this, populate_soln_env should already have been called on the solution template, and the student file should already have been knitted in order to generate the student's objects. Of course, one could generate the test script independently of populate_soln_env, but the solution environment containing objects with a "_soln" suffix is also needed.
The student environment, solution environment, test file and the list of tests and expectations are the inputs to this function.
A data frame with one row, and the number of columns equal to the number of tests run plus the number of scalars to keep.
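The comparison against objects with the "_soln" suffix can be pictured with the sketch below. It is a hypothetical simplification: compare_soln_sketch uses all.equal() on each pair and returns one logical column per solution object, whereas check_correctness runs the test chunks from test_fname.

```r
# Hypothetical illustration: compare each <name>_soln object in e_soln with
# the student's <name> object in e_stud; one logical column per object.
compare_soln_sketch <- function(e_stud, e_soln) {
  soln_names <- grep("_soln$", ls(e_soln), value = TRUE)
  stud_names <- sub("_soln$", "", soln_names)
  res <- Map(function(stud, sol) {
    # A missing student object counts as a failed check
    if (!exists(stud, envir = e_stud, inherits = FALSE)) return(FALSE)
    isTRUE(all.equal(get(stud, envir = e_stud), get(sol, envir = e_soln)))
  }, stud_names, soln_names)
  as.data.frame(res)  # one row, one column per checked object
}

e_stud <- new.env(); e_soln <- new.env()
assign("x", 1:3, envir = e_stud); assign("y", 2, envir = e_stud)
assign("x_soln", 1:3, envir = e_soln); assign("y_soln", 3, envir = e_soln)
assign("z_soln", 1, envir = e_soln)
res <- compare_soln_sketch(e_stud, e_soln)
res
```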
Checks if a file actually is an Rmd file.
check_rmd(fname, verbose = TRUE)
fname |
A character string. It is the name of the student submission file. |
verbose |
A logical value. If TRUE, messages are printed when a non-Rmd file is found. |
It runs three checks. First, it checks that the file extension is Rmd, rmd or a similar variant. Second, it checks for a YAML header at the beginning of the file. Finally, it checks that there is at least one properly defined R chunk within the file.
The function returns TRUE if all three checks pass, and FALSE otherwise.
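The three checks can be approximated in base R as below. This is a hypothetical re-implementation (check_rmd_sketch); the exact rules used by check_rmd are not documented here, so treat the patterns as assumptions. The chunk fence is built with strrep() only to keep literal backtick runs out of the source.

```r
# Hypothetical re-implementation of the three checks described above
check_rmd_sketch <- function(fname) {
  # 1. Extension: .Rmd, .rmd, .RMD, ...
  if (!grepl("\\.rmd$", fname, ignore.case = TRUE) || !file.exists(fname)) return(FALSE)
  lines <- trimws(readLines(fname, warn = FALSE))
  lines <- lines[nzchar(lines)]  # ignore blank lines
  # 2. YAML header: first non-blank line is "---" and a closing delimiter follows
  yaml_ok <- length(lines) >= 2 && lines[1] == "---" &&
    any(lines[-1] %in% c("---", "..."))
  # 3. At least one R chunk header: three backticks followed by {r}, {r label}, ...
  fence <- strrep("`", 3)
  chunk_ok <- any(grepl(paste0("^", fence, "\\s*\\{r[ ,}]"), lines))
  yaml_ok && chunk_ok
}

fence <- strrep("`", 3)
good <- tempfile(fileext = ".Rmd")
writeLines(c("---", "title: test", "---", "", paste0(fence, "{r}"), "1 + 1", fence), good)
check_rmd_sketch(good)
```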
This is a stand-alone function. It computes the runtime statistics without rendering the md/html/pdf file.
check_runtime(stud_fname, knit_root_dir, return_env = FALSE)
stud_fname |
The rmd filename of the student. |
knit_root_dir |
The working directory to use when knitting the file. |
return_env |
A logical value indicating whether the environment from the Rmd file should be returned. If FALSE, NA is returned in its place. |
This routine is not used within any other function within the package. Figures are not cleaned or removed.
A list containing the running time in seconds, the memory used by the final environment in bytes (as a numeric scalar), and the environment object containing all the generated objects from the rmd file.
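A rough sketch of the timing and memory measurements, assuming the code has already been extracted from the Rmd file as a single string (knitting is not shown). It uses utils::object.size(), which may report different totals from the pryr-based env_size; run_and_time_sketch is a hypothetical name.

```r
# Hypothetical sketch: evaluate R code in a fresh environment, time it, and
# sum utils::object.size() over the objects it creates.
run_and_time_sketch <- function(code, return_env = FALSE) {
  env <- new.env()
  start <- proc.time()[["elapsed"]]
  eval(parse(text = code), envir = env)
  elapsed <- proc.time()[["elapsed"]] - start  # seconds
  obj_names <- ls(env, all.names = TRUE)
  mem <- sum(vapply(obj_names, function(nm)
    as.numeric(utils::object.size(get(nm, envir = env))), numeric(1)))
  list(time = elapsed, mem = mem, env = if (return_env) env else NA)
}

res <- run_and_time_sketch("x <- rnorm(1000); y <- mean(x)", return_env = TRUE)
res$time
res$mem
```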
Cleans up the autoharp output directory.
clean_dir(dir_name, verbose = FALSE)
dir_name |
The directory containing the files to be cleaned. |
verbose |
If TRUE, then the files and directories being removed will be printed. |
When batch rendering Rmd files, it is inevitable that some files fail. These files leave their knit.md and utf8.md intermediate files behind, but no corresponding html file is generated.
This function is called for its side effect: to remove those lonely md files.
If this clean-up is not done, these straggling md files cause problems when we try to re-run the files (perhaps with some of the errors fixed). The most serious is that the Rmd files will not be re-knitted, even though they have been changed.
No return value.
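A hypothetical sketch of the clean-up, assuming the intermediate files follow rmarkdown's *.knit.md and *.utf8.md naming and that a successful render leaves a .html file with the same stem. clean_dir_sketch is not the package's implementation.

```r
# Hypothetical sketch: remove *.knit.md / *.utf8.md files with no matching .html
clean_dir_sketch <- function(dir_name, verbose = FALSE) {
  pat <- "\\.(knit|utf8)\\.md$"
  md_files <- list.files(dir_name, pattern = pat, full.names = TRUE)
  stems <- sub(pat, "", md_files)
  lonely <- md_files[!file.exists(paste0(stems, ".html"))]
  if (verbose && length(lonely) > 0) {
    message("Removing: ", paste(basename(lonely), collapse = ", "))
  }
  invisible(file.remove(lonely))
}

d <- tempfile(); dir.create(d)
file.create(file.path(d, c("a.knit.md", "a.utf8.md", "b.knit.md", "b.html")))
clean_dir_sketch(d, verbose = TRUE)
list.files(d)
```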
A wrapper function that uses assign and get.
copy_e2e(from_obj, from_env, to_obj, to_env)
from_obj |
The name of the object to copy. It has to be a string. |
from_env |
The environment in which the object lives. It has to be an object of class environment. |
to_obj |
The name of the object to assign it to, in the new environment. Also a string. |
to_env |
The environment to which the new object is to be assigned. It has to be an object of class environment. |
There is no return value. This function is called for its side effect.
e1 <- new.env(); e2 <- new.env()
ls(e2)
evalq(x <- 1L, e1)
copy_e2e("x", e1, "y", e2)
ls(e2)
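Since copy_e2e wraps assign and get, a base-R equivalent is a single assignment. The sketch below is hypothetical; in particular, the choice of inherits = FALSE (not looking in enclosing environments) is an assumption.

```r
# Hypothetical equivalent of copy_e2e using only base R
copy_e2e_sketch <- function(from_obj, from_env, to_obj, to_env) {
  stopifnot(is.character(from_obj), is.environment(from_env),
            is.character(to_obj), is.environment(to_env))
  # Look the object up in from_env only, then bind it under a new name in to_env
  assign(to_obj, get(from_obj, envir = from_env, inherits = FALSE), envir = to_env)
  invisible(NULL)
}

e1 <- new.env(); e2 <- new.env()
assign("x", 1L, envir = e1)
copy_e2e_sketch("x", e1, "y", e2)
ls(e2)
```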
Counts the number of lints in one folder.
count_lints_all(file_names, lint_list, lint_labels)
file_names |
The path to the rmd files that need to be checked for lints. |
lint_list |
List of lints to check for. |
lint_labels |
List of labels to name the vector to return. |
The function will count the number of lints in each file. The lints to be checked can be passed as an argument; otherwise, the defaults are used. The defaults are as follows:
T_and_F_symbol_linter
line_length_linter
assignment_linter
absolute_path_linter
pipe_continuation_linter
Note that labels would also need to be given if the non-default lints are chosen.
A data frame containing the lint counts.
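The counting step can be sketched independently of lintr, assuming each lint has already been reduced to the name of the linter that raised it and that those names coincide with lint_labels. tally_lints_sketch and the input data are hypothetical; counts for linters that found nothing are kept as zeros, and per-file vectors are stacked into one data frame.

```r
# Hypothetical helper: count lints per linter, keeping zeros for absent linters
tally_lints_sketch <- function(linter_names, lint_labels) {
  counts <- table(factor(linter_names, levels = lint_labels))
  setNames(as.integer(counts), lint_labels)
}

lint_labels <- c("T_and_F_symbol_linter", "line_length_linter",
                 "assignment_linter", "absolute_path_linter",
                 "pipe_continuation_linter")
# Made-up input: the linter behind each lint found in two files
found <- list(
  sub1.Rmd = c("line_length_linter", "assignment_linter", "line_length_linter"),
  sub2.Rmd = character(0)
)
per_file <- lapply(found, tally_lints_sketch, lint_labels = lint_labels)
lint_df <- as.data.frame(do.call(rbind, per_file))  # one row per file
lint_df
```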
Counts the number of lints in one file.
count_lints_one(rmd_file, lint_list, lint_labels)
rmd_file |
The path to the rmd file to check for lints. |
lint_list |
List of lints to check for. |
lint_labels |
List of labels to name the vector to return. |
The function will count the number of lints in a file. The lints to be checked can be passed as an argument; otherwise, the defaults are used. The defaults are as follows:
T_and_F_symbol_linter
line_length_linter
assignment_linter
absolute_path_linter
pipe_continuation_linter
Note that labels would also need to be given if the non-default lints are chosen.
A vector containing the lint counts.
This function uses object_size from the pryr package to compute the total amount of memory used by objects in an environment.
env_size(env)
env |
The environment whose size is to be computed. |
The names are wrapped in backticks. Otherwise, non-syntactic names will cause problems. This function is used within render_one as part of the runtime stats assessment.
The size in bytes, as a numeric value (scalar).
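The reason for the backticks can be seen in the sketch below: object names are pasted into code and parsed, and a name such as my var only parses as a symbol when quoted. This hypothetical helper uses utils::object.size() per object instead of pryr's object_size, so it does not account for memory shared between objects; get() would avoid parsing altogether.

```r
# Hypothetical sketch: total size of all objects in an environment, built by
# pasting backtick-quoted names into code and parsing it.
env_size_sketch <- function(env) {
  obj_names <- ls(env, all.names = TRUE)
  if (length(obj_names) == 0) return(0)
  quoted <- paste0("`", obj_names, "`")  # `my var` parses; my var would not
  sizes <- vapply(quoted, function(q) {
    sym <- parse(text = q)[[1]]           # a symbol, even for non-syntactic names
    as.numeric(utils::object.size(eval(sym, envir = env)))
  }, numeric(1))
  sum(sizes)
}

e1 <- new.env()
assign("my var", 1:10000, envir = e1)
env_size_sketch(e1)
```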
e1 <- new.env()
env_size(e1)
evalq(x <- 1:10000L, e1)
env_size(e1)
This function converts an Examplify script (a student's HTML export) into an R script.
examplify_to_r(in_fname, out_fname, verbose = FALSE)
in_fname |
A file name of a student submission html file. |
out_fname |
The output R script. |
verbose |
Controls verbosity of output. |
The script has to be exported in html format, using a particular profile. The questions are stored in tags nested under id = "answers" and tag type "h2". The answers are stored in tags nested under id = "answers" and class "content".
Some student answers may mix R code and text, so tidy_source() may not work on them, since it parses its input.
This will return NULL, but will generate an R script as output.
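A sketch of the extraction step using the xml2 package, assuming the export structure described above (h2 question headings and elements of class "content" nested under the element with id = "answers"). extract_qa_sketch is hypothetical and does not write the R script.

```r
library(xml2)

# Hypothetical sketch: pull question titles and answer text out of an
# Examplify-style HTML export with the structure described above.
extract_qa_sketch <- function(html) {
  doc <- read_html(html)
  q_nodes <- xml_find_all(doc, "//*[@id='answers']//h2")
  a_nodes <- xml_find_all(doc,
    "//*[@id='answers']//*[contains(concat(' ', normalize-space(@class), ' '), ' content ')]")
  list(questions = xml_text(q_nodes, trim = TRUE),
       answers = xml_text(a_nodes, trim = TRUE))
}

html <- '<div id="answers"><h2>Question 1</h2><div class="content">mean(x)</div></div>'
qa <- extract_qa_sketch(html)
qa
```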
Extracts chunks whose labels match a pattern from an Rmd file.
extract_chunks(rmd_name, pattern)
rmd_name |
A character string, the name of the rmd file to get the chunks from. |
pattern |
The pattern to match within the label. In fact, the match is applied to the whole chunk options string. |
A list of character vectors. Each vector contains the chunk from the file. If no pattern is specified, all chunks are returned. Remember that the chunk header and tail are also included in the returned list.
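A base-R sketch of chunk extraction, operating on lines already read with readLines() rather than on a file name. extract_chunks_sketch is hypothetical; as described above, each returned chunk includes its header and closing fence, and the pattern is matched against the whole header line.

```r
# Hypothetical sketch: return each chunk (header and closing fence included)
# whose header line matches `pattern`; all chunks if pattern is NULL.
extract_chunks_sketch <- function(lines, pattern = NULL) {
  fence <- strrep("`", 3)
  starts <- grep(paste0("^\\s*", fence, "\\s*\\{"), lines)
  ends <- grep(paste0("^\\s*", fence, "\\s*$"), lines)
  chunks <- lapply(starts, function(s) {
    e <- ends[ends > s][1]
    if (is.na(e)) e <- length(lines)  # unterminated chunk: take the rest
    lines[s:e]
  })
  if (!is.null(pattern)) {
    keep <- vapply(chunks, function(ch) grepl(pattern, ch[1]), logical(1))
    chunks <- chunks[keep]
  }
  chunks
}

fence <- strrep("`", 3)
rmd <- c("---", "title: demo", "---", "Some text.",
         paste0(fence, "{r setup}"), "library(stats)", fence,
         "More text.",
         paste0(fence, "{r q1-answer, echo=TRUE}"), "x <- 1", fence)
q1 <- extract_chunks_sketch(rmd, "q1")
q1
```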
Extracts non-chunks from an Rmd file.
extract_non_chunks(rmd_name, out_name)
rmd_name |
A character string, the name of the Rmd file to extract the text from. |
out_name |
An output filename, to dump the text to. |
If out_name is missing, a character vector is returned. If out_name is specified, nothing is returned; the text is written to that file instead.
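The complementary operation, keeping only the text outside code chunks, can be sketched the same way. non_chunk_lines_sketch is hypothetical and keeps the YAML header; whether extract_non_chunks does so is not stated here. Writing to out_name would amount to a writeLines() call on the result.

```r
# Hypothetical sketch: keep only the lines that lie outside code chunks
non_chunk_lines_sketch <- function(lines) {
  fence <- strrep("`", 3)
  open_re <- paste0("^\\s*", fence, "\\s*\\{")
  close_re <- paste0("^\\s*", fence, "\\s*$")
  in_chunk <- FALSE
  keep <- logical(length(lines))  # fence lines themselves stay FALSE
  for (i in seq_along(lines)) {
    if (!in_chunk && grepl(open_re, lines[i])) {
      in_chunk <- TRUE
    } else if (in_chunk && grepl(close_re, lines[i])) {
      in_chunk <- FALSE
    } else {
      keep[i] <- !in_chunk
    }
  }
  lines[keep]
}

fence <- strrep("`", 3)
rmd <- c("---", "title: demo", "---", "Some text.",
         paste0(fence, "{r setup}"), "library(stats)", fence, "More text.")
out <- non_chunk_lines_sketch(rmd)
out
```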
A convenience function, for applying a function to many trees.
fapply(fharp, TFUN, combine = TRUE, combiner_fn, ...)
fharp |
The output of rmd_to_forestharp. It could also just be a list of TreeHarp objects. |
TFUN |
A function that works on a single TreeHarp and returns an output. See forestharp-helpers for examples. |
combine |
A logical value that indicates if the output from all function applications should be combined. |
combiner_fn |
A function to use to combine the individual output from each tree into a single scalar for each forest. It should handle NA values in the input vector or list. If it is missing, it defaults to sum, with na.rm=TRUE. |
... |
Additional arguments to be passed on to TFUN. |
The input is simply a list of TreeHarp objects. First, the TFUN function is lapply-ed to each TreeHarp item, resulting in either a list, or a vector with possible NA elements.
The combiner function should be aware of this sort of output, and summarise the list or vector accordingly, handling NA's and returning a scalar.
If you need to create a partial function out of a forestharp helper, use an anonymous function, as shown in the examples below.
A vector, list or a single value. If TFUN returned an error for a particular TreeHarp, that component in the list or vector would be NA. This input vector or list will then be combined by combiner_fn.
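The apply-then-combine logic can be sketched generically in base R. fapply_sketch is hypothetical: errors from TFUN become NA through tryCatch(), and the default combiner is sum with na.rm = TRUE, as described above.

```r
# Hypothetical sketch of the apply-then-combine pattern
fapply_sketch <- function(items, TFUN, combine = TRUE, combiner_fn, ...) {
  # Apply TFUN to each item; an error for one item becomes NA
  out <- lapply(items, function(it) tryCatch(TFUN(it, ...), error = function(e) NA))
  if (!combine) return(out)
  if (missing(combiner_fn)) {
    combiner_fn <- function(x) sum(unlist(x), na.rm = TRUE)
  }
  combiner_fn(out)
}

items <- list(1:3, "a", 4:5)  # sum("a") fails, so that component becomes NA
fapply_sketch(items, sum)
fapply_sketch(items, sum, combine = FALSE)
```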
ex1 <- quote(X <- rnorm(10, mean=0.9, sd=4))
ex2 <- quote(Y <- rbeta(10, shape1=3, shape2=5))
f1 <- lapply(c(ex1, ex2), TreeHarp, quote_arg=TRUE)
# returns all function calls that begin with "r", like rnorm and rbeta.
# calls are returned as a list.
fapply(f1, extract_fn_call, combine =FALSE, pattern="^r.*")
# list is catenated.
fapply(f1, extract_fn_call, combine =TRUE, pattern="^r.*", combiner_fn = function(x) {paste0(unlist(x), collapse=",")})
Given two nodes that are on the same path to the root, this function determines the branch that leads to the child node.
find_branch_num(th, child_id, ancestor_id)
th |
A TreeHarp object. |
child_id |
An integer node id. It corresponds to the node to trace up from. |
ancestor_id |
An integer node id. It corresponds to the node to trace down from. |
This is used when trying to find a sub-call from a TreeHarp object. It is useful in determining the indices to use when extracting the sub-call.
An integer that denotes the branch to follow down (from the ancestor) to reach the child.
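A sketch of the same idea on a plain adjacency list: climb from the child until the next step would reach the ancestor, then report the position of that node among the ancestor's children. find_branch_sketch is hypothetical, and its numbering is plain child order, which need not match the branch numbers find_branch_num uses for language trees.

```r
# Hypothetical sketch on an adjacency list (element i = children of node i)
find_branch_sketch <- function(adj_list, child_id, ancestor_id) {
  parent <- rep(NA_integer_, length(adj_list))
  for (i in seq_along(adj_list)) parent[adj_list[[i]]] <- i
  node <- child_id
  # Climb until the parent of `node` is the ancestor (or the root is passed)
  while (!is.na(parent[node]) && parent[node] != ancestor_id) node <- parent[node]
  if (is.na(parent[node])) stop("ancestor_id is not an ancestor of child_id")
  match(node, adj_list[[ancestor_id]])  # position among the ancestor's children
}

th3 <- list(a = c(2L, 3L, 4L), b = NULL, c = c(5L, 6L), d = 7L,
            e = NULL, f = NULL, g = NULL)
find_branch_sketch(th3, 7, 1)  # 7 -> 4 (d), the third child of a
```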
ex3 <- quote(x <- f(y = g(3, 4), z=1L))
t1 <- TreeHarp(ex3, TRUE)
find_branch_num(t1, 8, 3) # should be 1
find_branch_num(t1, 5, 3) # should be 2
Examples of functions that can be used directly on individual TreeHarp objects, and on forestharp objects via fapply.
count_self_fn(th)
count_lam_fn(th)
count_fn_call(th, pattern, pkg_name)
extract_fn_call(th, pattern, pkg_name)
extract_formal_args(th, fn_name)
extract_assigned_objects(th)
extract_actual_args(th)
detect_growing(th, count = FALSE, within_for = FALSE)
detect_for_in_fn_def(th, fn_name)
count_fn_in_fn(th, fn_name, sub_fn)
detect_fn_call_in_for(th, fn_name)
extract_self_fn(th)
detect_fn_arg(th, fn_name, arg)
detect_nested_for(th)
th |
A TreeHarp object. |
pattern |
A regular expression to pick up function names. |
pkg_name |
The name of a package to match functions with. This should be an exact match for the package name. The package should be attached for this to work. In order to avoid picking up duplicate names, for instance |
fn_name |
Function name, as a character string. |
count |
For |
within_for |
If TRUE, only expressions within a for loop are included. |
sub_fn |
(For count_fn_in_fn), the function to count (to look for within fn_name). |
arg |
The argument to check for within fn_name (as a character string). |
These are examples of functions that can be called on a list of TreeHarp objects, which we refer to as a forestharp object. Such objects are not formally defined yet, but can be created using rmd_to_forestharp or join_treeharps.
On their own, each of these functions should return a scalar or a 1-dimensional array. When called with fapply, the scalar numerical values can be combined (by taking the sum, or with any other provided combiner function).
The ultimate idea is that fapply should return a single feature for each rmd file that it is called upon.
count_self_fn: Counts the number of self-defined functions.
This helper counts the number of self-defined functions, excluding lambda functions. It returns an integer scalar. As long as "function" was called and its result assigned, the definition is counted.
count_lam_fn: Counts the number of anonymous functions.
Counts the number of anonymous functions, typically used in sapply, etc. It returns an integer scalar. As long as "function" was called but its result was not assigned, it is counted here.
count_fn_call: Counts the number of function calls that match a pattern.
This helper counts the number of function calls that match a pattern. It returns a count, i.e. an integer vector of length 1. If pkg_name is provided instead of pattern, then this function counts the number of function calls from that package.
extract_fn_call: Extracts function calls as a string.
Extracts the function calls that match a pattern. It returns a character vector. Remember to set combine = FALSE when calling fapply with it.
extract_formal_args: Extracts the formal arguments used with a named function.
Extracts the function formal arguments from functions with a given name. The name must match the function name exactly. This returns a character vector, or NULL if no formal arguments are used.
extract_assigned_objects: Extracts names of assigned objects.
Extracts the names of assigned objects. This was written to assist in detecting missed opportunities to use the pipe operator.
extract_actual_args: Extracts actual argument names.
Extracts the actual arguments from an expression, not the formal arguments. It only returns syntactic literals. It should be improved to return the actual arguments for a specified function so that something similar to extract_assigned_objects could be returned.
detect_growing: Detects if a vector is being grown.
It detects expressions of the form x <- c(x, new_val). Growing a vector in this way is generally bad programming practice, because each assignment copies the whole vector.
detect_for_in_fn_def: Detects if a for loop is present within a function.
It detects if a for loop is present within a function definition.
count_fn_in_fn: Counts uses of a function within another.
It counts the number of times a function is used within another.
detect_fn_call_in_for: Detects whether a function is called within a for loop.
Checks if a function has been called within a for loop.
extract_self_fn: Extracts the names of user-defined functions.
Extracts the names of user-defined functions. They may not all look nice, because some functions may be anonymous functions. This function needs to be improved.
detect_fn_arg: Was a function called with a particular argument?
Checks if a function was called with a particular argument, which could be the formal or actual one. The immediate child of the function call node is checked.
detect_nested_for: Was a nested "for" loop called anywhere within the code?
Checks if a nested for-loop was called anywhere within the code. This returns a logical scalar for each TreeHarp object given.
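As an illustration of how such a static check can work on an R language object, the sketch below detects the x <- c(x, new_val) pattern from detect_growing by walking every call recursively. It operates on quoted expressions directly rather than on TreeHarp objects, and the helper names are hypothetical.

```r
# Hypothetical sketch: recursively visit every call inside a language object
walk_calls <- function(expr, visit) {
  if (is.call(expr)) {
    visit(expr)
    for (i in seq_along(expr)) walk_calls(expr[[i]], visit)
  }
  invisible(NULL)
}

# TRUE for a call of the form  x <- c(x, ...)
is_growing_call <- function(e) {
  if (length(e) != 3 || !identical(e[[1]], as.name("<-"))) return(FALSE)
  lhs <- e[[2]]
  rhs <- e[[3]]
  is.name(lhs) && is.call(rhs) && identical(rhs[[1]], as.name("c")) &&
    length(rhs) >= 2 && identical(rhs[[2]], lhs)
}

detect_growing_sketch <- function(expr, count = FALSE) {
  hits <- 0L
  walk_calls(expr, function(e) if (is_growing_call(e)) hits <<- hits + 1L)
  if (count) hits else hits > 0L
}

detect_growing_sketch(quote(for (i in 1:3) x <- c(x, i)))
detect_growing_sketch(quote(y <- c(1, 2)))
```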
# Dummy trees
th1 <- TreeHarp(quote(X <- rnorm(10, mean=0.9, sd=4)), TRUE)
th2 <- TreeHarp(quote(Y <- rbeta(10, shape1=3, shape2=5)), TRUE)
th3 <- TreeHarp(quote(fn1 <- function(x) x + 2), TRUE)
th4 <- TreeHarp(quote(df1 <- mutate(df1, new_col=2*old_col)), TRUE)
# Run helpers
count_self_fn(th3)
count_fn_call(th4, pkg_name="dplyr")
count_fn_call(th1, pattern="^r.*")
This routine generates all sub-trees rooted at the root node of a particular tree.
generate_all_subtrees(th)
th |
An object of class TreeHarp. |
A 0-1 matrix with n rows and m columns. n is the number of sub-trees rooted at the root node of th. m is the number of nodes in this given tree. The leading column will be a 1 for all the rows.
Ruskey, F. (1981). Listing and counting subtrees of a tree. SIAM Journal on Computing.
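For small trees, the output of generate_all_subtrees can be cross-checked by brute force: enumerate every 0/1 vector with the root set to 1, and keep those in which every selected node's parent is also selected. This is not Ruskey's algorithm; it takes 2^(n-1) steps and is only meant for illustration. all_rooted_subtrees_bf is hypothetical and assumes an adjacency list with at least two nodes.

```r
# Hypothetical brute-force enumeration of sub-trees rooted at node 1
all_rooted_subtrees_bf <- function(adj_list) {
  n <- length(adj_list)
  parent <- rep(NA_integer_, n)
  for (i in seq_len(n)) parent[adj_list[[i]]] <- i
  # Every 0/1 pattern over the non-root nodes (2^(n-1) rows), root fixed at 1
  grid <- as.matrix(expand.grid(rep(list(0:1), n - 1)))
  cand <- cbind(1L, grid)
  # Valid if each selected non-root node has its parent selected too
  ok <- apply(cand, 1, function(r) all(r[-1] == 0 | r[parent[-1]] == 1))
  out <- cand[ok, , drop = FALSE]
  dimnames(out) <- list(NULL, names(adj_list))
  out
}

all_rooted_subtrees_bf(list(a = c(2, 3), b = NULL, c = NULL))
```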
th1 <- TreeHarp(list(a=c(2,3), b=NULL, c=NULL))
generate_all_subtrees(th1)
Generates an HTML page of thumbnails.
generate_thumbnails(out_dir, html_fname, html_title, anonymise = FALSE)
out_dir |
The directory in which student html files and the figures are kept. |
html_fname |
The name of the master html file which will contain all thumbnails. This file will be created in |
html_title |
The title tag of the master html page. This will be displayed on top of the output html page. |
anonymise |
If TRUE, the original filenames will be replaced with innocuous numbers. If FALSE, the original filenames will be retained. |
After running render_one on a set of R/Rmd files in a directory, this function helps to consolidate them for review.
The output folder contains all the generated html files, images and a log file. This function will extract the images from each html file and display them as thumbnails on a new html page, with links to all individual files.
The function returns nothing, but it creates an HTML page of thumbnails of all the images that students plotted, along with links to their individual pages.
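The final assembly step might look like the sketch below, assuming the image paths have already been extracted from each student's html file. thumbnail_page_sketch is hypothetical; it anonymises only the visible link text and does not escape HTML special characters.

```r
# Hypothetical sketch: build the lines of a thumbnail index page.
# img_files[i] is an image taken from html_files[i]; image extraction is not shown.
thumbnail_page_sketch <- function(img_files, html_files, html_title, anonymise = FALSE) {
  link_text <- if (anonymise) as.character(seq_along(html_files)) else basename(html_files)
  cells <- sprintf('<div><a href="%s"><img src="%s" width="200"><br>%s</a></div>',
                   html_files, img_files, link_text)
  c("<html>", "<head>", sprintf("<title>%s</title>", html_title), "</head>",
    "<body>", sprintf("<h1>%s</h1>", html_title), cells, "</body>", "</html>")
}

page <- thumbnail_page_sketch(c("s1.png", "s2.png"), c("s1.html", "s2.html"),
                              "Student plots", anonymise = TRUE)
writeLines(page, file.path(tempdir(), "thumbnails.html"))
```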
The generic method definition for getting adjacency list from a TreeHarp object.
get_adj_list(x, ...)
## S4 method for signature 'TreeHarp'
get_adj_list(x, ...)
x |
An object of class TreeHarp. |
... |
Unused arguments, for now. |
The adjacency list for a TreeHarp object.
TreeHarp: A getter.
Allows the user to extract the adjacency list of a TreeHarp object.
The generic method definition for getting child node ids.
get_child_ids(x, node_num)
## S4 method for signature 'TreeHarp'
get_child_ids(x, node_num)
## S4 method for signature 'list'
get_child_ids(x, node_num)
x |
An object of class TreeHarp. |
node_num |
An integer, length 1. This is the node whose children are wanted. If the specified node is a leaf, NULL is returned. |
An integer vector, indicating the children node ids.
TreeHarp: Obtain child nodes.
Allows the user to extract the child nodes of a specified node from a TreeHarp object.
list: Obtain child nodes.
Allows the user to extract the child nodes of a specified node from an adjacency list.
This function retrieves the child node ids of a given node from an adjacency list of a tree.
get_child_ids2(adj_list, at_node)
adj_list |
The adjacency list of the tree. |
at_node |
The node whose children should be extracted. |
Remember that the list has to be for a tree, not a general graph. Please see other help pages for more specifications.
This is a low-level function, used within the S4 class TreeHarp. It is not generally meant for use by the user.
A vector of integers specifying the children of that particular node. If the node is a leaf, it returns NULL.
This function obtains the node levels from a tree.
get_levels(adj_list)
adj_list |
The adjacency list of the tree. |
This function is used to check whether the tree is specified in BFS order. If it is, the node levels are sorted in non-decreasing order.
This function is not exported for the general user.
It returns a vector of integers. The length of this vector will be the number of nodes in the tree. The root is at level 1, the next is at level 2, and so on.
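Node levels can be computed with a breadth-first traversal from the root, after which the BFS-order check reduces to testing whether the levels are sorted. get_levels_sketch is hypothetical and assumes the adjacency list describes a tree rooted at node 1 (a cycle would make the loop run forever).

```r
# Hypothetical sketch: BFS from the root, recording each node's level
get_levels_sketch <- function(adj_list, root = 1L) {
  lv <- rep(NA_integer_, length(adj_list))
  lv[root] <- 1L
  queue <- root
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    for (child in adj_list[[node]]) {
      lv[child] <- lv[node] + 1L
      queue <- c(queue, child)
    }
  }
  lv
}

th3 <- list(a = c(2L, 3L, 4L), b = NULL, c = c(5L, 6L), d = 7L,
            e = NULL, f = NULL, g = NULL)
lv <- get_levels_sketch(th3)
!is.unsorted(lv)  # TRUE: th3 is listed in BFS order
```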
The input filename could correspond to an R script or an Rmd file.
get_libraries(fname)
fname |
The Rmd filename or R script. |
The file is assumed to be either an R script or an Rmd file. If extract_chunks finds it to be an Rmd file, it is purl-ed before the libraries are extracted. Otherwise, it is assumed to be an R script and is used without further processing.
The file is not parsed, so even text files will work with this function.
A character vector containing the packages used within the file.
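Because the file is not parsed, the extraction can be done with a regular expression over the raw lines. The pattern below (library(), require() and requireNamespace() calls with an unquoted or quoted package name) is an assumption, as is the helper name get_libraries_sketch; purling of Rmd files is not shown.

```r
# Hypothetical sketch: find package names in library()/require() calls by regex
get_libraries_sketch <- function(lines) {
  pat <- "(library|require|requireNamespace)\\(\\s*['\"]?([A-Za-z][A-Za-z0-9.]*)['\"]?"
  hits <- unlist(regmatches(lines, gregexpr(pat, lines)))
  unique(sub(pat, "\\2", hits))  # keep only the captured package name
}

src <- c("library(dplyr)", "x <- 1; require('ggplot2')",
         "library( stats )", "# no packages here")
get_libraries_sketch(src)
```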
From the parent's depth and the last labelled node, we obtain the node id and depth of a child.
get_next_depth_id(parent_node_id, env_ni)
parent_node_id |
The id of the parent node we are considering. |
env_ni |
An environment object, possibly containing a data frame with columns id, name, call_status, arg_type and depth. |
This is for internal use. It may be removed from user-view soon!
A list containing the id and depth of the next node.
This generates the next sub-tree in the enumeration list.
get_next_subtree(obj, char_arr)
obj |
An object of class TreeHarp. |
char_arr |
A vector of 1's and 0's indicating which nodes to keep. The vector should have length equal to the number of nodes in obj. |
Need to reference the paper. This generates the next sub-tree, rooted at the root node of this tree. Each call generates a single sub-tree; it has to be used within a loop to enumerate all of them.
A vector of 1's and 0's, which denotes the next sub-tree in the list.
th1 <- TreeHarp(list(a=c(2,3), b=NULL, c=NULL))
get_next_subtree(th1, c(1,0,0))
get_next_subtree(th1, c(1,1,0))
The generic method definition for getting node types from a TreeHarp object.
get_node_types(x, ...)
## S4 method for signature 'TreeHarp'
get_node_types(x, ...)
x |
An object of class TreeHarp. |
... |
Unused arguments, for now. |
A data frame containing the node types for a TreeHarp object. If the slot is empty, NA is returned.
TreeHarp: A getter.
Allows the user to extract the node types of a TreeHarp object.
Get the node id of the parent call for a given node.
get_parent_call_id(x, node_id)
x |
A TreeHarp object. |
node_id |
The id of the node whose parent call is to be found. An integer value. |
When we need to go up the parse tree to obtain the function that called this node, we use this function. It is similar to get_parent_id, except that get_parent_id only returns the immediate parent.
It is not useful to call this function when the TreeHarp object is not constructed from a language object.
Perhaps this function is necessary only because of the way language objects are represented by autoharp: formal arguments are included in the tree representation. When we wish to find the calling function, we have to walk up the branches until we reach a function call.
An integer corresponding to the node id of the calling function.
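The upward walk can be sketched on a parent vector plus a logical flag marking call nodes. Both the helper parent_call_sketch and the hand numbering below are hypothetical; TreeHarp's own numbering may differ.

```r
# Hypothetical sketch: climb from node_id until a node flagged as a call is found.
# parent[i]: parent id of node i (NA for the root); is_call[i]: TRUE if node i is a call.
parent_call_sketch <- function(parent, is_call, node_id) {
  p <- parent[node_id]
  while (!is.na(p) && !is_call[p]) p <- parent[p]
  p  # NA if no calling node exists above node_id
}

# Hand-numbered tree for x <- f(y = g(3, 4), z = 1L):
# 1 "<-", 2 "x", 3 "f", 4 "y" (formal arg), 5 "z" (formal arg),
# 6 "g", 7 "3", 8 "4", 9 "1L"
parent  <- c(NA, 1L, 1L, 3L, 3L, 4L, 6L, 6L, 5L)
is_call <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
parent_call_sketch(parent, is_call, 6)  # skips the formal argument "y"
```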
ex3 <- quote(x <- f(y = g(3, 4), z=1L))
t1 <- TreeHarp(ex3, TRUE)
# get the function that calls g:
get_parent_call_id(t1, 6)
# contrast with this:
get_parent_id(t1, 6)
The generic method definition for getting parent node id.
get_parent_id(x, node_num)
## S4 method for signature 'TreeHarp'
get_parent_id(x, node_num)
## S4 method for signature 'list'
get_parent_id(x, node_num)
x |
An object of class TreeHarp or an adjacency list. |
node_num |
An integer, length 1. This is the node whose parent is wanted. If node_num is equal to 1, NULL is returned, because that should be the root node. |
An integer, indicating the parent node.
TreeHarp: Obtain parent node id.
Extracts the parent id of a node from a TreeHarp object.
list: Obtain parent node id.
Extracts the parent id of a node from an adjacency list.
This function retrieves the parent node id of a given node from an adjacency list of a tree.
get_parent_id2(adj_list, at_node)
adj_list |
The adjacency list of the tree. |
at_node |
The node whose parent should be extracted. |
Remember that the list has to be for a tree, not a general graph. Please see other help pages for more specifications.
This is a low-level function, used within the S4 class TreeHarp. It is not generally meant for use by the user.
If there are nodes that have more than one parent, then a warning is issued.
An integer of length 1 should be returned for all nodes except the root. For the root, the function returns NULL.
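A sketch of the lookup: scan every node's children for at_node and warn if more than one node claims it. get_parent_sketch is hypothetical; when a node has several parents, this sketch returns all of them after the warning.

```r
# Hypothetical sketch: find the parent(s) of at_node in an adjacency list
get_parent_sketch <- function(adj_list, at_node) {
  claims <- vapply(adj_list, function(kids) at_node %in% kids, logical(1))
  parents <- unname(which(claims))
  if (length(parents) > 1) warning("node ", at_node, " has more than one parent")
  if (length(parents) == 0) return(NULL)  # the root has no parent
  parents
}

th3 <- list(a = c(2L, 3L, 4L), b = NULL, c = c(5L, 6L), d = 7L,
            e = NULL, f = NULL, g = NULL)
get_parent_sketch(th3, 7)
get_parent_sketch(th3, 1)
```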
Obtains an index that can be used to extract a sub-call from a language object.
get_recursive_index(th, node_id)
th |
A TreeHarp object. |
node_id |
An integer corresponding to a call within the parse tree (not a literal, symbol or a formal argument). |
A vector of indices that can be used (together with "[[") to obtain a sub-call.
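The indices work because [[ indexes recursively into a call when given a vector. The hypothetical helper find_index_path below finds such a path by searching the call directly. That is not how get_recursive_index computes it: that function works from TreeHarp node ids, and the examples add 1 to its result before indexing.

```r
# Hypothetical helper: depth-first search for `target` inside `expr`,
# returning the index path for [[ (integer(0) means expr itself; NULL if absent).
find_index_path <- function(expr, target, path = integer(0)) {
  if (identical(expr, target)) return(path)
  if (is.call(expr)) {
    for (i in seq_along(expr)) {
      res <- find_index_path(expr[[i]], target, c(path, i))
      if (!is.null(res)) return(res)
    }
  }
  NULL
}

ex3 <- quote(x <- f(y = g(3, 4), z = 1L))
p <- find_index_path(ex3, quote(g(3, 4)))
p
ex3[[p]]  # recursive indexing into the call
```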
ex3 <- quote(x <- f(y = g(3, 4), z=1L))
t1 <- TreeHarp(ex3, TRUE)
rec_index <- get_recursive_index(t1, 6)
ex3[[rec_index + 1]]
ex3[[get_recursive_index(t1, 3)+1]]
This function looks for the explanation of the checks being done. If there is an explanation, the function returns the summary in HTML format. If not, it returns 'not found' in HTML format.
get_summary_output(rmd_file, summary_header = "# Summary Output", dir = tempdir())
rmd_file |
The path to the rmd file to search for the summary. |
summary_header |
The header to look for. |
dir |
A temporary directory to store the temporary Rmarkdown file before extracting the html content. The temp file will be deleted before the function exits. |
This function is used as a helper. It returns the HTML-formatted string.
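A plain-text sketch of locating the summary section, assuming the explanation sits under a Markdown heading that exactly matches summary_header. extract_section_sketch is hypothetical; it stops at the next heading of the same or a higher level and does not render HTML. Comment lines inside code chunks that start with # would also be treated as headings.

```r
# Hypothetical sketch: return the lines under `header`, up to the next heading
# of the same or a higher level. `header` must start with one or more "#".
extract_section_sketch <- function(lines, header = "# Summary Output") {
  start <- match(header, trimws(lines))
  if (is.na(start)) return(NULL)
  level <- attr(regexpr("^#+", header), "match.length")
  rest <- lines[-seq_len(start)]
  next_hdr <- grep(sprintf("^#{1,%d}\\s", level), rest)
  if (length(next_hdr) > 0) rest <- rest[seq_len(next_hdr[1] - 1)]
  rest
}

doc <- c("# Intro", "text", "# Summary Output", "I used a loop.",
         "## Detail", "more", "# Next", "x")
extract_section_sketch(doc)
```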
A tree is a graph that is connected but does not have any cycles. This function checks if a provided adjacency list is connected.
is_connected(adj_list, root = 1)
adj_list |
The adjacency list of the tree. |
root |
The root node to start checking from. This defaults to the first node in the adjacency list. |
This function is used as one of the validity checks within the definition of the TreeHarp class. It is a low-level function, not really meant for the general user of the package. Hence it is not exported.
The nodes are traversed in a BFS order. The function could actually be combined with is_cyclic_r, but it is kept separate for modularity reasons.
An alternative was to convert the list to an adjacency matrix and check for a row and column of zeros.
The function returns TRUE if the graph is connected and FALSE otherwise.
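The BFS traversal can be sketched as follows. is_connected_sketch is hypothetical and, like the description above, follows child links from the root; a visited flag ensures each node is queued at most once.

```r
# Hypothetical sketch: BFS from the root over child links
is_connected_sketch <- function(adj_list, root = 1L) {
  visited <- rep(FALSE, length(adj_list))
  visited[root] <- TRUE
  queue <- root
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    for (child in adj_list[[node]]) {
      if (!visited[child]) {
        visited[child] <- TRUE
        queue <- c(queue, child)
      }
    }
  }
  all(visited)  # connected if every node was reached
}

th3 <- list(a = c(2L, 3L, 4L), b = NULL, c = c(5L, 6L), d = 7L,
            e = NULL, f = NULL, g = NULL)
is_connected_sketch(th3)
is_connected_sketch(list(a = 2L, b = NULL, c = NULL))  # c is unreachable
```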
A tree is a graph that is connected but does not have any cycles. This function checks if a provided adjacency matrix contains cycles.
is_cyclic_r(adj_mat, node_v, parent_node = -1, visited_env)
adj_mat |
A symmetric matrix of 1's and 0's, with 1 in entry (i,j) representing an edge between the two vertices. |
node_v |
The node to begin searching for cycles from. An integer. |
parent_node |
The parent node of node_v. Also an integer. Use -1 if you are starting from node 1. This is in fact the default. |
visited_env |
An environment containing a logical vector indicating which nodes have already been visited. The vector has to be named "visited". See the details. |
The function works by traversing all the nodes, in a BFS order. If it finds a neighbouring node that has already been visited and is not the parent of the current node, it concludes that there is a cycle. The function is recursive, and has to update the vector of visited nodes within each call. Hence the visited vector is stored in an environment that is passed along. It will throw an error if no such environment is provided. This is a very specific input requirement, and it is another reason that this function is not exported.
This function is used within the validity checks for the S4 class. It is not exported for the user.
A logical value indicating if the graph contains cycles.
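The idea of keeping the visited vector in an environment can be illustrated with the sketch below. It is a recursive (hence depth-first) cycle search on a symmetric 0/1 matrix, written for illustration only; the name has_cycle is hypothetical and the package's own traversal may differ in detail.

```r
# Hypothetical sketch: the visited vector lives in an environment, so every
# recursive call reads and updates the same copy.
has_cycle <- function(adj_mat, node_v, parent_node = -1, visited_env) {
  visited_env$visited[node_v] <- TRUE
  for (nb in which(adj_mat[node_v, ] == 1)) {
    if (!visited_env$visited[nb]) {
      if (has_cycle(adj_mat, nb, node_v, visited_env)) return(TRUE)
    } else if (nb != parent_node) {
      # An already-visited neighbour that is not our parent closes a cycle.
      return(TRUE)
    }
  }
  FALSE
}

tri <- matrix(c(0, 1, 1,
                1, 0, 1,
                1, 1, 0), nrow = 3, byrow = TRUE)   # triangle: has a cycle
path <- matrix(c(0, 1, 0,
                 1, 0, 1,
                 0, 1, 0), nrow = 3, byrow = TRUE)  # path 1-2-3: no cycle

e1 <- new.env(); e1$visited <- logical(3)
e2 <- new.env(); e2$visited <- logical(3)
has_cycle(tri, 1, visited_env = e1)   # TRUE
has_cycle(path, 1, visited_env = e2)  # FALSE
```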
This function checks if a given tree is a sub-tree of another tree at a particular node.
is_subtree_rooted_at(x, y, at_node)
x |
An object of class TreeHarp. |
y |
An object of class TreeHarp. |
at_node |
An integer, corresponding to a node in object y. The sub-tree of y, rooted at at_node, is compared to x. |
Here is how it works: the sub-tree of y rooted at at_node is first extracted, and x is then compared to it. If x is a sub-tree of it, the function returns TRUE; otherwise it returns FALSE.
A logical value indicating if x is a sub-tree of y, rooted at at_node.
thb1 <- TreeHarp(list(b=2, d=NULL))
tha1 <- TreeHarp(list(a=c(2,3), b=4, c = NULL, d=NULL))
is_subtree_rooted_at(thb1, tha1, 1) # FALSE
is_subtree_rooted_at(thb1, tha1, 2) # TRUE
Computes the Jaccard index between two trees.
jaccard_treeharp(th1, th2, weighted = FALSE)
th1 |
A TreeHarp object. |
th2 |
A TreeHarp object. |
weighted |
A logical value, indicating if the weighted Jaccard similarity should be computed. |
The unweighted form is the cardinality of the intersection of the two sets of tokens, divided by the cardinality of their union.
The weighted form is described on the Wikipedia page: https://en.wikipedia.org/wiki/Jaccard_index#Weighted_Jaccard_similarity_and_distance
A real number between 0 and 1.
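Both forms of the index can be sketched on plain character vectors of node labels. Treating a tree as the bag of its node labels is an assumption made here for illustration; jaccard_tokens is a hypothetical helper, not part of the package.

```r
# Hypothetical sketch of unweighted and weighted Jaccard similarity on tokens.
jaccard_tokens <- function(tokens1, tokens2, weighted = FALSE) {
  if (!weighted) {
    a <- unique(tokens1)
    b <- unique(tokens2)
    return(length(intersect(a, b)) / length(union(a, b)))
  }
  # Weighted form: sum of element-wise minima of the token counts divided by
  # the sum of element-wise maxima, over the union of all tokens.
  all_tok <- union(tokens1, tokens2)
  c1 <- table(factor(tokens1, levels = all_tok))
  c2 <- table(factor(tokens2, levels = all_tok))
  sum(pmin(c1, c2)) / sum(pmax(c1, c2))
}

t1 <- c("<-", "x", "f", "y", "y")
t2 <- c("<-", "z", "f", "y")
jaccard_tokens(t1, t2)                   # 3 / 5 = 0.6
jaccard_tokens(t1, t2, weighted = TRUE)  # 3 / 6 = 0.5
```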
Given a list of trees, this function joins them under a single common root.
join_treeharps(...)
... |
A list of TreeHarp objects. |
This function combines TreeHarp objects into a single TreeHarp. The function will root all of them at a node called "script", which is neither a function call nor an argument nor a symbol. The BFS ordering is then updated.
Objects that are not of class TreeHarp will be dropped from the list before the rooting takes place.
A TreeHarp object
Compute tree similarity
K2(t1, t2, verbose = FALSE)
t1 |
A TreeHarp object. |
t2 |
A TreeHarp object. |
verbose |
A logical value, indicating if the output should be verbose. |
As far as possible, this function works recursively. It sets up an n x m matrix and fills in as many entries as it can, then uses recursive relationships to fill in the rest. Where that is not possible, it uses generate_all_subtrees to generate and count the common sub-trees.
An integer that counts the number of sub-trees in common between the two trees. Please see the reference papers for more information.
Convolution kernels for natural language, M Collins and N Duffy, Advances in neural information processing systems, 2002.
Convolution kernels on discrete structures, D Haussler, Technical report, Department of Computer Science, UC Santa Cruz, 1999.
tree1 <- TreeHarp(quote(x <- 1), TRUE)
tree2 <- TreeHarp(quote(y <- 1), TRUE)
K2(tree1, tree2, TRUE)
Retains only specific branches, that are identified by their node numbers.
keep_branches(th, branch_nodes, include_lower = TRUE)
th |
A TreeHarp object. |
branch_nodes |
An integer vector, specifying the nodes to keep. |
include_lower |
A logical value - whether or not the lower branches should also be kept. |
A TreeHarp object.
ex1 <- quote(x <- f(y, g(5)))
th1 <- TreeHarp(ex1, TRUE)
keep_branches(th1, 3)
keep_branches(th1, 3, include_lower = FALSE)
keep_branches(th1, c(2,3), FALSE)
keep_branches(th1, c(3, 4), FALSE)
A recursive function for converting a language object to treeharp.
lang_2_tree(lang_obj, node_id, ni_env)
lang_obj |
A language object. |
node_id |
The calling node to this language object. This should only
be greater than 0 if the |
ni_env |
An environment to store the adjacency list and node information. |
This function is used by the TreeHarp constructors. It should not have to be called by a user. It works by building up an adjacency list and a node information data frame within the supplied environment.
Nothing
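A stripped-down version of this recursion might look like the sketch below. It is hypothetical (lang_to_adj is not a package function), records only the adjacency list and not the node information, and numbers nodes in pre-order, which is why a separate reordering step (to_BFS) would be needed in general.

```r
# Hypothetical sketch: calls become internal nodes labelled by their function
# name; symbols and constants become leaves. Empty arguments (as in x[, 1])
# are not handled here.
lang_to_adj <- function(lang_obj, parent_id, env) {
  id <- length(env$adj_list) + 1L
  label <- if (is.call(lang_obj)) deparse(lang_obj[[1]]) else deparse(lang_obj)
  env$adj_list[id] <- list(NULL)            # new node with no children yet
  names(env$adj_list)[id] <- label
  if (parent_id > 0) {
    env$adj_list[[parent_id]] <- c(env$adj_list[[parent_id]], id)
  }
  if (is.call(lang_obj)) {
    for (arg in as.list(lang_obj)[-1]) lang_to_adj(arg, id, env)
  }
  invisible(NULL)
}

e1 <- new.env()
e1$adj_list <- list()
lang_to_adj(quote(x <- f(y, 5)), 0, e1)
str(e1$adj_list)
```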
e1 <- new.env()
lang_2_tree(quote(X <- 1), 0, e1)
e1$adj_list
e1$node_info
Generate a dataframe from the log file.
log_summary(log_file)
log_file |
The name of the log file generated from
|
The log file is updated by simply appending new entries, which is a natural format for writing but not for analysis. This function provides a table view of it, which makes it easier to group entries by filename, time, status, or even error message.
The output table does not contain correctness output. It only contains the columns name, timestamp, status (SUCCESS/FAIL), error message, number of libraries used and number of libraries installed.
The function returns a dataframe summarising the details in the log file.
A utility function for resolving duplicate filenames on LumiNUS. (Only useful for NUS instructors!)
lum_local_match(audit_file_path, local_files_dir, skip_name)
audit_file_path |
The audit file downloaded from LumiNUS. This must be an Excel file. It comes from the "Download Activity" button for the corresponding folder. It should contain columns such as "Action Time", "Action", etc. In the folder settings, students should be identified by their NAME. |
local_files_dir |
The directory containing the files downloaded from LumiNUS. It is usually downloaded as a zip-file and then extracted. |
skip_name |
The username to skip. Usually this is the instructor's name. This must be present. |
Here is how LumiNUS resolves duplicate filenames: it appends the student's name, in parentheses, to the end of the filename, but it only uses the first 15 characters of the student's name. In LumiNUS, filenames are not case-sensitive, so test.Rmd and test.rmd are considered duplicate filenames.
Here is how the function works: From the audit trail, it retrieves the name of the most recent upload for each student. After converting these to lowercase, duplicate file names have their student names appended. These new names are matched to the filenames that were downloaded.
Remember to clean up the filenames after this, because knitr does not like parentheses in file names!
It returns a tibble, containing the remote and local filenames, matched to the userid of students. The columns in this tibble are
mod_time: file modification time, from the downloaded file.
luminus_time: time that the file was uploaded to LumiNUS; retrieved from audit trail.
local_fname: The downloaded local file name.
luminus_fname: The filename that we see on LumiNUS.
Converts a binary matrix that represents a tree into an adjacency list.
matrix_2_adj_list(mat)
mat |
A symmetric matrix of 1's and 0's, with 1 in entry (i,j) representing an edge between the two vertices. |
Remember that the list that is finally output is for a tree, not a general graph. Please see other help pages for more specifications.
The input matrix should be BFS ordered. The adjacency list only notes the child node(s) of a particular node. If a matrix denotes multiple parents, it will not be picked up.
This is a low-level function, used within the S4 class TreeHarp. It is not generally meant for use by the user.
The adjacency list of the tree.
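The relationship between the two representations, with children stored in the upper triangle and then reflected as described for adj_list_2_matrix, can be sketched as a round trip. Both helpers are hypothetical illustrations that assume BFS-ordered nodes, so that every child has a larger index than its parent.

```r
# Hypothetical sketches of the list <-> matrix conversions for a BFS-ordered tree.
adj_list_to_matrix <- function(adj_list) {
  n <- length(adj_list)
  mat <- matrix(0L, n, n, dimnames = list(names(adj_list), names(adj_list)))
  for (i in seq_len(n)) mat[i, adj_list[[i]]] <- 1L  # fill the upper triangle
  mat[lower.tri(mat)] <- t(mat)[lower.tri(mat)]      # reflect to symmetrise
  mat
}

matrix_to_adj_list <- function(mat) {
  n <- nrow(mat)
  out <- vector("list", n)
  names(out) <- rownames(mat)
  for (i in seq_len(n)) {
    kids <- which(mat[i, ] == 1L & seq_len(n) > i)   # children: j > i only
    if (length(kids) > 0) out[[i]] <- unname(kids)
  }
  out
}

l1 <- list(a = c(2L, 3L), b = NULL, c = NULL)
m1 <- adj_list_to_matrix(l1)
identical(matrix_to_adj_list(m1), l1)  # TRUE
```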
Identifies the nodes on the path from a node up to the root of a TreeHarp object.
path_to_root(th, node_num)
th |
A TreeHarp object. |
node_num |
A node number to start tracking upwards from. |
This function allows the user to identify the branch from a node up to the root of a tree.
A vector of 1's and 0's that can be used to carve out the branch alone, using carve_subtree.
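The upward walk can be sketched on a plain adjacency list. The helper name and the hand-built list for quote(x <- f(y, g(5))) below are illustrative assumptions; in particular, the node labels are not necessarily the ones TreeHarp assigns.

```r
# Hypothetical sketch: derive each node's parent from the child-only adjacency
# list, then walk upwards from node_num, marking each node visited with a 1.
path_to_root_sketch <- function(adj_list, node_num) {
  n <- length(adj_list)
  parent <- integer(n)                  # 0 means "no parent" (the root)
  for (i in seq_len(n)) parent[adj_list[[i]]] <- i
  on_path <- integer(n)
  node <- node_num
  while (node != 0L) {
    on_path[node] <- 1L
    node <- parent[node]
  }
  on_path
}

# Assumed BFS labelling: 1 "<-", 2 "x", 3 "f", 4 "y", 5 "g", 6 "5"
adj <- list(`<-` = c(2L, 3L), x = NULL, f = c(4L, 5L), y = NULL, g = 6L, `5` = NULL)
path_to_root_sketch(adj, 5)  # 1 0 1 0 1 0
```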
ex1 <- quote(x <- f(y, g(5)))
th1 <- TreeHarp(ex1, TRUE)
path_to_root(th1, 5)
A plot method for visualising treeharp objects.
## S4 method for signature 'TreeHarp' plot(x, y, ...)
x |
An object of class TreeHarp. |
y |
Unused. |
... |
Additional arguments passed on to plot.igraph(). |
The treeharp object is converted to an igraph object before it is plotted.
Returns NULL, invisibly.
Generates objects for checking solution correctness.
populate_soln_env( soln_fname, pattern, knit_root_dir, render_only = FALSE, output = NULL )
soln_fname |
An rmd file containing the checks to be run on the student solution. |
pattern |
The pattern that identifies which chunks in the solution are testing chunks. If this argument is missing, the default pattern used is "test". |
knit_root_dir |
The root directory to use for knitting the rmd file. This argument is optional. If it is missing, it uses the root directory in knitr::opts_knit$get('root.dir'). |
render_only |
A logical value. If this is TRUE, then the solution is run and rendered. In this case, a list of length two is returned. If this is FALSE (default), then a list of length three is returned. See the Return section for more details. |
output |
The path to the knitted solution md file. This is usually
deleted immediately, but sometimes we may want to keep it. This
argument is passed on to |
Test code should be written in a chunk that generates scalars from student objects.
The solution file has to be an Rmd file (not an R script), because it relies on the autoharp.obj and autoharp.scalars knitr hooks being present.
In addition, if a solution object is to be tested against the analogous object within the student environment, these objects should be listed within the autoharp option of a code chunk. These objects will be copied with the "." prefix.
Here is an overview of how the function works:
Knit the solution file to generate the solution (or "correct") objects.
Rename these with the "." prefix in the solution environment object.
Extract the lines of test code into a temporary R script.
Wrap those chunks that contain autoharp.scalars hook with tryCatch.
Add a few lines at the bottom of the script to indicate which scalars should be kept.
Return the solution environment and path to the R test script.
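Step 2 can be pictured with a few lines of base R. This is only a sketch of the renaming idea, assuming the objects to copy are known by name; copy_with_dot_prefix is a hypothetical helper and not part of autoharp.

```r
# Hypothetical sketch: copy named solution objects under a "." prefix so they
# cannot clash with student objects of the same name.
copy_with_dot_prefix <- function(obj_names, from_env, to_env) {
  for (nm in obj_names) {
    assign(paste0(".", nm), get(nm, envir = from_env), envir = to_env)
  }
  invisible(to_env)
}

soln_env <- new.env()
assign("X", c(1, 2, 3), envir = soln_env)  # a "correct" object from the solution
copy_with_dot_prefix("X", soln_env, soln_env)
ls(soln_env, all.names = TRUE)
```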
Typically, the next step is to call check_correctness.
If render_only is FALSE, a list containing 2 components: the environment populated by the solution rmd and the path to an R script containing the test code.
If render_only is TRUE, then the output list contains the aforementioned environment, and the path to the rendered solution file (html). This option is useful for debugging the solution file.
Prunes a tree up to a depth specified by a set of node names.
prune_depth(th, names_to_keep)
th |
A TreeHarp object. |
names_to_keep |
The node names to keep in the pruned tree. |
This is a seldom-used function. It works as follows. Given a set of node names, it identifies the node with the greatest depth in that set. The function then returns the sub-tree that contains all the nodes with a depth smaller than or equal to that depth. If the nodeTypes slot is not NA, then that data frame is filtered and returned too.
Take a look at the examples for a clearer picture.
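The depth rule can be sketched on a small node-information table with the id, name and depth columns described for the nodeTypes slot. The root depth of 0 and the choice to return integer(0) when no requested name is present are assumptions of this sketch, not documented package behaviour; prune_depth_ids is hypothetical.

```r
# Hypothetical sketch: keep every node no deeper than the deepest requested name.
prune_depth_ids <- function(node_info, names_to_keep) {
  hits <- node_info$depth[node_info$name %in% names_to_keep]
  if (length(hits) == 0) return(integer(0))  # no requested name is present
  node_info$id[node_info$depth <= max(hits)]
}

node_info <- data.frame(
  id    = 1:6,
  name  = c("<-", "x", "f", "y", "g", "5"),
  depth = c(0, 1, 1, 2, 2, 3)  # assumed depths for quote(x <- f(y, g(5)))
)
prune_depth_ids(node_info, c("f", "y"))  # 1 2 3 4 5
prune_depth_ids(node_info, c("f", "z"))  # 1 2 3 ("z" is absent)
```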
An object of class TreeHarp.
carve_subtree, path_to_root, carve_mst
ex1 <- quote(x <- f(y, g(5)))
th1 <- TreeHarp(ex1, TRUE)
s1 <- prune_depth(th1, c("f", "y"))
s2 <- prune_depth(th1, c("f", "z")) # node not present!
plot(s1)
plot(s2)
Updates the node information regarding an R expression.
rbind_to_nodes_info(id, name, call_status, formal_arg, depth, env_ni)
id |
The id of the node to be added. This should be an integer of length 1. |
name |
The name of the node. |
call_status |
Is the language object a call or a symbol/literal? This should be a logical value. |
formal_arg |
Is the language object a formal argument or not? This should be a logical value. |
depth |
An integer indicating the depth of this language object in the parse tree. |
env_ni |
An environment object, possibly containing a data frame with columns id, name, call_status, formal_arg and depth. |
This is for internal use. It may be removed from user-view soon!
TRUE is returned invisibly.
This function hard codes some of the common extensions that we deal with.
remove_extension(fname)
fname |
A character string of the filename, with the extension present. |
If none of the known extensions (knit.md, utf8.md, R or Rmd) is found, then everything from the last period onwards is removed. See the examples.
If no extensions are found, the original filename is returned.
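A sketch of the rule described above, using regular expressions. The known-extension list is taken from the text; case-sensitive matching and the regex details are assumptions, so the package's function may differ on edge cases. The name remove_extension_sketch is hypothetical.

```r
# Hypothetical sketch: strip a known extension, else the last ".suffix".
remove_extension_sketch <- function(fname) {
  known <- "\\.(knit\\.md|utf8\\.md|R|Rmd)$"
  if (grepl(known, fname)) return(sub(known, "", fname))
  if (grepl(".", fname, fixed = TRUE)) return(sub("\\.[^.]*$", "", fname))
  fname  # no period at all: returned unchanged
}

remove_extension_sketch("test.Rmd")           # "test"
remove_extension_sketch("test.knit.md")       # "test"
remove_extension_sketch("test.r.txt")         # "test.r"
remove_extension_sketch("test_no_extension")  # "test_no_extension"
```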
A character string, with the extension removed.
remove_extension("test.Rmd")
remove_extension("test.knit.md")
remove_extension("test.r.txt")
remove_extension("test_no_extension")
Renders the specified file, and collates run time, static and correctness checks.
render_one( rmd_name, out_dir, knit_root_dir, log_name, soln_stuff, max_time_per_run = 120, permission_to_install = FALSE )
rmd_name |
The path to the file to be rendered and checked. |
out_dir |
The directory to store all the html output, md output, and figures. |
knit_root_dir |
The working directory while knitting the file. |
log_name |
A character string, denoting the log file name. It defaults to "render_one.log". If this file is already present in the directory, this function will append to it. |
soln_stuff |
This is a list, with components env, test_fname, and
tt_list. This object is the output of |
max_time_per_run |
The maximum time to wait before aborting the rendering of a particular file. |
permission_to_install |
If TRUE, then the function will try to install any packages needed. By default, this is FALSE. |
The log file contains a record of the libraries used by the student, and if any new libraries needed to be installed. The status will be one of SUCCESS, FAIL or UNKNOWN.
A data frame with one row for each file in the input directory.
populate_soln_env, check_correctness
Replaces special characters in the name of an R or Rmd script.
replace_sp_chars_filename(dir_name, return_df = TRUE)
dir_name |
A character string, referring to the directory of Rmd files whose names should be replaced. |
return_df |
A logical value, indicating if the old and new names should be returned (in a tibble). |
If a filename contains one of the following special characters (ignore the quotes here): "[ <>()|\:&;#?*']", the knit function will replace them with underscores. Hence the filenames in the autoharp input directory and the output directory will not match, even allowing for the change in file extension. This will cause problems when we try to run render_one again on the same input directory.
This function renames the files in the input directory by replacing all special characters there.
The NUS LMS (LumiNUS) introduces parenthesized names or numbers in order to make filenames unique, so this function is necessary for NUS instructors.
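The substitution itself can be sketched with gsub. The character class is copied from the list above; sanitise_filename is a hypothetical name, and the sketch only computes new names rather than renaming files.

```r
# Hypothetical sketch: replace each listed special character with "_".
# With perl = TRUE, "\\\\" in the R string is a literal backslash in the class.
sanitise_filename <- function(x) {
  gsub("[ <>()|\\\\:&;#?*']", "_", x, perl = TRUE)
}

old <- c("hw1 (Tan Ah Kow).Rmd", "hw1.Rmd")
data.frame(old = old, new = sanitise_filename(old))
# Renaming on disk could then use file.rename(old, sanitise_filename(old)).
```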
A tibble containing the old and new names.
This function is used to detach packages that have been added by a student script.
reset_path(old_path)
old_path |
A character vector of package namespaces. This is usually
the output of |
When a student script is rendered using render_one, new packages might be added to the search path. These may conflict with the instructor's search path order, or with subsequent runs of render_one on other students' scripts. Hence the search path needs to be reset before this is done.
This function does not unload namespaces. It only detaches them from the search path. For the difference between the two, please see Hadley's page.
There is no object returned. This function is called for its side effect of altering the search path.
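Detaching, as opposed to unloading, can be sketched with base R's detach. This hypothetical helper only touches search-path entries named package:..., so namespaces remain loaded, matching the distinction drawn above.

```r
# Hypothetical sketch: detach every package attached since old_path was recorded.
reset_path_sketch <- function(old_path) {
  added <- setdiff(search(), old_path)
  for (pkg in added[startsWith(added, "package:")]) {
    detach(pkg, character.only = TRUE)  # detaches only; namespace stays loaded
  }
  invisible(NULL)
}

opath <- search()
library(tools)                 # stands in for a package attached by a student
reset_path_sketch(opath)
identical(search(), opath)     # the search path should be back to its old state
```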
opath <- search()
# Load a package
reset_path(opath)
Reads in an Rmd file or an R script and converts it to a list of TreeHarp objects.
rmd_to_forestharp(fname, line_nums = FALSE)
fname |
The filename that is to be read in. |
line_nums |
A logical value, indicating if the line numbers of expressions should be returned along with the expressions. By default, this value is FALSE. |
The TreeHarp constructor is wrapped in a tryCatch call, so that the function does not fail if an expression cannot be converted to a TreeHarp object.
The object returned is not of a specially defined class. It is either a list of length 2, or a list of TreeHarp objects. This output is meant to be used with fapply.
If the input file is an Rmd file (checked with extract_chunks), then the chunks are extracted and converted to TreeHarp objects. If the input file is not an Rmd, it is assumed to be an R script. This script is then supplied to parse. In either case, a parsing error here could cause the function to fail.
Line numbers are extracted using get_source_expressions from the lintr package.
A list of TreeHarp objects, or a list with 2 components containing the TreeHarp objects and a vector of line numbers.
fapply, extract_chunks, get_source_expressions
Count the individual tokens. Part of the NLP analysis process.
rmd_to_token_count(fname, include_actuals = TRUE)
fname |
The Rmd or R file name. |
include_actuals |
Whether actual arguments/literals should be included. If this is FALSE, then only calls and formal arguments will be used in the count. |
A tibble containing the frequency count of every token present in the student script.
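A rough idea of token counting can be had from R's own parse data. Note that this swaps in utils::getParseData instead of the TreeHarp representation the package uses, so the token categories below only approximate the call / formal argument / actual distinction; count_tokens is a hypothetical name and returns a named integer vector rather than a tibble.

```r
# Hypothetical sketch using parse data: function calls and argument names are
# always counted; symbols and constants ("actuals") only if include_actuals.
count_tokens <- function(code, include_actuals = TRUE) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  keep <- c("SYMBOL_FUNCTION_CALL", "SYMBOL_SUB", "SYMBOL_FORMALS")
  if (include_actuals) keep <- c(keep, "SYMBOL", "NUM_CONST", "STR_CONST")
  tab <- table(pd$text[pd$terminal & pd$token %in% keep])
  stats::setNames(as.integer(tab), names(tab))
}

code <- "x <- mean(y, na.rm = TRUE)\nz <- mean(x)"
count_tokens(code)
count_tokens(code, include_actuals = FALSE)  # mean: 2, na.rm: 1
```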
This function runs the shiny app that students submit to in order to obtain feedback on their Rmd submission file.
run_tuner( app_title, soln_templates_dir, knit_wd, tabs = c("lint", "html", "correctness"), lint_list, corr_cols_to_drop = c(1, 2, 4, 5), max_time = 120, summary_header = "# Summary Output", permission_to_install = FALSE, ... )
app_title |
A character string of the title of the app. |
soln_templates_dir |
This should be the directory containing all solution templates. Solution templates are Rmd files. |
knit_wd |
The working directory for knitting (to HTML). |
tabs |
A character vector of the types of check to be done. |
lint_list |
A list of lints (from lintr package) to be run on the uploaded script. If missing, a default list of lints is run. See the details section. |
corr_cols_to_drop |
This should be an integer vector of columns to drop from the correctness check. By default, the columns corresponding to filename, timestamp, run-time timing and memory are dropped. |
max_time |
The maximum time (in seconds) allocated to rendering before
failing. This is passed on to |
summary_header |
This is the header to search for when generating the description for the correctness check. |
permission_to_install |
This is the argument to toggle for auto installation of libraries. Default is set to FALSE. |
... |
Extra arguments passed on to runApp from shiny. Useful for specifying port, etc. |
If the lint_list argument is missing, the following list of lints is run:
T_and_F_symbol_linter,
assignment_linter,
closed_curly_linter,
commas_linter,
equals_na_linter,
function_left_parentheses_linter,
infix_spaces_linter,
line_length_linter,
no_tab_linter,
open_curly_linter,
paren_brace_linter,
absolute_path_linter,
pipe_continuation_linter,
spaces_inside_linter,
trailing_blank_lines_linter,
trailing_whitespace_linter,
unneeded_concatenation_linter
The full list of available lints can be found here: linters.
This function is run for its side-effect.
Extracts a sub-tree rooted at a particular node.
subtree_at(obj, at_node, preserve_call = FALSE)
obj |
An object of class TreeHarp. |
at_node |
The root of the new sub-tree. An integer, not a label, that corresponds to BFS indexing of the tree. |
preserve_call |
A logical value that indicates if a sub-call should be extracted. This might be slower, but it allows you to evaluate it later. |
This is meant for internal use, so the nodeTypes slot is silently dropped unless preserve_call is set to TRUE.
An object of class TreeHarp.
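Extracting a sub-tree and relabelling it in BFS order can be sketched on a plain adjacency list, using the th3 list from the examples. subtree_adj is a hypothetical helper; it ignores node types and calls.

```r
# Hypothetical sketch: collect the descendants of at_node in BFS order, then
# renumber them 1, 2, ... so the result is again a BFS-labelled adjacency list.
subtree_adj <- function(adj_list, at_node) {
  bfs <- at_node
  i <- 1
  while (i <= length(bfs)) {            # BFS: append children as we go
    bfs <- c(bfs, adj_list[[bfs[i]]])
    i <- i + 1
  }
  new_id <- match(seq_along(adj_list), bfs)  # old index -> new index (or NA)
  out <- lapply(bfs, function(old) {
    kids <- new_id[adj_list[[old]]]
    if (length(kids) == 0) NULL else kids
  })
  names(out) <- names(adj_list)[bfs]
  out
}

th3 <- list(a = c(2L, 3L, 4L), b = NULL, c = c(5L, 6L), d = 7L,
            e = NULL, f = NULL, g = NULL)
str(subtree_adj(th3, 3))  # c -> 2, 3; e and f are leaves
```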
th3 <- list(a= c(2L,3L,4L), b=NULL, c=c(5L, 6L), d=7L, e=NULL, f=NULL, g=NULL)
subtree_at(TreeHarp(th3), 3)
st <- subtree_at(TreeHarp(th3), 4)
plot(st)
Function to rearrange nodes in BFS order.
to_BFS(adj_list, node_info)
adj_list |
The output of lang_2_tree. |
node_info |
The output of lang_2_tree. |
This function is for an internal TreeHarp constructor use. It is not exported.
An adjacency list and nodes info data frame in BFS order.
Computes similarity between two trees (non-recursively)
tree_sim(t1, t2, norm = FALSE, ...)
t1 |
A TreeHarp object. |
t2 |
Another TreeHarp object. |
norm |
A logical value, indicating if the kernel function should be normalised, to account for different tree sizes. |
... |
Unused arguments, reserved for mcmapply |
A numerical value, which lies between 0 and 1 if norm = TRUE.
This class is used to represent a single R expression as a tree.
TreeHarp(lang_obj, quote_arg, ...)
TreeHarp(lang_obj, quote_arg, ...)
## S4 method for signature 'logical'
TreeHarp(lang_obj, quote_arg, ...)
## S4 method for signature 'missing'
TreeHarp(lang_obj, quote_arg, ...)
## S4 method for signature 'TreeHarp'
length(x)
## S4 method for signature 'TreeHarp'
show(object)
## S4 method for signature 'TreeHarp'
names(x)
lang_obj |
This should be an adjacency list for a tree (not a graph), or the adjacency matrix of a tree, or the expression to be parsed. If it is a list, only child nodes should be indicated (see the examples). |
quote_arg |
If this argument is missing or FALSE, then the class of
If this argument is TRUE, the |
... |
Unused at the moment. |
x |
A TreeHarp object. |
object |
A TreeHarp object. |
The following validity checks are conducted on the object:
Is the graph connected? If no, the object is invalid.
Are there cycles? If yes, the object is invalid.
Are the nodes labelled in a BFS ordering? If not, the object is not valid.
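The third check can be sketched as follows: a breadth-first walk from node 1 that visits children in their listed order should meet the nodes as 1, 2, ..., n. The helper is hypothetical, and it also returns FALSE if the walk reaches more nodes than exist, which guards against cycles.

```r
# Hypothetical sketch of a BFS-ordering check on a child-only adjacency list.
is_bfs_ordered <- function(adj_list) {
  n <- length(adj_list)
  visit <- 1L
  i <- 1L
  while (i <= length(visit)) {
    visit <- c(visit, as.integer(adj_list[[visit[i]]]))
    if (length(visit) > n) return(FALSE)  # some node was reached twice
    i <- i + 1L
  }
  identical(visit, seq_len(n))
}

is_bfs_ordered(list(a = c(2, 3), b = NULL, c = NULL))  # TRUE
is_bfs_ordered(list(a = c(3, 2), b = NULL, c = NULL))  # FALSE
```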
Constructors return an object of class TreeHarp.
length: An integer of length 1.
print: Returns NULL. It prints a string representation of a TreeHarp object.
names: A character vector with length equal to the number of nodes.
TreeHarp: A constructor for TreeHarp. Converts either an adjacency list or an adjacency matrix into a TreeHarp object.
TreeHarp: A constructor for TreeHarp. Converts a language object into a TreeHarp object.
length: To get the length of a tree. The length of the tree refers to the number of nodes in the tree.
show: To print a tree representation. A string representation of a TreeHarp object.
names: To get tree labels. This function returns the node labels of the tree.
adjList
The adjacency list of the tree. The list must be named. The nodes should be labelled in Breadth-First Order. The first component must be the root of the tree. Leaves of the tree should be NULL elements.
nodeTypes
A data frame describing the type of node. The columns in the data frame will be derived from the expression used to instantiate the object. The column names will be id (node id), name, call_status, formal_arg and depth. This slot can be left missing (i.e., populated with NA). This latter feature is useful when we just wish to test something out.
This slot is only populated automatically when an R expression is provided as lang_obj and quote_arg is TRUE.
repr
A string representation of the tree. This will be printed when the show method of TreeHarp is called.
call
The language object that was used to construct the tree (if it was). If the object was constructed from a list/matrix, this will be NA.
l1 <- list(a=c(2,3), b=NULL, c=NULL)
# directly using new()
treeharp1 <- new("TreeHarp", adjList = l1, nodeTypes = NA)
# using one of the constructor methods (for lists)
treeharp2 <- TreeHarp(l1)
# using the constructor for matrices.
m1 <- matrix(0L, 3, 3)
dimnames(m1) <- list(letters[1:3], letters[1:3])
m1[1, ] <- c(0, 1L, 1L)
m1[, 1] <- c(0, 1L, 1L)
treeharp3 <- TreeHarp(m1)
# Supplying a language object to get the same tree (with nodeTypes
# populated)
ex1 <- quote(a(b,c))
TreeHarp(ex1, TRUE)
Updates the adjacency list for an R expression parse tree.
update_adj_list( update_type = c("new_node", "add_child"), node_id, node_name, child_node, env_ni )
update_type |
This should be either "new_node" or "add_child". If it is "new_node", an empty list component is added. If it is "add_child", then child_node must be provided too. |
node_id |
An integer. |
node_name |
The name of the new node to be added. This must be provided if the update_type is "new_node". |
child_node |
An integer. |
env_ni |
An environment object, possibly containing an adjacency list that will later be used to construct a TreeHarp object. |
This is for internal use. It may be removed from user-view soon!
An invisible TRUE is returned.