| Title: | Optimal Assignment of Students to Groups |
|---|---|
| Description: | Integer programming models to assign students to groups by maximising diversity within groups, or by maximising preference scores for topics. |
| Authors: | Vik Gopal [aut], Kevin Lam [aut], Ju Xue [ctb], Mingyuan Zhang [aut, cre], National University of Singapore [cph] |
| Maintainer: | Mingyuan Zhang |
| License: | MIT + file LICENSE |
| Version: | 0.6.2 |
| Built: | 2026-05-15 06:32:01 UTC |
| Source: | https://github.com/singator/grouper |
From the result of ompr::solve_model(), this function attaches the
derived groupings to the original dataframe comprising students.
assign_groups( model_result, assignment = c("diversity", "preference"), dframe, params_list, group_names )
model_result |
The solution object returned by ompr::solve_model(). |
assignment |
Character string indicating the type of model that this dataset is for. The argument is either 'preference' or 'diversity'. Partial matching is fine. |
dframe |
The original dataframe used in |
params_list |
The list of parameters from the YAML file, i.e. the output
of |
group_names |
A character string giving the name of the column in the original dataframe that contains the self-formed groups. The name, not the integer position, is required because the function joins on this column. |
A data frame with the group assignments attached to the original group composition dataframe.
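Because group_names is used as a join key, the self-formed group column must be referred to by name. The base-R sketch below illustrates that kind of join with merge(); the toy data, the column names grp and topic, and the shape of the solver-derived table are hypothetical, and this is not the internal code of assign_groups().

```r
# Hypothetical student table: four students in two self-formed groups.
dframe <- data.frame(id = 1:4, grp = c(1, 1, 2, 2))

# Hypothetical solver-derived table: one row per self-formed group.
assigned <- data.frame(grp = c(1, 2), topic = c("T1", "T2"))

# The join key is passed by name, mirroring the group_names argument.
group_names <- "grp"
result <- merge(dframe, assigned, by = group_names)

# merge() sorts by the key, so restore the original student order.
result <- result[order(result$id), ]
print(result)
```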
Creates one row per student and one column per course-role pair, with units allocated by the solver.
assign_job(model_result, student_df, course_codes, name_col = "Name")
model_result |
Result object from |
student_df |
A data frame that contains student name information. Every row is a unique student. |
course_codes |
Character vector of course codes in the same order as
|
name_col |
Student name column name in |
A data frame with columns:
Name, then all <course>-t, all <course>-g, all <course>-e.
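As a sketch of the column layout described above, the snippet builds the expected column names in base R. It assumes the default name_col = "Name", hypothetical course codes C1 and C2, and that the -t, -g and -e suffixes stand for the TA, GR and E roles.

```r
# Hypothetical course codes; in practice these come from course_codes.
course_codes <- c("C1", "C2")
name_col <- "Name"

# Name first, then the -t columns for every course, then all -g, then all -e:
# columns are grouped by role rather than by course.
out_cols <- c(
  name_col,
  paste0(course_codes, "-t"),
  paste0(course_codes, "-g"),
  paste0(course_codes, "-e")
)
print(out_cols)
```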
An example dataset to use with the diversity-based assignment model.
dba_gc_ex001
A data frame with 4 rows and 4 columns.
id: the student id of each student, simply the integers 1 to 4.
major: the primary major of each student.
skill: the skill level of each student.
groups: the self-formed groups submitted by each student. In this case, each student is in his/her own group.
This dataset was constructed by hand.
Wrapper around extract_student_info() and extract_phd_info().
extract_info(assignment = c("diversity", "preference", "phd"), ...)
assignment |
Character string indicating model type. Must be one of
|
... |
Additional arguments for the underlying extraction functions. See Details. |
Arguments required for each value of assignment:
For assignment = "diversity", extract_info() forwards ... to
extract_student_info().
Required arguments:
dframe
self_formed_groups
either:
d_mat, or
demographic_cols, so Gower dissimilarity is computed internally
Optional arguments:
skills, which can be supplied or set to NULL
For assignment = "preference", extract_info() forwards ... to
extract_student_info().
Required arguments:
dframe
self_formed_groups
pref_mat
For assignment = "phd", extract_info() forwards ... to
extract_phd_info().
Required arguments:
student_df
p_mat
d_mat
Optional arguments:
e_mode, which uses the default from extract_phd_info()
C, which uses the default from extract_phd_info()
This wrapper does not parse YAML files. YAML-based parameter extraction
remains available via extract_params_yaml().
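The required-argument rules above can be encoded as a small check. The helper missing_extract_args() below is hypothetical and not part of grouper; it only reports which documented arguments are absent from a named list and does not validate their contents.

```r
# Hypothetical helper (not part of grouper): lists the documented required
# arguments of extract_info() that are absent from a named argument list.
missing_extract_args <- function(assignment = c("diversity", "preference", "phd"),
                                 args) {
  assignment <- match.arg(assignment)
  supplied <- names(args)
  required <- switch(assignment,
    diversity  = c("dframe", "self_formed_groups"),
    preference = c("dframe", "self_formed_groups", "pref_mat"),
    phd        = c("student_df", "p_mat", "d_mat")
  )
  absent <- setdiff(required, supplied)
  # The diversity model also needs either d_mat or demographic_cols.
  if (assignment == "diversity" &&
      !any(c("d_mat", "demographic_cols") %in% supplied)) {
    absent <- c(absent, "d_mat or demographic_cols")
  }
  absent
}

missing_extract_args("diversity", list(dframe = 1, self_formed_groups = 4))
# "d_mat or demographic_cols"
missing_extract_args("pref", list(dframe = 1))
# "self_formed_groups" "pref_mat"
```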
A model input list from extract_student_info() or
extract_phd_info().
The remaining parameters for the models are retrieved from a YAML file, so as
not to clutter the argument list for extract_student_info().
extract_params_yaml(fname, assignment = c("diversity", "preference"))
fname |
A YAML file containing the remaining parameters. |
assignment |
Character string indicating the type of model that this dataset is for. The argument is either 'preference' or 'diversity'. Partial matching is fine. |
For the diversity+skill-based assignment, this function returns a list containing:
n_topics: the number of topics
R: the optimally desired number of repetitions per topic
nmin: the minimum number of students per topic
nmax: the maximum number of students per topic
rmin: the minimum number of repetitions per topic
rmax: the maximum number of repetitions per topic
For the preference-based assignment, this function returns a list containing:
n_topics: the number of topics
R: the optimally desired number of repetitions per topic
nmin: the minimum number of students per topic
nmax: the maximum number of students per topic
rmin: the minimum number of repetitions per topic
rmax: the maximum number of repetitions per topic
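For workflows that skip the YAML file, the parameters can instead be assembled as a plain list with the element names listed above. The values below are made up, and treating each element as a single number is an assumption; the shapes the model functions actually expect are not stated here.

```r
# Hypothetical parameter values; scalar elements are an assumption.
params <- list(
  n_topics = 3,  # number of topics
  R        = 2,  # desired repetitions per topic
  nmin     = 3,  # minimum students per topic
  nmax     = 5,  # maximum students per topic
  rmin     = 1,  # minimum repetitions per topic
  rmax     = 2   # maximum repetitions per topic
)

# Confirm that every documented element is present.
expected <- c("n_topics", "R", "nmin", "nmax", "rmin", "rmax")
stopifnot(setequal(names(params), expected))
```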
Converts student-level data and input matrices into the list expected by
prepare_phd_model().
extract_phd_info(student_df, p_mat, d_mat, e_mode = c("rr", "none"), C = 4)
student_df |
A data frame with one row per student. Required columns are:
|
p_mat |
Preference matrix with dimensions |
d_mat |
Demand matrix with dimensions |
e_mode |
How to handle E demand when |
C |
Semester workload capacity per student. Used when |
This function assumes input order is already aligned:
student_df row i corresponds to P[i, ], s[i], t1[i], and g1[i].
d_mat row j corresponds to P[, j].
If E is computed (e_mode = "rr"), total E is set to:
Ns * C - sum(TA) - sum(GR).
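A worked instance of the formula above, using a hypothetical demand matrix for four courses and four students with C = 4. Only the total is computed; how that total is split across courses under e_mode = "rr" is not described here, so the sketch does not attempt it.

```r
# Hypothetical toy inputs: 4 students with capacity C = 4 units each,
# and a demand matrix with TA and GR columns (one row per course).
C  <- 4
Ns <- 4
d_mat <- cbind(TA = c(2, 1, 1, 0), GR = c(1, 2, 0, 1))

# Total E demand: the capacity left over after all TA and GR demand.
e_total <- Ns * C - sum(d_mat[, "TA"]) - sum(d_mat[, "GR"])
e_total  # 8
```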
A list containing:
Ns: number of students
Nj: number of courses
P: preference matrix (Ns x Nj)
d: demand matrix (Nj x 3) with columns TA, GR, E
s: seniority vector (year - 2)
t1: past TA workload vector
g1: past GR workload vector
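The seniority vector is a simple shift of the PhD year, so Year-1 students get s = -1 and Year-4 students get s = 2. The sketch below builds s, t1 and g1 from a hypothetical table with the columns of phd_students_ex001; taking t1 and g1 directly from past_ta and past_gr is an assumption.

```r
# Hypothetical student table with the columns of phd_students_ex001.
student_df <- data.frame(
  student_id = 1:4,
  year       = c(1, 2, 3, 4),
  past_ta    = c(1, 2, 3, 2),
  past_gr    = c(3, 2, 1, 2),
  Name       = c("A", "B", "C", "D")
)

s  <- student_df$year - 2  # seniority vector: year - 2
t1 <- student_df$past_ta   # assumed: past TA workload taken from past_ta
g1 <- student_df$past_gr   # assumed: past GR workload taken from past_gr
print(s)  # s is c(-1, 0, 1, 2)
```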
Converts a dataframe with information on students to a list of parameters. This
list forms one half of the inputs to prepare_model(). The remaining model
parameters can come from extract_params_yaml() or be supplied directly to
prepare_model() for non-YAML workflows.
extract_student_info( dframe, assignment = c("diversity", "preference"), self_formed_groups, demographic_cols, skills, pref_mat, d_mat )
dframe |
A dataframe with one row for each student. The columns may contain demographic variables, an overall skill measure, and a column indicating self-formed groups. It is best to include an id column to identify each student. |
assignment |
Character string indicating the type of model that this dataset is for. The argument is either 'preference' or 'diversity'. Partial matching is fine. |
self_formed_groups |
An integer column that identifies the self-formed groups, submitted by students. |
demographic_cols |
A set of integers indicating the columns corresponding to demographic information, e.g. major, year of study, gender, etc. This argument is only used by the diversity-based assignment. |
skills |
A numeric measure of overall skill level (higher means more skilled). This argument is only used by the diversity-based assignment. It can be set to NULL, in which case the model maximises diversity only. |
pref_mat |
The preference matrix, with dimensions equal to the number of groups x B*T, where T is the number of topics and B is the number of sub-groups per topic. This argument is only used in the preference-based assignment. See the Details section for more information. |
d_mat |
The dissimilarity matrix with number of rows equal to the number of students. This matrix should be symmetric, with diagonals equal to 0. This argument is only used in the diversity-based assignment. If it is not provided, the "Gower" distance from the cluster package is used. If this is provided, then demographic_cols is ignored. |
For the diversity-based assignment, the demographic variables are converted
into an NxN dissimilarity matrix. By default, the dissimilarity metric used
is the Gower distance, as computed by cluster::daisy().
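A minimal sketch of the default dissimilarity step, assuming the demographic columns have already been selected into a data frame. Categorical variables are stored as factors so that cluster::daisy() treats them as nominal; the data are hypothetical.

```r
library(cluster)

# Hypothetical demographic columns for four students.
demo <- data.frame(
  major = factor(c("math", "stats", "math", "cs")),
  year  = c(1, 2, 3, 4)
)

# Gower dissimilarity, converted to a plain N x N matrix
# (symmetric, with zeros on the diagonal).
d <- as.matrix(daisy(demo, metric = "gower"))
round(d, 3)
```

For students 1 and 3 the majors match (contribution 0) and the years differ by 2 out of a range of 3 (contribution 2/3), so their Gower dissimilarity is the average, 1/3.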
For the preference-based assignment, the preference matrix indicates the preference that each group has for the project topics. For this model, each topic can have up to B sub-groups, and the matrix has one column per topic-sub-group pair, so its number of columns must be B*T. Suppose there are T=3 topics and B=2 sub-groups per topic. Then the columns should be ordered as:
T1S1, T2S1, T3S1, T1S2, T2S2, and T3S2.
Note that higher values in the preference matrix reflect a greater preference for a particular topic-subtopic combination, since the objective function is set to be maximised.
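The ordering above lets the topic index vary fastest within each sub-group block, so topic t of sub-group b sits in column (b - 1) * T + t. The sketch below generates the labels and that index in base R; the names n_topics, n_sub, col_labels and col_index are illustrative only.

```r
n_topics <- 3  # T
n_sub    <- 2  # B

# Topic varies fastest, then sub-group: T1S1, T2S1, T3S1, T1S2, T2S2, T3S2.
col_labels <- paste0("T", rep(seq_len(n_topics), times = n_sub),
                     "S", rep(seq_len(n_sub), each = n_topics))
print(col_labels)

# Column index of topic t in sub-group b under this ordering.
col_index <- function(t, b, n_topics) (b - 1) * n_topics + t
col_labels[col_index(2, 2, n_topics)]  # "T2S2"
```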
For the diversity-based assignment model, this function returns a list containing:
N: number of students
G: number of self-formed groups
m: a (student x groups) matrix, indicating group membership for each student.
d: dissimilarity matrix, NxN
s: skills vector for each individual student (possibly NULL)
For the preference-based assignment model, this function returns a list containing:
N: number of students
G: number of self-formed groups
m: a (student x groups) matrix, indicating group membership for each student.
n: a vector of length G, with the number of students in each self-formed group.
p: The preference matrix from the input argument.
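The membership matrix m and the size vector n can be derived from a single column of group labels. The sketch below shows one way to do this in base R for eight students in four self-formed groups of size 2, similar in shape to pba_gc_ex002; the labels are hypothetical and this is not necessarily how extract_student_info() builds them.

```r
# Hypothetical self-formed group labels for 8 students.
groups <- c(1, 1, 2, 2, 3, 3, 4, 4)
group_ids <- sort(unique(groups))

# m[i, g] is 1 if student i belongs to self-formed group g, else 0.
m <- outer(groups, group_ids, "==") * 1
N <- nrow(m)     # number of students
G <- ncol(m)     # number of self-formed groups
n <- colSums(m)  # size of each self-formed group
```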
An example dataset to use with the preference-based assignment model.
pba_gc_ex002
A data frame with 8 rows and 2 columns.
id: the student id of each student, simply the integers 1 to 8.
grouping: the self-formed groups submitted by each student. In this case, each self-formed group is of size 2.
This dataset was constructed by hand.
An example dataset to use with the preference-based assignment model.
pba_prefmat_ex002
A matrix with 4 rows and 4 columns.
Each row represents the preferences of each self-formed group in the
dataset pba_gc_ex002.
This dataset was constructed by hand.
An example demand matrix to use with the PhD workload allocation model.
phd_demand_ex001
A matrix with 4 rows and 2 columns.
Columns are in the order TA, GR. Row names store the course codes.
This dataset was constructed by hand.
An example preference matrix to use with the PhD workload allocation model.
phd_prefmat_ex001
A matrix with 4 rows and 4 columns.
Rows correspond to students in phd_students_ex001, and columns correspond
to rows of phd_demand_ex001.
Preference scores are encoded as 3 (first choice), 2 (second choice), and 1 (third choice). Unranked courses are encoded as -99.
This dataset was constructed by hand.
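To illustrate the encoding, the sketch below turns one student's three ranked choices into a row of the preference matrix. The course codes and choices are hypothetical.

```r
# Hypothetical course codes (columns) and one student's ranked choices.
courses <- c("C1", "C2", "C3", "C4")
choices <- c("C3", "C1", "C4")  # first, second, third choice

# Unranked courses get -99; ranked courses get 3, 2, 1 in order of choice.
p_row <- setNames(rep(-99, length(courses)), courses)
p_row[choices] <- c(3, 2, 1)
print(p_row)
```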
An example student table to use with the PhD workload allocation model.
phd_students_ex001
A data frame with 4 rows and 5 columns.
student_id: unique student id.
year: PhD year, encoded from 1 to 4.
past_ta: previous-semester TA workload units.
past_gr: previous-semester GR workload units.
Name: student name.
In this toy dataset, past_ta + past_gr = 4 for every student.
For a one-semester sanity-check variant, reuse this dataset with
past_ta = 0 and past_gr = C for all students before extraction.
This dataset was constructed by hand.
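The one-semester variant described above amounts to overwriting two columns before calling extract_phd_info(). The sketch uses a made-up stand-in for phd_students_ex001 and C = 4, the default capacity in extract_phd_info().

```r
C <- 4
# Hypothetical stand-in for phd_students_ex001 (values are made up).
students <- data.frame(
  student_id = 1:4, year = 1:4,
  past_ta = c(1, 2, 3, 2), past_gr = c(3, 2, 1, 2),
  Name = c("A", "B", "C", "D")
)

# One-semester variant: no past TA load, full capacity recorded as past GR.
one_sem <- students
one_sem$past_ta <- 0
one_sem$past_gr <- C
```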
Prepare the diversity-based assignment model
prepare_diversity_model(df_list, yaml_list, w1 = 0.5, w2 = 0.5)
df_list |
The output list from |
yaml_list |
The output list from |
w1, w2 |
Numeric values between 0 and 1. Should sum to 1. These weights correspond to the importance given to the diversity- and skill-based portions in the objective function. |
An ompr model.
Initialise optimisation model (wrapper)
prepare_model( df_list, yaml_list = NULL, assignment = c("diversity", "preference", "phd"), w1 = 0.5, w2 = 0.5, ... )
df_list |
Model input list. |
yaml_list |
Parameter list from |
assignment |
Character string indicating model type. Must be one of
|
w1, w2 |
Numeric values between 0 and 1. Should sum to 1. Used only for
|
... |
Additional arguments:
|
An ompr model.
Builds a mixed-integer optimisation model for assigning TA, GR, and E units across students and courses.
prepare_phd_model( df_list, t_max_y1 = 1, e_max = NULL, ta_min = NULL, ta_max = NULL, gr_min = NULL, gr_max = NULL, e_min = NULL, alpha = 2, beta = 1, phi = 1, rho = 10, C = 4 )
df_list |
A list of model inputs, typically from
|
t_max_y1 |
Maximum current-semester TA load for Year-1 students
( |
e_max |
Optional upper bound on per-student E units in current semester. |
ta_min, ta_max |
Optional lower/upper bounds on per-student TA units in current semester. |
gr_min, gr_max |
Optional lower/upper bounds on per-student GR units in current semester. |
e_min |
Optional lower bound on per-student E units in current semester. |
alpha |
Objective weight on TA spread. |
beta |
Objective weight on TA preference term. |
phi |
Objective weight on seniority-weighted E term. |
rho |
Objective weight on Year-1 TA slack penalties. |
C |
Semester workload capacity per student. The model fixes annual
workload at |
Index alignment is critical: P[i, j], d[j, ], s[i],
t1[i], and g1[i] must refer to the same student/course ordering.
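Since a misaligned ordering with the right dimensions would not necessarily raise an error, a quick check before building the model can help. The helper check_phd_alignment() below is hypothetical, not part of grouper; it checks the element sizes of a df_list and, when both carry course labels, that colnames(P) matches rownames(d).

```r
# Hypothetical helper (not part of grouper): checks that the elements of a
# PhD model input list have mutually consistent sizes and course order.
check_phd_alignment <- function(df_list) {
  with(df_list, {
    stopifnot(
      nrow(P) == Ns, ncol(P) == Nj,  # P is Ns x Nj
      nrow(d) == Nj,                 # one demand row per course
      length(s) == Ns, length(t1) == Ns, length(g1) == Ns
    )
    # When both carry course labels, P's columns must follow d's rows.
    if (!is.null(colnames(P)) && !is.null(rownames(d))) {
      stopifnot(identical(colnames(P), rownames(d)))
    }
  })
  invisible(TRUE)
}

# Hypothetical two-student, three-course input list.
courses <- c("C1", "C2", "C3")
df_list <- list(
  Ns = 2, Nj = 3,
  P  = matrix(c(3, 2, 1, -99, 3, 2), nrow = 2, byrow = TRUE,
              dimnames = list(NULL, courses)),
  d  = matrix(c(1, 0, 2, 1, 1, 1, 3, 2, 1), nrow = 3,
              dimnames = list(courses, c("TA", "GR", "E"))),
  s  = c(-1, 0), t1 = c(1, 2), g1 = c(3, 2)
)
check_phd_alignment(df_list)
```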
An ompr model object ready for ompr::solve_model().
Prepare the preference-based assignment model
prepare_preference_model(df_list, yaml_list)
df_list |
The output list from |
yaml_list |
The output list from |
An ompr model.