Aligned with GGU/upGrad course material — quick reference for quantitative business research
"Is there a significant difference between groups?"
2 groups → T-test
3+ groups → One-way ANOVA
2+ IVs → Factorial ANOVA
+ covariate → ANCOVA
Multiple DVs → MANOVA / MANCOVA
Non-normal data* → Mann-Whitney / Kruskal-Wallis
"Is there a relationship between variables?"
2 continuous vars → Correlation (Pearson r)
2 nominal vars → Chi-Square
Predict continuous DV → Simple / Multiple Regression
Predict binary DV → Logistic Regression
Complex model + latent vars → SEM / PLS-SEM
Mechanism (why?) → Mediation
Boundary condition (when?) → Moderation
"How do observations group together?"
Segment customers/people → K-means / Hierarchical Clustering
Reduce many variables → Factor Analysis (EFA)
Attribute preference → Conjoint Analysis
Time-dependent forecasting → ARIMA / Time Series
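The "test of differences" branch above can be sketched as a small lookup function. This is a study aid in Python, not course material; the function and argument names are my own shorthand:

```python
def pick_difference_test(n_groups, n_ivs=1, n_dvs=1, has_covariate=False, normal=True):
    """Mirror the 'difference between groups' decision branch above.

    Names are illustrative shorthand, not course terminology."""
    if not normal:                      # non-normal / ordinal data -> non-parametric
        return "Mann-Whitney U" if n_groups == 2 else "Kruskal-Wallis"
    if n_dvs > 1:                       # multiple dependent variables
        return "MANCOVA" if has_covariate else "MANOVA"
    if has_covariate:                   # a covariate to control for
        return "ANCOVA"
    if n_ivs > 1:                       # two or more categorical IVs
        return "Factorial ANOVA"
    return "T-test" if n_groups == 2 else "One-way ANOVA"
```

Reading the branch top-down matters: the non-parametric check comes first because normality violations override the parametric choices.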
| Method | When / Purpose | IV → DV | Business Example | Key Thresholds & Caveats |
|---|---|---|---|---|
| T-test (independent samples) | Compare means of 2 groups. Course: "conducted when the researcher has only two groups in the data set" | CAT (2 groups) → CONT | Compare average test scores between two groups receiving different teaching methods. IV: Teaching method (A/B) → DV: Test scores | p < .05; n ≥ 30/group. Assumes normality and equal variances; check with Levene's test and switch to Welch's t-test if variances are unequal. |
| Paired T-test * (dependent samples) | Compare means from the same group at 2 time points (before/after, pre/post-test) | CAT (time) → CONT | Did employee performance improve before vs after a training program? (pre-test/post-test). IV: Time (pre/post) → DV: Performance score | p < .05. The difference scores must be ~normally distributed. Key for the experimental pre/post designs discussed in the course. |
| One-way ANOVA (analysis of variance) | Compare means across 3+ groups. Course: "conducted when the researcher has three or more groups" | CAT (3+ groups) → CONT | Compare income across 3 age groups (under 30, 30-50, over 50). Or: CCC experiment comparing 4 incentive groups on program attendance. IV: Group (A/B/C/D) → DV: Programs attended | F-statistic, p < .05. Only tells you that a difference exists; use post-hoc tests (Tukey/Bonferroni) to find WHICH groups differ. Check normality and homogeneity of variance. |
| Factorial ANOVA (two-way/three-way) | Test the effects of 2+ categorical IVs and their interaction on a continuous DV. Course covers 2×2, 3×3, 3×2, and 5×7 designs | CAT × CAT → CONT | Does the effect of ad format (video/image/text) on engagement vary by placement (social/search/email)? Main effects + interaction effect. Course example: gender × age group on job satisfaction | F for main effects and interaction. The interaction effect is the key insight: it reveals when the effect of one IV depends on the level of another. Report eta squared (η²) for effect size. The DV must be continuous. |
| ANCOVA (analysis of covariance) | Compare group means while controlling for a covariate. Course: "when a control variable needs to be considered for some aspects of a factor" | CAT + CONT covariate → CONT | Compare sales across price points while controlling for advertising spend. IV: Price group + Covariate: Ad spend → DV: Sales | Homogeneity of regression slopes. The covariate must not interact with the IV, must correlate with the DV, and must relate linearly to the DV within each group. |
| MANOVA (multivariate ANOVA) | Compare groups on multiple DVs simultaneously. Course: "used when multiple dependent variables exist" | CAT → multiple CONT | Do different training programs affect accuracy, speed, and quality simultaneously? Course: effect of price, quality, brand on satisfaction AND loyalty | Wilks' Λ, p < .05. Controls Type I error across multiple DVs. Assumes multivariate normality and homogeneity of covariance matrices; bootstrapping can be used if the assumptions are violated. |
| MANCOVA | MANOVA + controlling for covariates. Course: "extended version of ANCOVA that allows multiple DVs" | CAT + CONT covariates → multiple CONT | Effect of training programs on productivity while controlling for employee experience and education level | Same as MANOVA plus covariate checks. Combines MANOVA and ANCOVA logic; requires even larger samples. |
| Mann-Whitney U * (Wilcoxon rank-sum) | Non-parametric alternative to the t-test: compare 2 groups on ordinal/non-normal data | CAT (2 groups) → ORD/non-normal | Do entrepreneurs vs non-entrepreneurs differ on a Likert-scale "experience" rating (1-5)? IV: Started a business? (Y/N) → DV: Likert rating | p < .05. No normality required; compares rank distributions. Better than chi-square for ordinal data because it preserves order information. |
| Kruskal-Wallis * (non-parametric ANOVA) | Non-parametric alternative to one-way ANOVA for 3+ groups on ordinal/non-normal data | CAT (3+ groups) → ORD/non-normal | Do Likert culture ratings differ across 4 departments? | p < .05. Follow up with Dunn's post-hoc test and a Bonferroni correction. Use when ANOVA assumptions are violated. |
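The two workhorse statistics in this table reduce to short formulas. A minimal standard-library sketch (function names are mine; in practice R Commander or scipy.stats computes these and the associated p-values for you):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic: compare two group means without
    assuming equal variances (the fallback named in the table)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def anova_f(*groups):
    """One-way ANOVA F-statistic: between-group variance
    divided by within-group variance."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total sample size
    grand = sum(x for g in groups for x in g) / n    # grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F only says "some difference exists"; as the table notes, post-hoc tests identify which groups differ.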
| Method | When / Purpose | Variable Types | Business Example | Key Thresholds & Caveats |
|---|---|---|---|---|
| Correlation (Pearson r) | Find the association between 2 continuous variables. Course: "measures the strength and direction of the linear relationship" | CONT ↔ CONT | Relationship between advertising expenditures and sales revenue. r ranges from −1 to +1: positive = both increase together; negative = one increases while the other decreases | \|r\|: .1 = small, .3 = medium, .5 = large. Only detects LINEAR relationships. Correlation ≠ causation. Both variables must be ~normal; sensitive to outliers. |
| Spearman ρ * (rank correlation) | Non-parametric correlation for ordinal or non-normal data | ORD ↔ ORD, or non-normal CONT | Relationship between employee rank and satisfaction rating (Likert scale) | Same benchmarks as Pearson. No normality required; detects monotonic (not just linear) relationships. Use for Likert scales. |
| Chi-Square (χ² test of independence) | Find the association between 2 nominal variables. Course: "measures the significance of the association between two variables by comparing observed with expected frequencies" | CAT ↔ CAT | Is customer demographic (age group) associated with product type purchased? Or: CCC group (A/B/C/D) × Renewal (yes/no). Cross-tabulation table: compare observed vs expected counts | p < .05; expected frequencies ≥ 5. Does NOT show direction or strength; report Cramér's V for effect size. Use Fisher's exact test if any expected count < 5. Loses information on ordinal data. |
| Simple Regression | Predict a DV from 1 IV and establish magnitude and direction. Course: "analyzes the relationship between one dependent variable and a single independent variable" | CONT → CONT | Predict sales based on temperature (the course's ice-cream example). Y = β₀ + β₁X; a positive β₁ means a positive relationship | R²; p < .05. Check linearity, normality of residuals, and homoscedasticity. R² = % of variance explained; the sign of β tells direction. |
| Multiple Regression (OLS) | Predict a DV from multiple IVs. Course: "examines the relationship between one dependent and multiple independent variables" | MIX (multiple) → CONT | Predict GPA from hours studying, attendance, and extracurricular activities. Y = β₀ + β₁X₁ + β₂X₂ + ...; each β shows a unique contribution | VIF < 5; R². Check multicollinearity (VIF), residual normality, and homoscedasticity. Report β weights for relative importance. Rule of thumb: n ≥ 50 + 8k (k = number of IVs). |
| Multivariate Regression | Multiple IVs predicting multiple DVs. The course distinguishes this from "multiple regression" | MIX → multiple CONT | Predict cardiovascular disease, diabetes, AND hypertension from diet, exercise, and stress simultaneously. Course: combine DV items into a composite OR test them separately | R² per DV. Different from multiple regression (1 DV). Can create a composite DV by averaging items, or analyze the DVs separately. |
| Hierarchical Regression * | Enter IVs in theoretically ordered blocks to test their incremental contribution | MIX (blocks) → CONT | Block 1: demographics (age, gender) → Block 2: job factors (tenure, role). Does Block 2 add explanatory power? ΔR² shows how much EXTRA variance each block explains | ΔR² significant at p < .05. Block order must be theory-driven. Very common in management/OB research; addresses "above and beyond" questions. |
| Logistic Regression | Predict a binary outcome. Course: "dependent variable is binary (win/lose, yes/no, success/failure)" | MIX → BIN | Predict whether an employee will leave the company (yes/no) from age, salary, satisfaction, tenure, and commute. The S-shaped logistic curve maps predictors to a probability between 0 and 1. Course uses R Commander | Odds ratio (OR); AIC (lower = better). OR > 1 increases the odds, OR < 1 decreases them. No normality assumption. Check Hosmer-Lemeshow fit. n ≥ ~10 events per predictor. |
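Pearson's r, the χ² statistic, and simple-regression coefficients also follow directly from their textbook formulas. A standard-library sketch (names are mine; these return only the statistic, not significance tests):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two continuous variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def chi_square_stat(table):
    """Chi-square statistic for a contingency table (list of rows):
    sum over cells of (observed - expected)^2 / expected."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    return sum((obs - row_tot[i] * col_tot[j] / total) ** 2
               / (row_tot[i] * col_tot[j] / total)
               for i, row in enumerate(table)
               for j, obs in enumerate(row))

def ols_fit(x, y):
    """Simple regression: return (b0, b1) for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1
```

Note how `ols_fit`'s slope numerator is the same cross-product sum as `pearson_r`'s: correlation and simple regression are two scalings of one quantity.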
| Method | When / Purpose | Variable Setup | Business Example (from course) | Key Thresholds & Caveats |
|---|---|---|---|---|
| Mediation ("Why/How?") | A mediator explains the mechanism by which the IV affects the DV. Course: "a middle construct explains the relationship between variables" | IV → MEDIATOR → DV | Employee morale → Work-life balance → Productivity: the effect of morale on productivity is partially explained by work-life balance (ABC Solutions example). Also: advertising → brand awareness → sales | Indirect effect: CI excludes 0. Use bootstrapping (5000+ samples) for the indirect effect. Course warns: "do not start research with a mediating variable"; establish the direct effect first. Used in "mature studies." |
| Moderation ("When/For whom?") | A moderator changes the strength or direction of the IV→DV relationship. Course: "the relationship is strong or weak for different groups" | IV × MODERATOR → DV | Weekly parties → Employee engagement, moderated by age: younger employees show a stronger effect than older ones (XYZ Corp example). Also: engagement → satisfaction, moderated by compensation level | Interaction term, p < .05. Mean-center the variables before creating the interaction term, and probe it with simple slopes. Course: "requires larger sample size and statistical power than simple regression." Plan in advance. |
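The mean-centering step recommended for moderation is trivial to implement. A sketch (function names are mine); the centered product column is what gets entered into the regression as the interaction term:

```python
def mean_center(v):
    """Subtract the mean so the variable is centered at 0."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def interaction_term(iv, moderator):
    """Element-wise product of the mean-centered IV and moderator;
    its regression coefficient tests the moderation hypothesis."""
    return [a * b for a, b in zip(mean_center(iv), mean_center(moderator))]
```

Centering first reduces the collinearity between the product term and its components, which keeps the main-effect coefficients interpretable.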
| Method | When / Purpose | Data Setup | Business Example | Key Thresholds & Caveats |
|---|---|---|---|---|
| Factor Analysis (EFA/CFA) | Data reduction: condense many variables into manageable constructs. Course: "a multivariate technique... creating a mathematical model to identify patterns" | Many CONT/ORD items → latent factors | Reduce variables (price, quality, brand, advertising) into the key constructs driving consumer behavior. Course: a marketing team condensing variables into "brand reputation" + "perceived quality" | Factor loading > 0.707; eigenvalue > 1. Course emphasis: a loading ≥ 0.707 means ≥ 50% of the item's variance is captured, so keep the item; below 0.707, revise or remove it. n ≥ 5-10 per item. Run EFA first, then CFA to confirm (not on the same data). |
| SEM (structural equation modeling) | Test complex models with latent variables and multiple paths simultaneously. Course: "performed when we have multiple constructs, multiple IVs, and multiple DVs" | MIX (latent + observed) → MIX | Full model: Leadership → PS → Performance, with PD moderating (your DBA801 paper). Inner model (structural paths) + outer model (indicators). Course uses SmartPLS | AVE > .50; VIF < 3; R². Course steps: (1) data quality + Cronbach's α, (2) estimate the model, (3) inner/outer model, (4) reliability + AVE, (5) discriminant validity (Fornell-Larcker), (6) VIF for collinearity, (7) path coefficients, (8) R². |
| Conjoint Analysis | Determine the relative importance of product attributes. Course: "used to determine the relative importance of different attributes of a product or service" | Attribute profiles → preference ranking | How much do customers value screen size vs battery life vs camera vs price vs brand for a phone? Steps: select attributes → orthogonal design → rank profiles → utility function | Utility scores per attribute. An orthogonal design ensures the attributes aren't correlated. Commonly used in product development and pricing research. |
| Cluster Analysis | Group similar observations into segments. Course: "group data based on similar characteristics or behavior" | Multiple CONT/MIX → segments | Segment customers by purchasing behavior. K-means or hierarchical clustering in R Commander. Course notes its growing importance in ML and AI | Silhouette > .5 is good. No single "correct" solution; standardize variables first. Different from factor analysis: FA groups variables, cluster analysis groups observations. |
| Time Series / ARIMA | Forecast values over time. Course: "ARIMA used for time series data forecasting; simple regression is NOT suitable for such data" | Time-ordered CONT → future values | Forecast monthly sales for the next 12 months. Stock prices, weather patterns, economic indicators. Components: autoregression + differencing + moving averages | AIC for model selection. Handles trends, seasonality, and autocorrelation that ordinary regression cannot. Course distinguishes cross-sectional vs panel data, and fixed effects vs random effects for panel data. |
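K-means is simple enough to sketch on a single numeric feature. A minimal Lloyd's-algorithm sketch (names are mine; in practice use R Commander's clustering on standardized variables, as the table advises):

```python
def kmeans_1d(data, centers, iters=20):
    """Lloyd's algorithm on one feature: assign each point to its
    nearest center, then move each center to its cluster's mean."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # keep a center in place if its cluster emptied out
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers
```

The result depends on the starting centers, which is one reason there is no single "correct" clustering solution.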
| Measure | What It Tests | Business Example | Key Thresholds & Notes |
|---|---|---|---|
| Cronbach's Alpha | Internal consistency: do the items in a scale consistently measure the same thing? Course: "evaluates the degree to which all items on a test or scale are related"; expressed as a number between 0 and 1 | Do 5 items measuring "job satisfaction" yield consistent responses? | α ≥ 0.707 strong; α > .95 suspicious. Course uses the 0.707 threshold (same as the factor-loading cutoff). α > .95 may indicate item redundancy. Report "alpha if item deleted" values. |
| Test-Retest Reliability | Stability over time: same test, same people, different times. Course: "consistency between two measurements conducted under different circumstances" | Administer an employee performance test twice under different conditions: are the results consistent? | r > .70 acceptable. A high correlation between test 1 and test 2 indicates a reliable measure. |
| Interrater Reliability | Agreement between raters. Course: "measures agreement between raters rating the same phenomenon" | Multiple interviewers rate the same candidate: do they agree? | Cohen's κ > .60. High agreement = reliable; low agreement means the raters need training or clearer criteria. |
| Face Validity | Does the measure look legitimate on the surface? Course: "how genuine a result appears based solely on its appearance" | Ask potential customers to rate a new logo's appearance on a 1-10 scale | Subjective assessment. The weakest form of validity: necessary but not sufficient. No technical methods required. |
| Internal Validity | Can we establish cause and effect? Course: "whether a certain intervention can produce the intended outcome" | Training program → improved sales? Was it the training or other factors? | Course threats: history/maturation, measurement error, regression to the mean, attrition, environmental factors. Controlled by randomization + control groups. |
| External Validity | Generalizability. Course: "extent to which research findings can be extended to a larger, more diverse population" | Can results from 200 employees at one company apply to all companies? | Requires representative sampling. Consider sample demographics, setting similarity, and time period. |
| AVE (average variance extracted) | Convergent validity in SEM: do the items capture enough of the construct's variance? | Do the PS scale items converge on the "psychological safety" construct? | AVE ≥ .50. Used in SEM/PLS-SEM. AVE < .50 means more error than signal. The course covers this in the SmartPLS steps. |
| Discriminant Validity (Fornell-Larcker) | Are the constructs truly distinct from each other? | Are "psychological safety" and "trust" measuring different things? | √AVE > inter-construct r. Course (SEM section): compare each construct's √AVE against cross-loadings and squared correlations. HTMT < .85 is a better modern alternative*. |
Course framework reminder — the 3 model groups: (A) Test of Differences = T-test, ANOVA family · (B) Test of Relationships = Correlation, Chi-square, Regression, Logistic Regression, SEM · (C) Cluster Model = Cluster Analysis, Factor Analysis
Key concepts from course: True Score Theory: X = T + Er + Es (observed = true score + random error + systematic error). Reduce systematic error via literature review, training collectors, pilot tests, multiple measures.
Sample size rules of thumb: n ≥ 30 per group for parametric tests. Regression: n ≥ 50 + 8k (k = predictors). SEM: n ≥ 200. PLS-SEM: 10× max paths pointing at any construct.
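The two formula-based rules of thumb above are easy to encode for planning. A sketch (function names are mine):

```python
def regression_min_n(k):
    """Multiple-regression rule of thumb: n >= 50 + 8k (k = predictors)."""
    return 50 + 8 * k

def pls_sem_min_n(max_paths_into_construct):
    """PLS-SEM '10 times' rule: 10x the largest number of structural
    paths pointing at any one construct."""
    return 10 * max_paths_into_construct
```

So a regression with 5 predictors needs roughly 90 cases, and a PLS-SEM model whose busiest construct receives 4 paths needs at least 40.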
Effect sizes matter more than p-values. Course: "Simply looking at a low p-value may not necessarily indicate meaningful results." Also: "Big data sets make p-values useless as almost every model will be significant."
Type I error = rejecting a true H₀ (false positive). Type II error = failing to reject a false H₀ (false negative, i.e., a missed effect). α = .05 means a 5% chance of a Type I error.
The Likert debate: Single Likert item = ordinal → non-parametric tests (Mann-Whitney*, Spearman*). Mean of multiple Likert items = often treated as continuous → parametric OK (Norman, 2010).
(*) Methods not in DBA801 material but widely used: Paired t-test, Spearman ρ, Mann-Whitney U, Kruskal-Wallis, Hierarchical Regression. These are standard in published business research — safe to reference but be transparent about sourcing.
Course tools: R Commander (ANOVA, Regression, Logistic Regression, Factor Analysis, Cluster) + SmartPLS (SEM) + Qualtrics (Survey design) + Excel/SPSS (Data entry)