[Algeb-Stat] Module 2: Linear Regression, Binomial distribution

Date: 2024.05.14 Updated: 2024.05.14

Categories: Algeb-Stat

Tags: Binomial distribution Linear Regression

📋 Here are the notes summarizing what I learned from the course!

The Line of Best Fit (Regression Line)

Definition

Understanding the linear relationship between two variables is essential in statistics, especially in predictive modeling. This topic covers how to find the line of best fit, or regression line, through a set of data points.

Linear Relationship Between Two Variables

The simplest type of relationship between two variables is linear and can be represented by the equation:

\[y = mx + b\]

m: Slope of the line
b: y-intercept
x: Independent variable
y: Dependent variable

The slope (m) indicates the steepness of the line, and the y-intercept (b) is where the line crosses the y-axis.

Constructing a Scatter Plot

Given data points like (1.2, 1), (1.3, 1.6),…, (4, 4.2), the first step to finding the line of best fit is to plot these points on a scatter plot.

Calculating the Line of Best Fit

To find the line that best approximates the data, we use the method of least squares. This method minimizes the sum of the squares of the vertical deviations (residuals) of the points from the line.

Steps to Find the Line of Best Fit:

Plot the Data: Create a scatter plot of the data points.
Calculate the Slope (m) and y-Intercept (b):
- Formula for slope (m): \(m = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}\)
- Formula for y-intercept (b): \(b = \frac{\sum y - m(\sum x)}{n}\)
Draw the Line: Using the calculated slope and intercept, draw the line through the data points.
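The two summation formulas above translate directly into code. Below is a minimal Python sketch that assumes the data arrive as two plain lists. The helper name `least_squares_line` is hypothetical and does not come from the course.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        # All x values are equal, so the slope formula divides by zero.
        raise ValueError("x values must not all be equal")

    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


m, b = least_squares_line([1, 2, 3, 4], [2, 4, 4, 6])
print(f"y = {m:.2f}x + {b:.2f}")
```

On the example data (x: 1, 2, 3, 4; y: 2, 4, 4, 6) this prints `y = 1.20x + 1.00`.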

Example Calculation:

Given data points:

x: 1, 2, 3, 4
y: 2, 4, 4, 6

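With \(n = 4\), the sums are \(\sum x = 10\), \(\sum y = 16\), \(\sum xy = 46\), and \(\sum x^2 = 30\). Substituting into the formulas:

Slope (m): \(m = \frac{4(46) - (10)(16)}{4(30) - 10^2} = \frac{24}{20} = 1.2\)
Intercept (b): \(b = \frac{16 - 1.2(10)}{4} = 1\)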

The equation of the line of best fit is: \(y = 1.2x + 1\)

Appropriateness of Linear Regression

The appropriateness of using a linear regression model depends on the degree of linear correlation between the variables, which can be measured using the coefficient of correlation (r).

Linear Correlation Coefficient (r)

The coefficient (r) measures the strength and direction of a linear relationship between two variables.

Formula: \(r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}\)

Properties of r:

r ranges from -1 to 1.
r = 1 indicates a perfect positive linear relationship.
r = -1 indicates a perfect negative linear relationship.
r close to 0 suggests little or no linear correlation.

Coefficient of Determination \((r^2)\)

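The coefficient of determination, \(r^2\), is the proportion of the variance in the dependent variable that is explained by its linear relationship with the independent variable. It ranges from 0 to 1, and values closer to 1 mean the line fits the data more closely.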
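For a least-squares line, \(r^2\) also equals \(1 - \text{SSE}/\text{SST}\). Here SST is the total squared variation of y around its mean, and SSE is the part left in the residuals. The sketch below uses the example data and the fitted line \(y = 1.2x + 1\). It gets 0.9, which agrees with squaring the r formula for these data (\(24^2 / 640 = 0.9\)).

```python
xs = [1, 2, 3, 4]
ys = [2, 4, 4, 6]
m, b = 1.2, 1.0                      # fitted line from the worked example

y_mean = sum(ys) / len(ys)
sst = sum((y - y_mean) ** 2 for y in ys)                     # total variation: 8.0
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))    # unexplained: about 0.8
r_squared = 1 - sse / sst
print(round(r_squared, 4))   # 0.9
```

In words, about 90% of the variation in y is accounted for by the regression line in this example.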

Practice Problems

Scatter Diagram: Plot the relationship between land size (x) and selling price (y).
Line of Best Fit: Find and draw the line of best fit.
Prediction: Estimate the selling price for a 1.7-acre piece of land.
Correlation and Determination: Calculate r and \(r^2\) to assess the strength of the relationship.

Understanding the line of best fit helps in predicting outcomes and understanding the relationship between variables in various fields, from economics to biology.

Discrete Probability Distributions - Binomial Distribution

Overview

Discrete probability distributions provide a way to handle situations where outcomes are discrete (individual countable outcomes). The binomial distribution is one specific type of discrete probability distribution that is useful for modeling scenarios with two possible outcomes (success and failure) over a series of trials.

Introduction to Probability Concepts

Events and Sample Space

Event: A collection of outcomes from an experiment.
- Simple Event: An event with only one outcome (e.g., rolling a 6 with one die).
Sample Space (S): The set of all possible simple events (outcomes). For one die, \(S = \{1, 2, 3, 4, 5, 6\}\).

Probability Values

Impossible Event: Probability = 0.
Certain Event: Probability = 1.
Other Events: Probability ranges between 0 and 1.

Complementary Events

Complement of an Event (\(A'\)): Consists of all outcomes where event \(A\) does not occur.
- \[P(A') = 1 - P(A)\]

Classical Approach to Probability

Assumes all outcomes are equally likely. If event \(A\) can occur in \(s\) ways out of \(n\) equally likely possible outcomes:

\[P(A) = \frac{s}{n}\]
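For a single fair die, the classical formula and the complement rule can be sketched with exact fractions. The helper `classical_probability` is a hypothetical name used only for this illustration, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]          # one fair die

def classical_probability(event, space):
    """P(A) = s / n, assuming every outcome in `space` is equally likely."""
    s = sum(1 for outcome in space if outcome in event)
    return Fraction(s, len(space))

p_six = classical_probability({6}, sample_space)
p_even = classical_probability({2, 4, 6}, sample_space)
print(p_six, 1 - p_six)    # 1/6 5/6  (complement rule: P(A') = 1 - P(A))
print(p_even)              # 1/2
```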

Random Variable and Discrete Probability Distributions

Random Variable (x): A variable with numerical values determined by the outcome of an experiment.
Discrete Random Variable: Takes a finite or countably infinite number of values.
Probability Distribution of a Discrete Random Variable: Lists all possible values of the random variable and their corresponding probabilities.

Example: Probability Distribution

For a student guessing on a 2-question true/false quiz, let \(x\) be the number of correct answers (C = correct, W = wrong):

Sample Space \(S = \{CW, WC, WW, CC\}\)
Probabilities: \(P(0) = \frac{1}{4}, P(1) = \frac{1}{2}, P(2) = \frac{1}{4}\)
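The distribution above can be reproduced by listing the sample space and counting correct answers in each outcome. This is a minimal sketch that assumes each answer is equally likely to be C or W when guessing.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each answer is C (correct) or W (wrong), equally likely when guessing.
outcomes = ["".join(o) for o in product("CW", repeat=2)]   # ['CC', 'CW', 'WC', 'WW']
counts = Counter(o.count("C") for o in outcomes)            # x = number correct
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(distribution)   # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

The probabilities sum to 1, as every probability distribution must.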

Mean, Variance, and Standard Deviation for a Probability Distribution

Formulas

Mean (μ): \(\mu = \sum [x \cdot P(x)]\)
Variance (σ²): \(\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]\)
- An alternative formula: \(\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2\)
Standard Deviation (σ): \(\sigma = \sqrt{\sigma^2}\)

Example Calculation

For a random variable representing the number of girls in a one-child family (assuming equal probability for boy or girl):

\[P(0) = 0.5, P(1) = 0.5\]
Mean: \(\mu = 0.5\)
Variance: \(\sigma^2 = 0.25\)
Standard Deviation: \(\sigma = 0.5\)
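The mean, variance, and standard deviation formulas can be sketched for any distribution given as a mapping \(\{x: P(x)\}\). The helper name `describe` is hypothetical. It computes the variance both ways listed in the formulas as a sanity check, and it reproduces the one-child example.

```python
import math

def describe(distribution):
    """Mean, variance, and standard deviation of a discrete distribution {x: P(x)}."""
    total = sum(distribution.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities sum to {total}, not 1")
    mu = sum(x * p for x, p in distribution.items())
    var = sum((x - mu) ** 2 * p for x, p in distribution.items())
    # Alternative formula from the notes; it should agree with `var`.
    var_alt = sum(x ** 2 * p for x, p in distribution.items()) - mu ** 2
    assert math.isclose(var, var_alt)
    return mu, var, math.sqrt(var)

print(describe({0: 0.5, 1: 0.5}))   # (0.5, 0.25, 0.5)
```

The same helper applied to the true/false quiz distribution \(\{0: 0.25, 1: 0.5, 2: 0.25\}\) gives a mean of 1 and a variance of 0.5.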

The Binomial Distribution

A binomial distribution is suitable for a sequence of \(n\) independent trials, each resulting in a success or a failure, with the probability of success \(p\) being constant across trials.

Properties

Trials: \(n\) independent trials.
Outcomes: Each trial results in either success (probability \(p\)) or failure (probability \(1-p\)).
Probability of Success: Constant for each trial (p).

Binomial Probability Formula

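\[P(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n\]

where \(\binom{n}{x} = \frac{n!}{x!\,(n-x)!}\) is the number of ways to choose which \(x\) of the \(n\) trials are successes.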

Example: Probability of 7 Successes in 10 Attempts with Success Probability of 0.8

\[P(x = 7) = \binom{10}{7} \cdot 0.8^7 \cdot 0.2^3 = 120 \cdot 0.2097152 \cdot 0.008 \approx 0.2013\]
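The binomial formula maps directly onto Python's `math.comb` (Python 3.8 or later). The function name `binomial_pmf` below is a hypothetical helper for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binomial_pmf(7, 10, 0.8), 4))   # 0.2013
```

Summing `binomial_pmf(x, 10, 0.8)` over \(x = 0, \ldots, 10\) gives 1, up to floating-point rounding.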

Mean, Variance, and Standard Deviation for the Binomial Distribution

Mean: \(\mu = n \cdot p\)
Variance: \(\sigma^2 = n \cdot p \cdot (1-p)\)
Standard Deviation: \(\sigma = \sqrt{n \cdot p \cdot (1-p)}\)
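The shortcut formulas can be checked against the general definitions \(\mu = \sum x \cdot P(x)\) and \(\sigma^2 = \sum (x - \mu)^2 \cdot P(x)\). This sketch does so for the earlier example with \(n = 10\) and \(p = 0.8\).

```python
import math

n, p = 10, 0.8
pmf = {x: math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

mu = sum(x * px for x, px in pmf.items())
var = sum((x - mu) ** 2 * px for x, px in pmf.items())

print(round(mu, 6), n * p)              # 8.0 8.0
print(round(var, 6), n * p * (1 - p))   # 1.6, and n*p*(1-p) is 1.6 up to float rounding
print(round(math.sqrt(var), 4))         # 1.2649
```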

Practice Problems

Calculate mean, variance, and standard deviation for different probability distributions.
For a coin tossed twice, determine the probability distribution, mean (expected value), and variance.
For a clinical trial with a drug success rate of 8%, calculate the probability of at least 2 successes out of 15 trials.
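The clinical-trial problem is a good fit for the complement rule, since "at least 2" is the complement of "0 or 1". This is a minimal sketch that assumes the 15 trials are independent with a constant success probability of 0.08. The helper `binomial_pmf` is a hypothetical name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 15, 0.08
# "At least 2" is the complement of "0 or 1": P(X >= 2) = 1 - P(0) - P(1).
p_at_least_2 = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)
print(round(p_at_least_2, 4))   # 0.3403

# Summing the upper tail directly gives the same result.
upper_tail = sum(binomial_pmf(x, n, p) for x in range(2, n + 1))
```

The complement form needs only two terms, while the direct sum needs fourteen.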

Understanding these concepts allows for effective modeling of real-world scenarios where outcomes are discrete and countable, particularly in fields such as genetics, quality control, and research studies.

Seyeon

