Correlation and Regression SSC CGL Tier 2 Paper 2

In the SSC CGL Tier 2 paper for SSC JSO, Correlation and Regression is an important topic under Statistics. These concepts describe the relationship between two or more variables, that is, how one variable changes when another changes.

1. What is Correlation?

Definition

Correlation measures the degree and direction of the relationship between two variables. If two variables change together, either in the same direction or in opposite directions, they are said to be correlated.

| Type of Correlation | Direction | Example |
|---|---|---|
| Positive Correlation | Both variables move in the same direction. | Height and weight: as height increases, weight increases. |
| Negative Correlation | Variables move in opposite directions. | Price and demand: as price increases, demand decreases. |
| Zero Correlation | No relationship between variables. | Shoe size and intelligence. |


2. Methods to Study Correlation

(a) Scatter Diagram (Graphical Method)

  • It is a simple visual method to show correlation between two variables.
  • Values of one variable are plotted on the X-axis and the other on the Y-axis.
| Pattern | Type of Correlation | Diagram Description |
|---|---|---|
| Points lie close to an upward-sloping straight line | Positive Correlation | Both variables increase together |
| Points lie close to a downward-sloping straight line | Negative Correlation | One increases, the other decreases |
| Points scattered randomly | No Correlation | No visible relationship |

(b) Karl Pearson’s Coefficient of Correlation (r)

It gives a quantitative measure of correlation between two variables:

$$r = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{\sqrt{\Sigma (x - \bar{x})^2 \, \Sigma (y - \bar{y})^2}}$$

Where:

  • $x, y$ = Variables
  • $\bar{x}, \bar{y}$ = Means of x and y
  • $r$ ranges between −1 and +1

| Value of r | Interpretation |
|---|---|
| +1 | Perfect positive correlation |
| −1 | Perfect negative correlation |
| 0 | No correlation |

The closer |r| is to 1, the stronger the relationship.
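
As a rough illustration, the formula for r can be evaluated step by step in Python. This is only a minimal sketch: the function name `pearson_r` and the `hours`/`marks` data are made up for the example and do not come from any exam paper.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r computed directly from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data: study hours and marks of 5 students
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(hours, marks)
print(round(r, 4))  # 0.7746, a fairly strong positive correlation
```

Here the sum of deviation products is 6 and the sums of squared deviations are 10 and 6, so r = 6/√60 ≈ 0.77.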


(c) Spearman’s Rank Correlation (rₛ)

Used when data are in ranks or qualitative form (like preference, performance, etc.):

$$r_s = 1 - \frac{6 \Sigma d^2}{n(n^2 - 1)}$$

Where:

  • $d$ = Difference between ranks of each pair
  • $n$ = Number of observations

| rₛ Value | Meaning |
|---|---|
| +1 | Perfect positive rank correlation |
| −1 | Perfect negative rank correlation |
| 0 | No rank correlation |

Example:
If the marks of 5 students in Maths and English are ranked and $\Sigma d^2 = 10$, then

$$r_s = 1 - \frac{6(10)}{5(5^2 - 1)} = 1 - \frac{60}{120} = 0.5$$

→ Moderate positive correlation.
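
The same calculation can be sketched in Python. The two rank lists below are hypothetical, chosen only so that Σd² comes to 10 as in the example, and the function assumes there are no tied ranks.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    # d = difference between the two ranks of each pair
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical ranks of 5 students in Maths and English
maths_rank = [1, 2, 3, 4, 5]
english_rank = [3, 2, 1, 5, 4]  # d = -2, 0, 2, -1, 1, so sum of d^2 = 10

rs = spearman_rs(maths_rank, english_rank)
print(rs)  # 0.5
```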


3. What is Regression?

Definition

Regression expresses the functional relationship between two variables; it helps predict the value of one variable from another.

| Concept | Explanation |
|---|---|
| Dependent Variable (Y) | The variable to be predicted |
| Independent Variable (X) | The variable used for prediction |


4. Regression Lines

There are two regression lines:

  1. Regression Line of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$
  2. Regression Line of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$

Where,

  • $b_{yx}$ and $b_{xy}$ are regression coefficients

Formulas:

$$b_{yx} = r \times \frac{\sigma_y}{\sigma_x}, \quad b_{xy} = r \times \frac{\sigma_x}{\sigma_y}$$

Properties:

  • Both lines intersect at (𝑋̄, 𝑌̄).
  • $b_{yx} \times b_{xy} = r^2$
  • If r = 0 → both regression coefficients are zero, so the lines become Y = 𝑌̄ and X = 𝑋̄, which are perpendicular.
  • If r = ±1 → both lines coincide.

Use: Regression helps in forecasting, for example, predicting sales based on advertisement spend.
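
To make the formulas and properties concrete, here is a small Python sketch. The function names and the advertisement/sales figures are hypothetical, and population standard deviations (dividing by n) are assumed.

```python
import math

def regression_lines(x, y):
    """Return the means, both regression coefficients and r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population standard deviations (divide by n)
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    r = cov_xy / (sigma_x * sigma_y)
    b_yx = r * sigma_y / sigma_x  # coefficient of the line of Y on X
    b_xy = r * sigma_x / sigma_y  # coefficient of the line of X on Y
    return mean_x, mean_y, b_yx, b_xy, r

def predict_y(x_new, mean_x, mean_y, b_yx):
    """Line of Y on X: Y - mean_y = b_yx * (X - mean_x)."""
    return mean_y + b_yx * (x_new - mean_x)

# Hypothetical data: advertisement spend (X) and sales (Y)
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

mean_x, mean_y, b_yx, b_xy, r = regression_lines(ad_spend, sales)
print(round(b_yx, 3), round(b_xy, 3))                 # 0.6 1.0
print(round(b_yx * b_xy, 3), round(r ** 2, 3))        # 0.6 0.6, product equals r squared
print(round(predict_y(6, mean_x, mean_y, b_yx), 2))   # 5.8
```

Substituting X = 𝑋̄ into the line of Y on X gives Y = 𝑌̄, which is why both lines pass through the point of means.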

5. Multiple Correlation

Definition

When we study the relationship between one dependent variable and two or more independent variables, it is called multiple correlation.

Example:
Predicting a student’s performance (Y) based on study hours (X₁) and attendance (X₂).

Multiple Correlation Coefficient (R):

$$R = \sqrt{\frac{r_{yx_1}^2 + r_{yx_2}^2 - 2\, r_{yx_1} r_{yx_2} r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}}$$

Where:

  • $r_{yx_1}, r_{yx_2}$ = Correlation of Y with X₁ and X₂
  • $r_{x_1 x_2}$ = Correlation between X₁ and X₂

Range: 0 ≤ R ≤ 1

  • R close to 1 → strong relationship
  • R close to 0 → weak relationship
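
A quick numerical check of the formula for R, as a sketch only: the three correlation values below are invented for illustration, and the function assumes X₁ and X₂ are not perfectly correlated (otherwise the denominator is zero).

```python
import math

def multiple_r(r_yx1, r_yx2, r_x1x2):
    """Multiple correlation coefficient of Y on X1 and X2 from pairwise correlations."""
    numerator = r_yx1 ** 2 + r_yx2 ** 2 - 2 * r_yx1 * r_yx2 * r_x1x2
    denominator = 1 - r_x1x2 ** 2  # zero if X1 and X2 are perfectly correlated
    return math.sqrt(numerator / denominator)

# Hypothetical correlations: performance with study hours, performance with
# attendance, and study hours with attendance
R = multiple_r(0.6, 0.5, 0.4)
print(round(R, 2))  # 0.66
```

With these values the numerator is 0.37 and the denominator is 0.84, so R = √0.4405 ≈ 0.66, which is at least as large as either individual correlation with Y.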

6. Key Differences Between Correlation and Regression

| Basis | Correlation | Regression |
|---|---|---|
| Meaning | Measures degree of relationship between variables | Expresses the relationship mathematically |
| Purpose | To find strength & direction | To predict one variable from another |
| Number of Lines | One (no distinction) | Two (Y on X, X on Y) |
| Interchangeability | No dependent or independent variable | One variable is dependent on another |
| Value Range | −1 to +1 | Any real value |

Key Takeaways

Below are the key takeaways:

  • Correlation → Measures the strength and direction of a relationship.
  • Regression → Provides predictive equations.
  • Karl Pearson’s coefficient (r) and Spearman’s rank (rₛ) are most common in exams.
  • Regression lines always pass through means (𝑋̄, 𝑌̄).
  • Multiple correlation deals with more than two variables.
  • Formula shortcuts and properties are frequently asked in Paper II (Statistics).

FAQs on Correlation and Regression


Q1. What is the difference between correlation and regression?

Correlation measures the degree of relationship, while regression shows how one variable predicts another.

Q2. What is the range of the correlation coefficient (r)?

The value of r always lies between −1 and +1.

Q3. What is Spearman’s rank correlation used for?

It is used when data is ranked or qualitative in nature.

Q4. What are regression coefficients?

They represent the rate of change in one variable per unit change in the other; for example, b_yx is the change in Y for a unit change in X.

Q5. What does multiple correlation indicate?

It shows how a dependent variable is influenced by two or more independent variables.