Removing raw HTML

Brandon Rozek 2025-02-16 22:04:56 -05:00
parent e06d45e053
commit 572d587b8e
33 changed files with 373 additions and 386 deletions


@@ -76,31 +76,31 @@ coefficients of CUST2 and the y-intercept.
##### Checking the Conditions for Inference
Before we conclude with the analysis, we must first check the conditions for inference to see if the technique is appropriate for our data.
<u>Independence Assumption:</u>
**Independence Assumption:**
A house's selling price can depend on another's, so this condition is not met.
<u>Randomization Condition:</u>
**Randomization Condition:**
The dataset is comprised of a random sample of records of resale of homes which satisfies the
randomization condition.
<u>Straight Enough Condition:</u>
**Straight Enough Condition:**
The scatterplot matrix in Figure 20 shows that the scatterplots for the predictors square footage
and tax are straight enough and don't have any bends or curves.
<u>Equal Variance Assumption:</u>
**Equal Variance Assumption:**
The residual analysis in Figure 21 shows that the outliers are not spread equally on the
scatterplot. Therefore, the equal variance assumption is not met.
<u>Nearly Normal Condition:</u>
**Nearly Normal Condition:**
The QQ-Plot in Figure 21 shows that the residuals follow a unimodal and symmetric distribution.
Taking out the outliers in the model also did not introduce any new outliers in the boxplot.
<u>Missing At Random Condition:</u>
**Missing At Random Condition:**
The discussion in the descriptive statistics section about the missing data tells us that the data
is missing evenly with respect to the different variables. Therefore, it is safe to assume that the
data is missing at random.
<u>Multicollinearity Condition:</u>
**Multicollinearity Condition:**
All of the VIF values are lower than 10, therefore this condition is met.
The conditions for inference are not fully met due to the equal variance assumption. This means that our model will be more inaccurate for some price range of homes than others. Looking at our residual analysis, it appears that the inaccuracies happen when the price of the home is higher. There werent many outliers in the dataset (6 out of 117 or 5%) so removing these outliers makes the model more representative to the majority of the houses in the market. Since this model is intended to be used when analyzing prices of homes in the area, it is better not to include the outliers that most people don&#8217;t intend to buy. Since the error term is unimodal and symmetric, we can be at ease that there isnt any other confounding factor in our model. Overall, this is a good model to use for inference and prediction as long as one doesnt use it to describe the outliers.
The conditions for inference are not fully met because the equal variance assumption fails. This means that our model will be less accurate for some price ranges than for others. Looking at our residual analysis, it appears that the inaccuracies occur when the price of the home is higher. There weren't many outliers in the dataset (6 out of 117, or 5%), so removing them makes the model more representative of the majority of the houses in the market. Since this model is intended to be used when analyzing prices of homes in the area, it is better not to include the outliers that most people don't intend to buy. Since the error term is unimodal and symmetric, we can be at ease that there isn't any other confounding factor in our model. Overall, this is a good model to use for inference and prediction as long as one doesn't use it to describe the outliers.
### Conclusion
The multiple imputation model without outliers is the best model outlined in this paper for describing the price of housing in this region. The formula is re-expressed here
PRICE = 76.47917 + 0.64130(TAX) + 0.27290(SQFT) + 77.58816(CUST2)
This states that for every dollar of tax spent on the home, the home increases on average by $64 given the other parameters stay constant. The same concept applies to square footage and custom design. For every square foot added to the home, the value of it increases on average by $27. Having a home with a custom design increases the value of the home by $7700. This model is more reliable the lower the price of the home is. When it comes to high cost homes, the error produced by the model increases. From this model, we conclude that property tax, square footage, and whether or not a home is built from a custom design are the most significant factors in the price of a home in Albuquerque, New Mexico.
This states that for every additional dollar of tax on the home, the price of the home increases on average by about $64, given that the other parameters stay constant. The same concept applies to square footage and custom design. For every square foot added to the home, its value increases on average by about $27. Having a home with a custom design increases the value of the home on average by about $7,700. This model is more reliable the lower the price of the home is. When it comes to high-cost homes, the error produced by the model increases. From this model, we conclude that property tax, square footage, and whether or not a home is built from a custom design are the most significant factors in the price of a home in Albuquerque, New Mexico.
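
As a rough illustration of how the fitted formula is read, the sketch below evaluates it for a made-up house. The coefficients are copied from the formula above. The function name `predict_price`, the example inputs, and the units are assumptions: the dollar figures quoted in the text suggest PRICE is recorded in hundreds of dollars, and CUST2 is treated here as a 0/1 indicator for a custom design.

```python
# Coefficients copied from the fitted model quoted in the conclusion.
INTERCEPT = 76.47917
B_TAX = 0.64130
B_SQFT = 0.27290
B_CUST2 = 77.58816


def predict_price(tax, sqft, custom):
    """Return the predicted PRICE for one home.

    Assumption: PRICE is in hundreds of dollars, TAX in dollars,
    SQFT in square feet, and CUST2 is 1 for a custom design, else 0.
    """
    cust2 = 1 if custom else 0
    return INTERCEPT + B_TAX * tax + B_SQFT * sqft + B_CUST2 * cust2


# Hypothetical house, not taken from the dataset.
standard = predict_price(tax=800, sqft=1600, custom=False)
custom = predict_price(tax=800, sqft=1600, custom=True)

# The gap between the two predictions is exactly the CUST2 coefficient.
print(f"standard design: {standard:.2f}")
print(f"custom design:   {custom:.2f}")
print(f"difference:      {custom - standard:.2f}")
```

Holding the other inputs fixed and changing one of them by a single unit shifts the prediction by that input's coefficient, which is the "given the other parameters stay constant" reading used above. Because the equal variance assumption is not met, predictions for higher-priced homes should be treated with more caution.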