(************** Content-type: application/mathematica ************** Mathematica-Compatible Notebook This notebook can be used with any Mathematica-compatible application, such as Mathematica, MathReader or Publicon. The data for the notebook starts with the line containing stars above. To get the notebook into a Mathematica-compatible application, do one of the following: * Save the data starting with the line of stars above into a file with a name ending in .nb, then open the file inside the application; * Copy the data starting with the line of stars above to the clipboard, then use the Paste menu command inside the application. Data for notebooks contains only printable 7-bit ASCII and can be sent directly in email or through ftp in text mode. Newlines can be CR, LF or CRLF (Unix, Macintosh or MS-DOS style). NOTE: If you modify the data for this notebook not in a Mathematica- compatible application, you must delete the line below containing the word CacheID, otherwise Mathematica-compatible applications may try to use invalid cache data. For more information on notebooks and Mathematica-compatible applications, contact Wolfram Research: web: http://www.wolfram.com email: info@wolfram.com phone: +1-217-398-0700 (U.S.) Notebook reader applications are available free of charge from Wolfram Research. *******************************************************************) (*CacheID: 232*) (*NotebookFileLineBreakTest NotebookFileLineBreakTest*) (*NotebookOptionsPosition[ 33477, 889]*) (*NotebookOutlinePosition[ 34247, 915]*) (* CellTagsIndexPosition[ 34203, 911]*) (*WindowFrame->Normal*) Notebook[{ Cell[CellGroupData[{ Cell["\<\ Looking for Patterns and Applying the Method of Least Squares to Real Data\ \>", "Title"], Cell[TextData[{ "Go through all parts of this lab, answering all the questions in", StyleBox[" red", FontColor->RGBColor[1, 0, 0]], " and analyzing both sets of data in the ", StyleBox["You Try It ", FontColor->RGBColor[1, 0, 0]], "portion. Your writeup must be in paragraph form with graphs included to \ explain your conclusions." }], "Text"], Cell[CellGroupData[{ Cell["Introduction", "Section"], Cell["\<\ We look for patterns in real data sets including medical, socio-economic, \ meteorological, and pollution statistics. Does the least squares method apply \ only to finding lines of best fit or could you apply the technique from \ scratch to any curve you wish to fit to a set of data points? We begin by \ analyzing data to which standard fit functions might apply and then utilize \ the calculus definition of best fit to find a non-standard fit function. \ \>", "Text"] }, Closed]], Cell[CellGroupData[{ Cell["Part I - Time Series Data ", "Section"], Cell[CellGroupData[{ Cell["\<\ DATA SETS 1 and 2: Thirty-Two Years of Salaries - Men Versus Women \ \>", "Subsection"], Cell[TextData[{ StyleBox[ "The following data represents the mean income for men and women in the \ United States from 1967 through 1998. Ages included people 15 years old and \ over beginning with March 1980, and people 14 years old and over as of March \ of the following year for previous years. All income is in 1998 CPI-U \ adjusted dollars.\nSOURCE: March Current Population Survey\n", FontFamily->"Times New Roman"], StyleBox[ "PREPARED BY:\nIncome Statistics Branch/HHES Division, U.S. Bureau of the \ Census, U.S. Department of Commerce\nWashington, D.C. 20233-8500, (301) \ 457-3242", FontFamily->"Times New Roman", FontSize->10], StyleBox["\n", FontFamily->"Courier New", FontSize->10], StyleBox["Last Revised: ", FontFamily->"Courier New", FontSize->10, FontWeight->"Bold"], StyleBox["Wednesday, 10-Nov-99 09:01:21", FontFamily->"Courier New", FontSize->10, FontSlant->"Italic"] }], "Text"], Cell[BoxData[{ \(Off[General::spell]\ \), \(Off[General::spell1]\ \), \(menearnings = {"\", 27232, 28209, 29338, 29290, 29389, 31214, 31587, 30495, 29784, 30168, 30635, 31180, 31039, 29916, 29419, 29180, 29182, 30027, 30805, 31956, 32210, 32635, 33324, 31978, 31074, 30670, 32143, 32887, 33126, 33553, 34794, 36315}; \n womenearnings = {"\", 11500, 11631, 11997, 12195, 12412, 12930, 12918, 12868, 12889, 13172, 13437, 13313, 13070, 13207, 13253, 13758, 14148, 14805, 15174, 15729, 16301, 16703, 17119, 17085, 17027, 17070, 17506, 17846, 18183, 18790, 19511, 20462}; \n years = {Years, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998}; \ndata = Table[{years[\([i]\)], menearnings[\([i]\)], womenearnings[\([i]\)]}, { i, 1, Length[years]}] // TableForm\)}], "Input"], Cell["\<\ As was probably expected, men's mean income exceeds women's mean income, but \ we can get a better perspective if we picture this data.\ \>", "Text"], Cell[BoxData[ \(\(men = Table[data[\([1, i, 2]\)], {i, 2, Length[years]}]; \n women = Table[data[\([1, i, 3]\)], {i, 2, Length[years]}]; \n pmen := ListPlot[men, PlotJoined \[Rule] True, PlotStyle -> RGBColor[0, 1, 0], DisplayFunction -> Identity, AxesLabel -> {"\", "\"}, AxesOrigin -> {0, 10000}]; \n pwomen := ListPlot[women, PlotJoined \[Rule] True, PlotStyle -> RGBColor[0, 0, 1], DisplayFunction -> Identity, AxesLabel -> {"\", "\"}, AxesOrigin -> {0, 10000}]; \n Show[pmen, pwomen, DisplayFunction -> $DisplayFunction]; \n\)\)], "Input"], Cell[TextData[{ "Neither of these sets of data follows an exact straight line, but, both \ have an upward trend, and one way we can examine those trends is to compute \ the best fit lines for each set of data. We will do this using the Fit \ function in ", StyleBox["Mathematica", FontSlant->"Italic"], ". Note that you choose a linear fit by specifying the {x,1}." }], "Text"], Cell[BoxData[{ \(Print["\", fmen = Fit[men, {x, 1}, x], "\< and\>"]\), \(Print["\", fwomen = Fit[women, {x, 1}, x], "\<,\>"]\), \(Print["\"]\), \(fitplots = Plot[{fmen, fwomen}, {x, 0, 33}, PlotStyle -> {RGBColor[0, 1, 0], RGBColor[0, 0, 1]}, DisplayFunction -> Identity]; \n Show[pmen, pwomen, fitplots, DisplayFunction -> $DisplayFunction]; \n \)}], "Input"], Cell["\<\ What do these lines tell you about the rate at which salaries are increasing \ for men versus women?You could verify that these lines would intersect in \ about 183 years.Why should this be considered a worthless estimate?\ \>", "Text", FontColor->RGBColor[1, 0, 0]] }, Closed]], Cell[CellGroupData[{ Cell["DATA SETS 3 and 4: Lake Pollution Patterns", "Subsection"], Cell["\<\ The following data sets represent weekly readings of pollutant levels \ (aluminum and other) in Lake Erie over a two-year period. What observations \ can you draw about the amounts of aluminum and the other pollutants? How does \ the pattern of pollution differ for the aluminum versus the other? The first \ set of data represents the data in the order that it was recorded. We then \ order that data to better analyze the distribution of the amounts of \ pollutant.\ \>", "Text"], Cell[BoxData[{ \(\(aluminum = {36.2846, 35.1909, 38.5753, 29.8268, 32.7359, 33.4072, 31.5494, 29.4830, 28.3125, 33.3508, 35.4065, 40.8595, 38.0167, 48.5467, 38.2178, 35.7263, 44.3164, 33.7658, 38.6447, 44.4616, 37.8688, 33.2331, 39.3477, 34.2471, 38.9567, 37.2145, 39.3465, 34.6710, 26.6301, 30.8516, 30.8723, 29.7076, 42.7976, 44.4128, 28.4938, 32.8682, 35.1048, 40.9571, 28.7226, 32.7272, 30.2132, 29.0897, 34.0419, 28.1496, 35.2347, 37.2364, 29.1620, 40.4460, 35.7201, 29.2285, 42.9233, 34.9017, 26.6193, 32.8131, 30.6090, 28.5616, 30.4919, 39.5930, 32.4821, 35.1442, 33.9530, 29.4919, 28.4642, 26.1369, 35.1675, 37.4473, 37.6916, 24.6438, 30.0035, 26.0824, 31.5738, 33.6830, 30.7659, 28.5338, 29.9502, 39.0303, 40.7136, 47.5382, 27.8366, 34.5073, 33.2366, 38.1850, 31.3074, 32.5488, 30.3595, 36.6709, 37.7908, 33.5284, 33.9880, 30.6810, 33.2926, 34.0935, 37.5900, 35.6707, 35.0608, 30.6652, 33.3158, 32.9712, 43.8148, 37.1492, 34.0851, 37.8289, 40.5501, 41.6061};\)\), "\n", \(\(other = {59.7390, 43.6978, 45.4508, 55.6974, 43.6578, 56.6556, 49.9503, 59.2501, 59.2778, 30.8085, 50.9860, 59.0104, 57.1471, 48.4696, 47.4461, 37.0215, 52.0141, 53.8839, 35.8155, 43.8406, 58.4949, 56.7300, 45.6032, 37.7449, 45.7960, 44.0860, 55.0693, 30.5343, 57.3854, 30.2706, 42.1799, 44.9115, 40.9325, 57.4171, 30.2881, 46.1233, 36.4139, 46.7199, 36.3994, 42.1318, 54.0219, 34.4747, 49.2455, 43.0440, 58.2225, 57.4492, 53.2136, 30.9330, 48.8363, 32.6732, 53.6791, 31.7362, 35.7099, 39.6966, 48.3676, 42.1876, 53.2427, 47.2459, 31.4635, 43.3965, 44.8305, 46.8731, 39.5781, 37.0135, 37.8624, 37.1184, 36.5494, 41.2847, 48.9843, 50.6743, 48.5370, 33.1112, 54.3590, 56.7471, 34.9079, 43.4007, 34.6556, 42.0995, 55.4403, 42.9701, 56.2188, 39.4222, 33.5046, 38.6973, 59.6348, 38.3582, 46.5821, 36.8290, 35.3832, 56.9938, 33.0390, 37.8167, 51.4268, 37.3356, 30.3771, 37.7568, 43.9597, 50.1189, 47.6849, 53.0739, 34.6360, 36.0122, 31.8986, 49.6269};\)\), "\n", \(salum = Sort[aluminum]\), "\n", \(soth = Sort[other]\)}], "Input", CellLabel->"In[114]:="], Cell["\<\ Plot the ordered data and see what they tell you about the pattern of the two \ sets of data.\ \>", "Text"], Cell[BoxData[{ \(\(ListPlot[salum, PlotJoined -> True];\)\), "\n", \(\(ListPlot[soth, PlotJoined -> True];\)\)}], "Input", CellLabel->"In[118]:="], Cell["\<\ You should look for fit functions for this data. We will do the \"other\" \ first, since the pattern is linear.\ \>", "Text"], Cell[BoxData[{ \(sothfit = Fit[soth, {x, 1}, x]\), "\n", \(\(sothfitplot = Plot[sothfit, {x, 0, 102}, DisplayFunction -> Identity, PlotStyle -> RGBColor[1, 0, 0]];\)\), "\n", \(\(sothplot = ListPlot[soth, DisplayFunction -> Identity];\)\), "\n", \(\(Show[sothplot, sothfitplot, DisplayFunction -> $DisplayFunction];\)\)}], "Input", CellLabel->"In[120]:="], Cell["\<\ The good fit could imply that the \"other\" pollutant distribution is \ somewhat uniform.\ \>", "Text"], Cell["\<\ The pattern for the aluminum pollutants is clearly not linear. Let's \ transform the data by doing a log transformation and finding the natural log \ of the aluminum pollutants and then doing a linear fit.\ \>", "Text"], Cell[BoxData[{ \(\(logsalum = Table[{i, Log[salum[\([i]\)]]}, {i, 1, Length[salum]}];\)\), "\n", \(logsalumfit = Fit[logsalum, {x, 1}, x]\), "\n", \(\(logsalumfitplot = Plot[logsalumfit, {x, 0, 105}, DisplayFunction -> Identity, PlotStyle -> RGBColor[1, 0, 0]];\)\), "\n", \(\(logsalumplot = ListPlot[logsalum, DisplayFunction -> Identity];\)\), "\n", \(\(Show[{logsalumplot, logsalumfitplot}, DisplayFunction -> $DisplayFunction];\)\)}], "Input", CellLabel->"In[125]:="], Cell["\<\ You will notice that this is fit is not very good and that the data seems to \ follow a pattern of going below or above the line. A different analysis would \ be in order. We will do that with SPSS later.\ \>", "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["\<\ Part II - Looking for Cause and Effect: Temperature as a Function of Latitude\ \ \>", "Section"], Cell["\<\ Following is data showing 56 cities in the U. S. together with their average \ January temperatures (in degrees Fahrenheit) over a thirty-year period and \ their latitudes.\ \>", "Text"], Cell[BoxData[{ \(Off[General::spell]\ \), "\n", \(Off[General::spell1]\ \), "\n", \(Off[Set::write]\), "\n", \(\(city = {MobileAL, MontgomeryAL, PhoenixAZ, LittleRockAR, Los\ AngelesCA, San\ FranciscoCA, DenverCO, NewHavenCT, WilmingtonDE, WashingtonDC, JacksonvilleFL, KeyWestFL, MiamiFL, AtlantaGA, BoiseID, ChicagoIL, IndianapolisIN, DesMoinesIA, WichitaKS, LouisvilleKY, NewOrleansLA, PortlandME, BaltimoreMD, BostonMA, DetroitMI, MinneapolisMN, St . LouisMO, HelenaMT, OmahaNE, ConcordNH, AtlanticCityNJ, AlbuquerqueNM, AlbanyNY, NewYorkNY, CharlotteNC, RaleighNC, BismarckND, CincinnatiOH, ClevelandOH, OklahomaCityOK, PortlandOR, HarrisburgPA, PhiladelphiaPA, CharlestonSC, NashvilleTN, AmarilloTX, GalvestonTX, HoustonTX, SaltLakeCityUT, BurlingtonVT, NorfolkVA, SeattleWA, SpokaneWA, MadisonWI, MilwaukeeWI, CheyenneWY};\)\n\), "\n", \(\(januarytemp = {44, 38, 35, 31, 47, 42, 15, 22, 26, 30, 45, 65, 58, 37, 22, 19, 21, 11, 22, 27, 45, 12, 25, 23, 21, 2, 24, 8, 13, 11, 27, 24, 14, 27, 34, 31, 0, 26, 21, 28, 33, 24, 24, 38, 31, 24, 49, 44, 18, 7, 32, 33, 19, 9, 13, 14};\)\n\), "\n", \(\(latitude = {31.2, 32.9, 33.6, 35.4, 34.3, 38.4, 40.7, 41.7, 40.5, 39.7, 31, 25, 26.3, 33.9, 43.7, 42.3, 39.8, 41.8, 38.1, 39, 30.8, 44.2, 39.7, 42.7, 43.1, 45.9, 39.3, 47.1, 41.9, 43.5, 39.8, 35.1, 42.6, 40.8, 35.9, 36.4, 47.1, 39.2, 42.3, 35.9, 45.6, 40.9, 40.9, 33.3, 36.7, 35.6, 29.4, 30.1, 41.1, 45, 37, 48.1, 48.1, 43.4, 43.3, 41.2};\)\n\), "\[IndentingNewLine]", \(Clear[citydata]\), "\n", \(\(citydata = Table[{city[\([i]\)], januarytemp[\([i]\)], latitude[\([i]\)]}, {i, 1, Length[city]}];\)\), "\[IndentingNewLine]", \(\(PrependTo[ citydata, {City, January\ Temperature, Latitude}];\)\), "\[IndentingNewLine]", \(citydata // TableForm\)}], "Input", CellLabel->"In[99]:="], Cell["\<\ First do a scatterplot to see if there is a relationship between the January \ temperatures and latitude.\ \>", "Text"], Cell[BoxData[ \(\(scatter = ListPlot[ Table[{latitude[\([i]\)], januarytemp[\([i]\)]}, {i, 1, Length[latitude]}], AxesLabel -> {"\", January\ Temperature}, PlotStyle -> PointSize[ .015]];\)\)], "Input", CellLabel->"In[8]:="], Cell["\<\ Since the scatterplot appears to have somewhat of a linear pattern, with the \ average January temperature dropping as the latitude increases, we will \ investigate this relationship further. We will use Mathematica's built-in Fit \ function to get results. Note that you choose a linear fit by specifying the \ {x,1}. We can write the line of best fit as follows.\ \>", "Text"], Cell[CellGroupData[{ Cell["Least squares approach", "Subsection"], Cell[BoxData[ \(yfit = Fit[Table[{latitude[\([i]\)], januarytemp[\([i]\)]}, {i, 1, Length[latitude]}], {x, 1}, x]\)], "Input", CellLabel->"In[9]:="], Cell["Now we can plot this line with our scatterplot.", "Text"], Cell[BoxData[{ \(\(linearfit = Plot[yfit, {x, 20, 50}, DisplayFunction \[Rule] Identity];\)\), "\n", \(\(Show[scatter, linearfit, DisplayFunction \[Rule] $DisplayFunction];\)\)}], "Input", CellLabel->"In[11]:="], Cell[TextData[StyleBox[" Can you identify cities that deviate from the \ pattern? Might you speculate on the reason for the deviation? Name one \ important factor that might explain at least a part of this deviation.", FontColor->RGBColor[1, 0, 0]]], "Text"], Cell[CellGroupData[{ Cell["Checking out the Correlation Coefficient", "Subsubsection"], Cell["\<\ You need to first read in the following package, then just ask for the \ correlation.\ \>", "Text"], Cell[BoxData[ \(<< Statistics`MultiDescriptiveStatistics`\)], "Input", CellLabel->"In[27]:="], Cell[BoxData[{ \(Print["\", corr = Correlation[januarytemp, latitude]]\), "\[IndentingNewLine]", \(Print["\", coeffdet = corr\^2]\)}], "Input", CellLabel->"In[112]:="] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["Dot Product Approach", "Subsection"], Cell["\<\ If the data sets for the latitude and the januarytemp are scaled around their \ means, the dot product can be used to determine how closely aligned the two \ sets of data are. The following commands will effectively accomplish the \ necessary scaling and define the dot product of two vectors and the magnitude \ of a vector in terms of a dot product. Note that the dot product is arrived \ at with the use of a \"period\" between two vectors, rather than the times \ sign.\ \>", "Text"], Cell[BoxData[{ \(mean[x_] := Sum[x[\([i]\)], {i, 1, Length[x]}]/Length[x]\), "\n", \(scaled[x_] := Table[x[\([i]\)] - mean[x], {i, 1, Length[x]}]\), "\n", \(dot[x_, y_] := scaled[x] . scaled[y]\), "\n", \(magnitude[x_] := Sqrt[dot[x, x]]\)}], "Input", CellLabel->"In[13]:="], Cell["\<\ Recall that the cosine of the angle between two vectors can be found by \ taking the dot product and then dividing by the magnitude of each vector. \ When the cosine is fairly near +1 or -1, the vectors are closely aligned. In \ statistics, we call the cosine of this angle the correlation coefficient (r), \ and its square measures the percent of change in the dependent variable (y) \ that can be attributed to that variables linear relationship with the \ independent variable (x). Furthermore, the slope of the line that best describes the linear \ relationship between x and y is this cosine value times the ratio of the \ magnitude of y over the magnitude of x.\ \>", "Text"], Cell[BoxData[{ \(slope[x_, y_] := dot[x, y]/dot[x, x]\), "\n", \(correlation[x_, y_] := dot[x, y]/\((magnitude[x] magnitude[y])\)\)}], "Input", CellLabel->"In[17]:="], Cell["\<\ Now, thinking of latitude as the independent variable (x) and januarytemp as \ the dependent variable (y), we will find the slope and correlation \ coefficient.\ \>", "Text"], Cell[BoxData[{ \(Print[\n\t"\", slope[latitude, januarytemp]]\), "\n", \(Print["\", r = correlation[latitude, januarytemp]]\), "\n", \(Print[\*"\"\<\!\(r\^2\) = \>\"", correlation[latitude, januarytemp]\^2]\)}], "Input", CellLabel->"In[33]:="], Cell["\<\ Using this slope and rescaling the two variables, we can write the line of \ best fit as follows.\ \>", "Text"], Cell[BoxData[{ \(Clear[x]\), "\n", \(y = mean[januarytemp] + slope[latitude, januarytemp] \((x - mean[latitude])\) // Simplify\)}], "Input", CellLabel->"In[24]:="], Cell["How do your two results compare? ", "Text"], Cell[BoxData[{ \(y \[Equal] yfit\), "\[IndentingNewLine]", \(r \[Equal] corr\)}], "Input", CellLabel->"In[36]:="] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["\<\ Part III - Going Back to the Calculus to Determine the Curve of Best Fit - \ Exponential Regression Coefficients for a Medical Study \ \>", "Section", Evaluatable->False, CellHorizontalScrolling->False, TextAlignment->Left], Cell[BoxData[ FormBox[ RowBox[{ \(Data\ was\ collected\ concerning\ a\ specific\ genetic\ characteristic \ that\ researches\ believed\ \ followed\ a\ particular\ \ exponential\ pattern . \ After\ observing\ a\ scatterplot\ of\ the\ data\), ",", " ", RowBox[{ RowBox[{ "they", " ", "looked", " ", "for", " ", "a", " ", "fit", " ", "function", " ", "of", " ", "the", " ", "form", " ", StyleBox["y", FontWeight->"Bold"]}], StyleBox[" ", FontWeight->"Bold"], StyleBox["=", FontWeight->"Bold"], StyleBox[" ", FontWeight->"Bold"], RowBox[{ StyleBox[\(a\ e\^bx\), FontWeight->"Bold"], StyleBox[" ", FontWeight->"Bold"], StyleBox["+", FontWeight->"Bold"], StyleBox[" ", FontWeight->"Bold"], RowBox[{ RowBox[{ StyleBox["c", FontWeight->"Bold"], ".", " ", "\n", "Because"}], " ", "this", " ", "is", " ", "not", " ", "a", " ", "standard", " ", "fit", " ", "function"}]}]}], ",", " ", \(it\ is\ necessary\ to\ use\ the\ minimization\ techniques\ from\ multivariable\ calculus\ from\ scratch . \nIn\ all\), ",", " ", \(there\ were\ 2723\ sets\ of\ data\ collected\ and\ those\ data\ sets \ were\ placed\ into\ eight\ categories . \ \nIn\ the\ set\ defined \ below\), ",", " ", \(the\ first\ entry\ represents\ the\ frequency\), ",", " ", \(the\ second\), ",", " ", \(the\ categorical\ variable\ \((x)\)\ \ and\ the\ third\ the\ observed\ genetic\ measurement\ \(\((y)\) . \)\)}], TextForm]], "Text"], Cell[BoxData[{ \(Off[General::spell]\ \), "\n", \(Off[General::spell1]\ \), "\n", \(Clear[x, y, a, b, c]\), "\n", \(\(dataset = {{579, 1, 38.08}, {1021, 2, 29.70}, {607, 3, 25.42}, {324, 4, 23.15}, {120, 5, 21.79}, {46, 6, 20.91}, {17, 7, 19.37}, {9, 8, 19.36}};\)\), "\n", \(Print["\"]\), "\n", \(Print[TableForm[dataset]]\), "\n", \(y[x_] := a\ Exp[b\ x] + c\)}], "Input"], Cell["\<\ We begin our process by writing the formula for the sum of the squares of the \ vertical distances between the curve of best fit and the observed data. This \ will be a function of a, b, and c, so we will find the three first partials \ and solve for the values of a, b, and c when we set those partial equal to \ zero.\ \>", "Text"], Cell[BoxData[{ \(Clear[a, b, c]\), \(f[a_, b_, c_] := Sum[dataset[\([i, 1]\)]* \((dataset[\([i, 3]\)] - \ a\ Exp[dataset[\([i, 2]\)]*b] - c) \)\^2, {i, 1, Length[dataset]}]\), \(fa[a_, b_, c_] := \[PartialD]\_a f[a, b, c]\), \(fb[a_, b_, c_] := \[PartialD]\_b f[a, b, c]\), \(fc[a_, b_, c_] := \[PartialD]\_c f[a, b, c]\)}], "Input"], Cell[TextData[{ StyleBox["NOTE", FontWeight->"Bold"], ": This evaluation will take more time than most evaluations you do with ", StyleBox["Mathematica. ", FontSlant->"Italic"], "On the other hand, you probably don't want to tackle this problem by hand! \ You should keep in mind that when you execute the Solve or NSolve, ", StyleBox["Mathematica", FontSlant->"Italic"], " will attempt to find all solutions, real or complex. ", StyleBox[ "If you don't want to take the time for this evaluation, skip the next \ command and move onto the oursolution line.", FontWeight->"Bold"] }], "Text"], Cell[BoxData[ \(solns = NSolve[\n\t\t\t{fa[a, b, c] == 0, fb[a, b, c] == 0, fc[a, b, c] == 0}, \n\t\t\t{a, b, c}]\)], "Input"], Cell["\<\ If you already executed this command but don't have time to wait for the \ solution, you can abort the evaluation process (selection under Kernel) and I \ will give you the one you need. If you do wait for the solution, the one you \ want can be accessed by typing the command, oursolution=soln[[2]], assuming \ the second solution set is the real-valued one. \ \>", "Text"], Cell[BoxData[ \(oursolution = {c \[Rule] 20.2912540016757825`, a \[Rule] 33.2221056564958949`, b \[Rule] \(-0.626855004870265108`\)} \)], "Input"], Cell["Now evaluate y[x] for those values of a, b, and c", "Text"], Cell[BoxData[ \(yhat[x_] = y[x] /. oursolution\)], "Input"], Cell["\<\ Look at the results graphically and compute the sum of the squares of the \ errors.\ \>", "Text"], Cell[BoxData[ \(p1 = Plot[yhat[x], {x, 0, 12}, PlotRange \[Rule] {15, 55}, \n\t\t DisplayFunction -> Identity]; \n pts = Table[{dataset[\([i, 2]\)], dataset[\([i, 3]\)]}, \n \t\t{i, 1, Length[dataset]}]; \n p2 = ListPlot[pts, PlotStyle \[Rule] PointSize[ .01], \n\t DisplayFunction -> Identity]; \n Show[p1, p2, DisplayFunction -> $DisplayFunction]; \)], "Input"], Cell[BoxData[ \(yest = Table[yhat[dataset[\([i, 2]\)]], {i, 1, Length[dataset]}]; \n yact = Table[dataset[\([i, 3]\)], {i, 1, Length[dataset]}]; \n wts = Table[dataset[\([i, 1]\)], {i, 1, Length[dataset]}]; \n sse = wts . \((yact - yest)\)\)], "Input"], Cell[TextData[StyleBox[ "This last entry shows the sum of the differences between actual and \ estimated values.\nRemember that the plotting of the points on the graph does \ not take into account\nthe weights, whereas, this difference does and you \ found the \"a.\" \"b,\" and \"c\" that minimize the sum of the squares of the \ errors.", FontFamily->"Times New Roman", FontWeight->"Plain", FontSlant->"Plain"]], "Text", Evaluatable->False], Cell[TextData[StyleBox["Give an example from a field associated with your \ intended major where it would be appropriate to collect data and find an \ approximating function for prediction purposes.", FontColor->RGBColor[1, 0, 0]]], "Text"] }, Closed]], Cell[CellGroupData[{ Cell["\<\ Part IV - A Special Set of Time Series Data: US Population Data\ \>", "Section", Evaluatable->False], Cell["\<\ The following table gives the population (in millions) of the United States \ every ten years, from 1780 through 2000. Now, let us look at a plot of this data.\ \>", "Text", Evaluatable->False, FontFamily->"Times New Roman", FontSize->12, FontWeight->"Plain", FontSlant->"Plain", FontTracking->"Plain", FontColor->GrayLevel[0], Background->GrayLevel[1], FontVariations->{"Underline"->False, "Outline"->False, "Shadow"->False}], Cell[BoxData[{ \(poplist = {{1780, 2.78}, {1790, 3.93}, {1800, 5.31}, {1810, 7.24}, {1820, 9.64}, {1830, 12.87}, {1840, 17.07}, {1850, 23.19}, {1860, 31.44}, {1870, 39.82}, {1880, 50.19}, {1890, 62.95}, {1900, 75.99}, {1910, 91.97}, {1920, 105.71}, {1930, 122.77}, {1940, 131.67}, {1950, 150.7}, {1960, 179.32}, {1970, 203.3}, {1980, 226.55}, {1990, 248.71}, {2000, 281.42}}\), "\[IndentingNewLine]", \(pplot = ListPlot[poplist, PlotStyle \[Rule] PointSize[ .01]]\)}], "Input", CellLabel->"In[146]:=", PageWidth->Infinity, FontFamily->"Courier New", FontSize->12, FontWeight->"Bold", FontColor->GrayLevel[0], Background->GrayLevel[1]], Cell["\<\ We will try fitting this data first with a polynomial function and then with \ an exponential function.\ \>", "Text", Evaluatable->False, FontFamily->"Times New Roman", FontSize->12, FontWeight->"Plain", FontSlant->"Plain", FontTracking->"Plain", FontColor->GrayLevel[0], Background->GrayLevel[1], FontVariations->{"Underline"->False, "Outline"->False, "Shadow"->False}], Cell[BoxData[{ \(polyfit = Fit[poplist, {1, x, x\^2, x\^3}, x]\), "\n", \(polyplot = Plot[polyfit, {x, 1770, 2020}, PlotStyle \[Rule] {RGBColor[0, 0, 1]}]\), "\[IndentingNewLine]", \(expfit = Fit[poplist, {1, Exp[x\/100]}, x]\), "\n", \(expplot = Plot[expfit, {x, 1770, 2020}, PlotStyle \[Rule] {RGBColor[0, 1, 0]}]\), "\[IndentingNewLine]", \(Show[pplot, polyplot, expplot]\)}], "Input", CellLabel->"In[148]:=", PageWidth->Infinity, FontFamily->"Courier New", FontSize->12, FontWeight->"Bold", FontColor->GrayLevel[0], Background->GrayLevel[1]], Cell["\<\ See what each fit would predict for the population in the years 2010. \ \>", "Text", Evaluatable->False, FontFamily->"Times New Roman", FontSize->12, FontWeight->"Plain", FontSlant->"Plain", FontTracking->"Plain", FontColor->GrayLevel[0], Background->GrayLevel[1], FontVariations->{"Underline"->False, "Outline"->False, "Shadow"->False}], Cell[BoxData[{ \(polyfit /. x \[Rule] 2010\), "\n", \(N[expfit /. x \[Rule] 2010]\)}], "Input", CellLabel->"In[153]:=", PageWidth->Infinity, FontFamily->"Courier New", FontSize->12, FontWeight->"Bold", FontColor->GrayLevel[0], Background->GrayLevel[1]], Cell[TextData[{ StyleBox["Which, if either, of these models would you consider to be a \ \"good\" fit? Compare these values to current predictions for the US \ population. You can check this out at ", FontColor->RGBColor[1, 0, 0]], ButtonBox["http://www.census.gov", ButtonData:>{ URL[ "http://www.census.gov"], None}, ButtonStyle->"Hyperlink"] }], "Text", Evaluatable->False], Cell[TextData[StyleBox["If you were to modify this model for future \ predictions, how would you modify it?", FontColor->RGBColor[1, 0, 0]]], "Text"], Cell["\<\ In the preceding population problem, compute and analyze the population \ growth every ten years.\ \>", "Text", Evaluatable->False, FontFamily->"Times New Roman", FontSize->12, FontWeight->"Plain", FontSlant->"Plain", FontTracking->"Plain", FontColor->GrayLevel[0], Background->GrayLevel[1], FontVariations->{"Underline"->False, "Outline"->False, "Shadow"->False}], Cell[BoxData[{ \(growth = Table[poplist\[LeftDoubleBracket]i + 1, 2\[RightDoubleBracket] - poplist\[LeftDoubleBracket]i, 2\[RightDoubleBracket], {i, 1, 22}]\), "\n", \(groplot = ListPlot[growth, PlotStyle \[Rule] PointSize[ .02]]\), "\n", \(linearfit = Fit[growth, {1, x}, x]\), "\n", \(linplot = Plot[linearfit, {x, 0, 25}]\), "\n", \(Show[groplot, linplot]\)}], "Input", CellLabel->"In[155]:=", PageWidth->Infinity, FontFamily->"Courier New", FontSize->12, FontWeight->"Bold", FontColor->GrayLevel[0], Background->GrayLevel[1]], Cell[BoxData[{ \(growthperc = Table[growth[\([i]\)]/poplist[\([i, 2]\)], {i, 1, 22}]\), "\n", \(plotgrowthperc - ListPlot[growthperc, PlotStyle \[Rule] PointSize[ .02]]\)}], "Input", CellLabel->"In[160]:="], Cell[TextData[StyleBox["Comment on the deviations from the straight line. \n\ What is the approximate rate of increase of the growth at every ten year \ interval?", FontColor->RGBColor[1, 0, 0]]], "Text", Evaluatable->False, FontFamily->"Times New Roman", FontSize->12, FontWeight->"Plain", FontSlant->"Plain", FontTracking->"Plain", FontColor->GrayLevel[0], Background->GrayLevel[1], FontVariations->{"Underline"->False, "Outline"->False, "Shadow"->False}] }, Closed]], Cell[CellGroupData[{ Cell["You Try It", "Section", FontColor->RGBColor[1, 0, 0]], Cell["\<\ Here are a couple data sets you can explore. You can also enter your own data \ sets and then re-execute the computations to explore relationships. \ \>", "Text"], Cell[CellGroupData[{ Cell["Age and EEG", "Subsection"], Cell["\<\ The electroencephalogram (EEG) is a device used to measure brain waves. \ Neurologists have found that the peak EEG frequency in children increases \ with age. The data below represents the results of a study done on children \ age 2 through 16 and the EEG readings (in hertz) represent the average peak \ EEG frequencies for each age group. It would be reasonable to think of age as \ the independent variable and eeg as the dependent variable.\ \>", "Text"], Cell[BoxData[{ \(age = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; \n eeg = {5.33, 5.75, 5.80, 5.60, 6.00, 5.78, 5.90, 6.23, 7.28, 7.06, 7.60, 7.45, 8.23, 8.50, 9.38}; \n eegdata = Table[{age[\([i]\)], eeg[\([i]\)]}, {i, 1, Length[age]}]; \n eegdata // TableForm\), \(\(scatter = ListPlot[eegdata, AxesOrigin -> {0, 5}, AxesLabel -> {"\", "\"}, PlotStyle -> PointSize[ .015]]; \)\)}], "Input"] }, Closed]], Cell[CellGroupData[{ Cell["Reliability of Construction Material", "Subsection"], Cell["\<\ The Canadian Geotechnical Journal (Aug.,1985) reported on a study that was \ conducted to investigate the reliability of the use of fragmented Queenston \ Shale, a compaction shale, as a rockfill construction material. In \ particular, the researchers wanted to estimate the stress-strain relationship \ of the fragmented material. Their study yielded the following results, where \ axial strain (x) is given as percents and deviatoric stress (y) is given in \ kPa. You might want to consider a quadratic fit for this data.\ \>", "Text"], Cell[BoxData[ \(strain = {1.0, 2.8, 4.2, 6.0, 7.5, 9.0, 10.5, 13.5, 16.7, 19.8, 23.0, 26.0, 27.5}; \n stress = {500, 2000, 2750, 3500, 4375, 4875, 5250, 6000, 6625, 7000, 7125, 7000, 7125}; \n ss = Table[{strain[\([i]\)], stress[\([i]\)]}, {i, 1, Length[strain]}]; \n plotss = ListPlot[ss, AxesOrigin -> {0, 500}, AxesLabel -> {"\", "\"}, PlotStyle -> PointSize[ .015]]; \)], "Input"] }, Closed]] }, Closed]] }, Open ]] }, FrontEndVersion->"4.1 for Microsoft Windows", ScreenRectangle->{{0, 1024}, {0, 695}}, CellGrouping->Manual, WindowSize->{758, 607}, WindowMargins->{{128, Automatic}, {-4, Automatic}}, PrintingCopies->1, PrintingPageRange->{Automatic, Automatic}, StyleDefinitions -> "ArticleModern.nb" ] (******************************************************************* Cached data follows. If you edit this Notebook file directly, not using Mathematica, you must remove the line containing CacheID at the top of the file. The cache data will then be recreated when you save this file from within Mathematica. *******************************************************************) (*CellTagsOutline CellTagsIndex->{} *) (*CellTagsIndex CellTagsIndex->{} *) (*NotebookFileOutline Notebook[{ Cell[CellGroupData[{ Cell[1727, 52, 99, 2, 139, "Title"], Cell[1829, 56, 362, 9, 46, "Text"], Cell[CellGroupData[{ Cell[2216, 69, 31, 0, 63, "Section"], Cell[2250, 71, 482, 7, 84, "Text"] }, Closed]], Cell[CellGroupData[{ Cell[2769, 83, 45, 0, 37, "Section"], Cell[CellGroupData[{ Cell[2839, 87, 99, 2, 54, "Subsection"], Cell[2941, 91, 964, 25, 149, "Text"], Cell[3908, 118, 1066, 16, 325, "Input"], Cell[4977, 136, 159, 3, 27, "Text"], Cell[5139, 141, 721, 13, 245, "Input"], Cell[5863, 156, 388, 8, 65, "Text"], Cell[6254, 166, 566, 11, 185, "Input"], Cell[6823, 179, 278, 5, 46, "Text"] }, Closed]], Cell[CellGroupData[{ Cell[7138, 189, 66, 0, 32, "Subsection"], Cell[7207, 191, 491, 8, 84, "Text"], Cell[7701, 201, 2383, 35, 545, "Input"], Cell[10087, 238, 117, 3, 27, "Text"], Cell[10207, 243, 157, 3, 65, "Input"], Cell[10367, 248, 135, 3, 27, "Text"], Cell[10505, 253, 405, 8, 125, "Input"], Cell[10913, 263, 113, 3, 27, "Text"], Cell[11029, 268, 230, 4, 46, "Text"], Cell[11262, 274, 544, 11, 145, "Input"], Cell[11809, 287, 228, 4, 46, "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[12086, 297, 106, 3, 37, "Section"], Cell[12195, 302, 196, 4, 46, "Text"], Cell[12394, 308, 2080, 34, 565, "Input"], Cell[14477, 344, 129, 3, 27, "Text"], Cell[14609, 349, 297, 7, 65, "Input"], Cell[14909, 358, 388, 6, 65, "Text"], Cell[CellGroupData[{ Cell[15322, 368, 44, 0, 54, "Subsection"], Cell[15369, 370, 174, 4, 45, "Input"], Cell[15546, 376, 63, 0, 27, "Text"], Cell[15612, 378, 251, 6, 65, "Input"], Cell[15866, 386, 260, 3, 46, "Text"], Cell[CellGroupData[{ Cell[16151, 393, 65, 0, 43, "Subsubsection"], Cell[16219, 395, 109, 3, 27, "Text"], Cell[16331, 400, 99, 2, 45, "Input"], Cell[16433, 404, 257, 5, 66, "Input"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[16739, 415, 42, 0, 32, "Subsection"], Cell[16784, 417, 497, 8, 84, "Text"], Cell[17284, 427, 295, 5, 105, "Input"], Cell[17582, 434, 691, 11, 122, "Text"], Cell[18276, 447, 184, 4, 65, "Input"], Cell[18463, 453, 185, 4, 46, "Text"], Cell[18651, 459, 284, 5, 106, "Input"], Cell[18938, 466, 121, 3, 27, "Text"], Cell[19062, 471, 201, 6, 65, "Input"], Cell[19266, 479, 49, 0, 27, "Text"], Cell[19318, 481, 124, 3, 65, "Input"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[19491, 490, 240, 6, 60, "Section", Evaluatable->False], Cell[19734, 498, 1873, 43, 115, "Text"], Cell[21610, 543, 463, 9, 185, "Input"], Cell[22076, 554, 343, 6, 65, "Text"], Cell[22422, 562, 388, 8, 147, "Input"], Cell[22813, 572, 627, 16, 84, "Text"], Cell[23443, 590, 146, 3, 85, "Input"], Cell[23592, 595, 384, 6, 65, "Text"], Cell[23979, 603, 167, 3, 45, "Input"], Cell[24149, 608, 65, 0, 27, "Text"], Cell[24217, 610, 63, 1, 45, "Input"], Cell[24283, 613, 107, 3, 27, "Text"], Cell[24393, 618, 410, 8, 165, "Input"], Cell[24806, 628, 268, 4, 105, "Input"], Cell[25077, 634, 450, 9, 65, "Text", Evaluatable->False], Cell[25530, 645, 242, 3, 46, "Text"] }, Closed]], Cell[CellGroupData[{ Cell[25809, 653, 112, 3, 37, "Section", Evaluatable->False], Cell[25924, 658, 458, 15, 62, "Text", Evaluatable->False], Cell[26385, 675, 730, 16, 65, "Input"], Cell[27118, 693, 402, 14, 43, "Text", Evaluatable->False], Cell[27523, 709, 615, 16, 138, "Input"], Cell[28141, 727, 369, 13, 43, "Text", Evaluatable->False], Cell[28513, 742, 275, 9, 65, "Input"], Cell[28791, 753, 400, 10, 46, "Text", Evaluatable->False], Cell[29194, 765, 151, 2, 27, "Text"], Cell[29348, 769, 397, 15, 62, "Text", Evaluatable->False], Cell[29748, 786, 596, 15, 125, "Input"], Cell[30347, 803, 231, 5, 65, "Input"], Cell[30581, 810, 481, 14, 62, "Text", Evaluatable->False] }, Closed]], Cell[CellGroupData[{ Cell[31099, 829, 61, 1, 37, "Section"], Cell[31163, 832, 172, 3, 46, "Text"], Cell[CellGroupData[{ Cell[31360, 839, 33, 0, 54, "Subsection"], Cell[31396, 841, 469, 7, 84, "Text"], Cell[31868, 850, 466, 9, 165, "Input"] }, Closed]], Cell[CellGroupData[{ Cell[32371, 864, 58, 0, 32, "Subsection"], Cell[32432, 866, 548, 8, 84, "Text"], Cell[32983, 876, 454, 8, 125, "Input"] }, Closed]] }, Closed]] }, Open ]] } ] *) (******************************************************************* End of Mathematica Notebook file. *******************************************************************)