|
Preface |
6 |
|
|
Contents |
14 |
|
|
Part I (Semi-) Plenary Presentations |
22 |
|
|
Classification and Data Mining in Musicology |
24 |
|
|
1 Introduction |
24 |
|
|
2 Music, 1/f-noise, fractal and chaos |
24 |
|
|
3 Music and entropy |
25 |
|
|
4 Score information and performance |
27 |
|
|
Acknowledgements |
28 |
|
|
References |
31 |
|
|
Bayesian Mixed Membership Models for Soft Clustering and Classification |
32 |
|
|
1 Introduction |
32 |
|
|
2 Mixed membership models |
35 |
|
|
3 Disability types among older adults |
37 |
|
|
3.1 National Long Term Care Survey |
37 |
|
|
3.2 Applying the mixed membership model |
38 |
|
|
4 Classifying publications by topic |
39 |
|
|
4.1 Proceedings of the National Academy of Sciences |
39 |
|
|
4.2 Applying the mixed membership model |
40 |
|
|
4.3 An alternative approach with related data |
43 |
|
|
4.4 Choosing K to describe PNAS topics |
43 |
|
|
5 Summary and concluding remarks |
44 |
|
|
Acknowledgments |
45 |
|
|
References |
45 |
|
|
Predicting Protein Secondary Structure with Markov Models |
48 |
|
|
1 Introduction |
48 |
|
|
2 The method |
49 |
|
|
3 Improvements |
50 |
|
|
4 Ongoing research |
53 |
|
|
5 Summary |
54 |
|
|
References |
54 |
|
|
Milestones in the History of Data Visualization: A Case Study in Statistical Historiography |
55 |
|
|
1 Introduction |
55 |
|
|
1.1 The Milestones Project |
56 |
|
|
2 Milestones tour |
57 |
|
|
2.1 1600-1699: Measurement and theory |
57 |
|
|
2.2 1700-1799: New graphic forms |
58 |
|
|
2.3 1800-1850: Beginnings of modern graphics |
58 |
|
|
2.4 1850-1900: The Golden Age of statistical graphics |
60 |
|
|
2.5 1900-1950: The modern dark ages |
60 |
|
|
2.6 1950-1975: Re-birth of data visualization |
61 |
|
|
3 Problems and methods in statistical historiography |
62 |
|
|
3.1 What counts as a Milestone? |
62 |
|
|
3.2 Who gets credit? |
63 |
|
|
3.3 Dating milestones |
63 |
|
|
3.4 What is milestones “data” |
64 |
|
|
3.5 Analyzing milestones “data” |
64 |
|
|
3.6 What was he thinking?: Understanding through reproduction |
64 |
|
|
3.7 What kinds of tools are needed? |
65 |
|
|
4 How to visualize a history? |
66 |
|
|
4.1 Lessons from the past |
67 |
|
|
4.2 Lessons from the present |
68 |
|
|
4.3 Lessons from the web |
69 |
|
|
4.4 Lessons from the data visualization |
70 |
|
|
Acknowledgments |
70 |
|
|
References |
71 |
|
|
Quantitative Text Typology: The Impact of Word Length |
74 |
|
|
1 Introduction: Structuring the universe of texts |
74 |
|
|
1.1 Classification and quantification |
74 |
|
|
1.2 Quantitative text analysis: From a definition of the basics towards data homogeneity |
75 |
|
|
1.3 Word length in a synergetic context |
76 |
|
|
1.4 Qualitative and quantitative classifications: A priori and a posteriori |
77 |
|
|
2 A case study: Classifying 398 Slovenian texts |
78 |
|
|
2.1 Post hoc analysis of mean word length |
80 |
|
|
2.2 Discriminant analyses: The whole corpus |
80 |
|
|
2.3 From four to two letter types |
81 |
|
|
2.4 Towards a new typology |
82 |
|
|
2.5 Conclusion |
85 |
|
|
References |
85 |
|
|
Cluster Ensembles |
86 |
|
|
1 Introduction |
86 |
|
|
2 Consensus partitions |
88 |
|
|
3 Extensions |
91 |
|
|
References |
92 |
|
|
Bootstrap Confidence Intervals for Three-way Component Methods |
94 |
|
|
1 Introduction |
94 |
|
|
2 The bootstrap for fully determined solutions |
95 |
|
|
3 Smaller bootstrap intervals using transformations |
98 |
|
|
4 Performance of bootstrap confidence intervals |
99 |
|
|
5 An application: Bootstrap confidence intervals for results from a Tucker3 Analysis |
100 |
|
|
6 Discussion |
102 |
|
|
References |
104 |
|
|
Organising the Knowledge Space for Software Components |
106 |
|
|
1 Introduction |
106 |
|
|
2 The software development process |
107 |
|
|
3 A knowledge space for software development |
109 |
|
|
4 Organising the knowledge space |
110 |
|
|
4.1 Ontologies |
110 |
|
|
4.2 A discovery and composition ontology |
111 |
|
|
4.3 Description of components |
112 |
|
|
4.4 Discovery and composition of components |
113 |
|
|
5 Conclusions |
116 |
|
|
References |
116 |
|
|
Multimedia Pattern Recognition in Soccer Video Using Time Intervals |
118 |
|
|
1 Introduction |
118 |
|
|
2 Multimedia event classification framework |
119 |
|
|
2.1 Pattern representation |
120 |
|
|
2.2 Pattern classification |
122 |
|
|
3 Highlight event classification in soccer broadcasts |
124 |
|
|
4 Evaluation |
126 |
|
|
4.1 Evaluation criteria |
126 |
|
|
4.2 Classification results |
127 |
|
|
5 Conclusion |
128 |
|
|
References |
129 |
|
|
Quantitative Assessment of the Responsibility for the Disease Load in a Population |
130 |
|
|
1 Introduction |
130 |
|
|
2 Basic definitions of attributable risk |
131 |
|
|
3 Crude and adjusted attributable risk |
132 |
|
|
4 Sequential attributable risk |
133 |
|
|
5 Partial attributable risk |
134 |
|
|
6 Illustrative example: The G.R.I.P.S. Study |
135 |
|
|
7 Conclusion |
136 |
|
|
Acknowledgment |
137 |
|
|
References |
137 |
|
|
Part II Classification and Data Analysis |
140 |
|
|
Bootstrapping Latent Class Models |
142 |
|
|
1 Introduction |
142 |
|
|
2 Bootstrap analysis |
143 |
|
|
3 Bootstrap analysis in finite mixture models |
144 |
|
|
4 An application to the latent class model |
145 |
|
|
5 Conclusion |
149 |
|
|
References |
149 |
|
|
Dimensionality of Random Subspaces |
150 |
|
|
1 Introduction |
150 |
|
|
2 Model aggregation |
151 |
|
|
3 Random Subspace Method |
152 |
|
|
4 Feature selection for ensembles |
153 |
|
|
5 Proposed method |
154 |
|
|
6 Related work |
154 |
|
|
7 Experiments |
155 |
|
|
8 Summary |
156 |
|
|
References |
156 |
|
|
Two-stage Classification with Automatic Feature Selection for an Industrial Application |
158 |
|
|
1 Introduction |
158 |
|
|
2 Two-stage classification |
159 |
|
|
2.1 Motivation |
159 |
|
|
2.2 First stage – object classification |
160 |
|
|
2.3 Second stage – image sequence classification |
160 |
|
|
2.4 Polynomial classifier |
160 |
|
|
3 System optimization |
161 |
|
|
3.1 Wrapper approach |
161 |
|
|
3.2 Search strategies in feature subsets |
162 |
|
|
3.3 Efficiency |
162 |
|
|
4 Experimental results |
163 |
|
|
5 Conclusion and outlook |
165 |
|
|
References |
165 |
|
|
Bagging, Boosting and Ordinal Classification |
166 |
|
|
1 Introduction |
166 |
|
|
2 Aggregating classifiers |
166 |
|
|
3 Ordinal prediction |
168 |
|
|
4 Empirical studies |
171 |
|
|
5 Concluding remarks |
172 |
|
|
References |
173 |
|
|
A Method for Visual Cluster Validation |
174 |
|
|
1 Introduction |
174 |
|
|
2 Optimal projection for separation |
176 |
|
|
3 Optimal projection for heterogeneity |
177 |
|
|
4 Example |
178 |
|
|
5 Conclusion |
181 |
|
|
References |
181 |
|
|
Empirical Comparison of Boosting Algorithms |
182 |
|
|
1 Introduction |
182 |
|
|
2 Arcing algorithms |
183 |
|
|
2.1 Adaboost |
183 |
|
|
2.2 Arcing family |
185 |
|
|
3 Empirical study |
185 |
|
|
3.1 Base classifier and performance measure |
186 |
|
|
3.2 Results |
186 |
|
|
4 Conclusion |
187 |
|
|
References |
188 |
|
|
Iterative Majorization Approach to the Distance-based Discriminant Analysis |
189 |
|
|
1 Introduction |
189 |
|
|
2 Problem formulation |
190 |
|
|
3 Iterative majorization |
191 |
|
|
4 Dimensionality reduction and multiple-class setting |
193 |
|
|
5 Experimental results |
194 |
|
|
References |
196 |
|
|
An Extension of the CHAID Tree-based Segmentation Algorithm to Multiple Dependent Variables |
197 |
|
|
1 Background and summary of approach |
197 |
|
|
2 The CHAID algorithm |
198 |
|
|
3 Latent class modeling |
200 |
|
|
4 The hybrid CHAID algorithm |
201 |
|
|
5 Empirical example |
202 |
|
|
6 Final comments |
203 |
|
|
References |
204 |
|
|
Expectation of Random Sets and the ‘Mean Values’ of Interval Data |
205 |
|
|
1 Introduction |
205 |
|
|
2 Reduction to characteristic points |
206 |
|
|
3.1 The Aumann expectation |
207 |
|
|
3.2 The Fréchet expectation |
207 |
|
|
3.3 The Doss expectation |
208 |
|
|
3.4 The Vorob’ev expectation |
208 |
|
|
4 Expectations of Random Closed Rectangles |
209 |
|
|
4.1 The Aumann expectation |
209 |
|
|
4.2 The Fréchet expectation |
211 |
|
|
4.3 The Doss expectation |
211 |
|
|
4.4 The Vorob’ev expectation |
211 |
|
|
5 Discussion |
212 |
|
|
References |
212 |
|
|
Experimental Design for Variable Selection in Data Bases |
213 |
|
|
1 Introduction |
213 |
|
|
2 Data |
214 |
|
|
3 Plackett-Burman designs |
214 |
|
|
4 Results |
216 |
|
|
4.1 Stepwise regression by forward selection |
216 |
|
|
4.2 Classification methods |
216 |
|
|
4.3 Variable assessment |
217 |
|
|
5 Conclusion |
220 |
|
|
References |
220 |
|
|
KMC/EDAM: A New Approach for the Visualization of K-Means Clustering Results |
221 |
|
|
1 Introduction |
221 |
|
|
2 Methods |
222 |
|
|
2.1 Preliminaries |
222 |
|
|
2.2 Basic idea |
223 |
|
|
2.3 KMC/EDAM |
223 |
|
|
3 Examples |
225 |
|
|
4 Conclusion |
228 |
|
|
References |
228 |
|
|
Clustering of Variables with Missing Data: Application to Preference Studies |
229 |
|
|
1 Introduction |
229 |
|
|
2 Clustering of variables around latent components |
230 |
|
|
3 Imputation methods |
230 |
|
|
3.1 Direct imputation methods |
230 |
|
|
3.2 Imputation within each cluster |
230 |
|
|
3.3 Method based on a cross-partition |
231 |
|
|
4 Illustration: data set ‘jam’ |
232 |
|
|
5 Simulation study |
233 |
|
|
5.1 Jam data set |
233 |
|
|
5.2 Simulated data |
233 |
|
|
5.3 Criterion for comparison |
233 |
|
|
5.4 Results |
234 |
|
|
6 Conclusion |
236 |
|
|
Acknowledgment |
236 |
|
|
References |
236 |
|
|
Binary On-line Classification Based on Temporally Integrated Information |
237 |
|
|
1 General framework |
237 |
|
|
1.1 Data format |
238 |
|
|
1.2 On-line classification |
238 |
|
|
2 Integration of information across time |
239 |
|
|
3 Application |
240 |
|
|
3.1 Neurophysiology |
241 |
|
|
3.2 Model |
241 |
|
|
3.3 Results |
243 |
|
|
References |
244 |
|
|
Different Subspace Classification |
245 |
|
|
1 Introduction |
245 |
|
|
2 Notation and method |
246 |
|
|
2.1 Characteristic regions |
246 |
|
|
2.2 Classification rule |
247 |
|
|
3 Visualization |
248 |
|
|
4 Parameter choice for DiSCo |
249 |
|
|
4.1 Building the regions |
249 |
|
|
4.2 Optimizing the thresholds |
250 |
|
|
5 Simulation study |
250 |
|
|
5.1 Data generation |
250 |
|
|
5.2 Results |
251 |
|
|
6 Summary |
252 |
|
|
References |
252 |
|
|
Density Estimation and Visualization for Data Containing Clusters of Unknown Structure |
253 |
|
|
1 Introduction |
253 |
|
|
2 Information optimal sets, Pareto Radius, PDE |
254 |
|
|
3 PDE in one dimension: PDEplot |
256 |
|
|
4 Measuring and visualization of density of high dimensional data |
257 |
|
|
5 Summary |
259 |
|
|
References |
260 |
|
|
Hierarchical Mixture Models for Nested Data Structures |
261 |
|
|
1 Introduction |
261 |
|
|
2 Model formulation |
262 |
|
|
2.1 Standard finite mixture model |
262 |
|
|
2.2 Hierarchical finite mixture model |
262 |
|
|
3 Maximum likelihood estimation by an adapted EM algorithm |
264 |
|
|
4 An empirical example |
265 |
|
|
5 Variants and extensions |
267 |
|
|
References |
268 |
|
|
Iterative Proportional Scaling Based on a Robust Start Estimator |
269 |
|
|
1 Introduction |
269 |
|
|
2 Covariance selection models |
270 |
|
|
3 Iterative proportional scaling (IPS) |
271 |
|
|
4 IPS robustified |
272 |
|
|
5 Model selection with RIPS |
273 |
|
|
6 Open questions |
276 |
|
|
References |
276 |
|
|
Exploring Multivariate Data Structures with Local Principal Curves |
277 |
|
|
1 Introduction |
277 |
|
|
2 Local principal curves |
278 |
|
|
3 Simulated data examples |
281 |
|
|
5 Conclusion |
283 |
|
|
References |
283 |
|
|
A Three-way Multidimensional Scaling Approach to the Analysis of Judgments About Persons |
285 |
|
|
1 Introduction |
285 |
|
|
2 The structure of judgments about persons |
285 |
|
|
3 ‘SUMM–ID’ model |
286 |
|
|
4 Application |
290 |
|
|
5 Concluding remarks |
291 |
|
|
References |
292 |
|
|
Discovering Temporal Knowledge in Multivariate Time Series |
293 |
|
|
1 Introduction |
293 |
|
|
2 Data |
294 |
|
|
3 Unification-based Temporal Grammar |
294 |
|
|
4 Time Series Knowledge Mining |
296 |
|
|
5 Discussion |
298 |
|
|
6 Summary |
299 |
|
|
Acknowledgements |
299 |
|
|
References |
300 |
|
|
A New Framework for Multidimensional Data Analysis |
301 |
|
|
1 Information in data |
301 |
|
|
2 Illustrative example |
302 |
|
|
3 Geometric model for categorical data |
304 |
|
|
4 Squared item-component correlation |
304 |
|
|
5 Correlation between multidimensional variables |
305 |
|
|
6 Decomposition of information in data and total information |
306 |
|
|
7 Conclusion |
307 |
|
|
References |
308 |
|
|
External Analysis of Two-mode Three-way Asymmetric Multidimensional Scaling |
309 |
|
|
1 Introduction |
309 |
|
|
2 The method |
310 |
|
|
3 An application |
311 |
|
|
4 Discussion |
314 |
|
|
References |
316 |
|
|
The Relevance Vector Machine Under Covariate Measurement Error |
317 |
|
|
1 Introduction |
317 |
|
|
2 Nonparametric regression using the RVM |
318 |
|
|
2.1 The RVM model setup |
318 |
|
|
2.2 Inference |
319 |
|
|
3 Covariate measurement error and its correction |
320 |
|
|
3.1 The classical error model |
320 |
|
|
3.2 Error correction using regression calibration |
320 |
|
|
3.3 Error correction using SIMEX |
321 |
|
|
3.4 Simulation results for the SIMEX |
322 |
|
|
4 Discussion |
323 |
|
|
Acknowledgements |
324 |
|
|
References |
324 |
|
|
Part III Applications |
326 |
|
|
A Contribution to the History of Seriation in Archaeology |
328 |
|
|
1 Introduction |
328 |
|
|
2 The early years |
328 |
|
|
3 Mathematical models |
329 |
|
|
4 The method of Brainerd and Robinson |
330 |
|
|
5 Permutation search |
331 |
|
|
6 Towards correspondence analysis |
332 |
|
|
References |
335 |
|
|
Model-based Cluster Analysis of Roman Bricks and Tiles from Worms and Rheinzabern |
338 |
|
|
1 Introduction and task |
338 |
|
|
2 Model-based Gaussian clustering |
340 |
|
|
3 Results and archaeological discussion |
342 |
|
|
4 Conclusion |
345 |
|
|
References |
345 |
|
|
Astronomical Object Classification and Parameter Estimation with the Gaia Galactic Survey Satellite |
346 |
|
|
1 The Gaia Galactic survey mission |
346 |
|
|
2 Astrophysical data |
346 |
|
|
3 Classification challenges |
347 |
|
|
4 Outlook |
348 |
|
|
References |
349 |
|
|
Design of Astronomical Filter Systems for Stellar Classification Using Evolutionary Algorithms |
351 |
|
|
1 Astrophysical context |
351 |
|
|
2 The optimization model |
352 |
|
|
2.1 Parametrization |
352 |
|
|
2.2 Figure-of-merit (fitness) |
353 |
|
|
2.3 Evolutionary algorithm |
354 |
|
|
3 Application, results and interpretation |
355 |
|
|
4 Conclusions and future work |
358 |
|
|
References |
358 |
|
|
Analyzing Microarray Data with the Generative Topographic Mapping Approach |
359 |
|
|
1 Introduction |
359 |
|
|
2 Data structure |
360 |
|
|
3 The GTM approach |
361 |
|
|
4 Application to a data set |
363 |
|
|
5 Summary and outlook |
365 |
|
|
References |
366 |
|
|
Test for a Change Point in Bernoulli Trials with Dependence |
367 |
|
|
1 Introduction |
367 |
|
|
2 Test problem |
368 |
|
|
3 Intercalary independence of Markov processes |
370 |
|
|
4 Strategies for performing a test |
371 |
|
|
5 Example |
372 |
|
|
References |
373 |
|
|
Data Mining in Protein Binding Cavities |
375 |
|
|
1 Introduction |
375 |
|
|
2 Other approaches |
376 |
|
|
3 Theory and algorithm |
377 |
|
|
4 First results |
379 |
|
|
5 Conclusions |
380 |
|
|
References |
381 |
|
|
Classification of In Vivo Magnetic Resonance Spectra |
383 |
|
|
1 Introduction |
383 |
|
|
2 Data |
384 |
|
|
2.1 General features |
384 |
|
|
2.2 Details |
384 |
|
|
3 Methods |
385 |
|
|
3.1 Evaluated algorithms |
386 |
|
|
3.2 Benchmark settings |
387 |
|
|
4 Results |
388 |
|
|
5 Conclusions |
390 |
|
|
References |
390 |
|
|
Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs |
391 |
|
|
1 Introduction |
391 |
|
|
2 Multiple testing and the false discovery rate |
392 |
|
|
3 Significance analysis of microarrays |
393 |
|
|
4 SAM applied to single nucleotide polymorphisms |
394 |
|
|
5 Prediction analysis of microarrays |
395 |
|
|
6 Prediction analysis of SNPs |
396 |
|
|
7 Discussion |
397 |
|
|
References |
398 |
|
|
Improving the Identification of Differentially Expressed Genes in cDNA Microarray Experiments |
399 |
|
|
1 Introduction |
399 |
|
|
2 Data sets, LogRatio, RelDiff |
400 |
|
|
3 Comparison of LogRatio and RelDiff |
401 |
|
|
4 Stabilization of variance |
405 |
|
|
5 Summary |
406 |
|
|
References |
406 |
|
|
PhyNav: A Novel Approach to Reconstruct Large Phylogenies |
407 |
|
|
1 Introduction |
407 |
|
|
2 Minimal k-distance subsets |
408 |
|
|
3 The PhyNav algorithm |
409 |
|
|
4 The efficiency of PhyNav |
409 |
|
|
4.1 Simulated datasets |
410 |
|
|
4.2 Biological datasets |
411 |
|
|
5 Discussion and conclusion |
412 |
|
|
Acknowledgments |
413 |
|
|
References |
413 |
|
|
NewsRec, a Personal Recommendation System for News Websites |
415 |
|
|
1 Introduction |
415 |
|
|
2 Requirements, system design, and implementation details |
417 |
|
|
3 Website classification and evaluation measures |
418 |
|
|
4 Empirical results |
419 |
|
|
5 Conclusions and outlook |
420 |
|
|
References |
422 |
|
|
Clustering of Large Document Sets with Restricted Random Walks on Usage Histories |
423 |
|
|
1 Motivation |
423 |
|
|
2 Clustering with purchase histories |
424 |
|
|
3 Time complexity |
428 |
|
|
4 Results |
428 |
|
|
5 Outlook |
430 |
|
|
References |
430 |
|
|
Fuzzy Two-mode Clustering vs. Collaborative Filtering |
431 |
|
|
1 Introduction |
431 |
|
|
2 Two-mode data analysis |
432 |
|
|
2.1 Memory-based Collaborative Filtering (CF) |
432 |
|
|
2.2 (Fuzzy) Two-Mode Clustering (FTMC) |
433 |
|
|
3 The Delta-Method for fuzzy two-mode clustering |
434 |
|
|
4 Examples and comparisons |
435 |
|
|
5 Conclusions |
437 |
|
|
References |
437 |
|
|
Web Mining and Online Visibility |
439 |
|
|
1 Introduction – “Why measurement of online visibility?” |
439 |
|
|
2 (Human) Online search in a changing webgraph |
439 |
|
|
2.1 The web as a graph |
440 |
|
|
2.2 (Human) Online searching and surfing behavior |
441 |
|
|
3 Measurement of Online Visibility |
441 |
|
|
3.1 Main drivers of Online Visibility |
442 |
|
|
3.2 Web data used for our sample |
442 |
|
|
3.3 The measure GOVis |
443 |
|
|
3.4 Results |
444 |
|
|
4 Conclusion and managerial implications |
445 |
|
|
References |
446 |
|
|
Analysis of Recommender System Usage by Multidimensional Scaling |
447 |
|
|
1 Introduction |
447 |
|
|
2 Methodology |
448 |
|
|
3 Empirical results |
449 |
|
|
3.1 The data sets |
449 |
|
|
3.2 Representation of products and search profiles |
450 |
|
|
3.3 Analysis of system usage |
451 |
|
|
4 Summary |
453 |
|
|
References |
454 |
|
|
On a Combination of Convex Risk Minimization Methods |
455 |
|
|
1 Introduction |
455 |
|
|
2 Strategy |
455 |
|
|
3 Kernel logistic regression and ε-support vector regression |
458 |
|
|
4 Application |
460 |
|
|
5 Discussion |
462 |
|
|
Acknowledgments |
462 |
|
|
References |
462 |
|
|
Credit Scoring Using Global and Local Statistical Models |
463 |
|
|
1 Introduction |
463 |
|
|
2 Description of the data set |
464 |
|
|
3 Global scoring model |
464 |
|
|
3.1 Global scoring using logistic discriminant analysis |
464 |
|
|
3.2 Classification rule under constraints |
465 |
|
|
4 Local scoring by two-stage classification |
466 |
|
|
4.1 Clustering using self-organizing maps |
467 |
|
|
4.2 K-means cluster analysis |
468 |
|
|
4.3 Evaluation of two-stage classification |
468 |
|
|
5 Application to the test sample |
469 |
|
|
6 Conclusions |
470 |
|
|
References |
470 |
|
|
Informative Patterns for Credit Scoring: Support Vector Machines Preselect Data Subsets for Linear Discriminant Analysis |
471 |
|
|
1 Introduction |
471 |
|
|
2 Linear SVM and LDA |
472 |
|
|
3 Subset preselection for LDA: Empirical results |
475 |
|
|
3.1 About typical and critical subsets |
475 |
|
|
3.2 LDA with subset preselection |
476 |
|
|
3.3 Comparing SVM, LDA and LDA-SP |
476 |
|
|
3.4 Advantages of LDA with subset preselection |
477 |
|
|
4 Conclusions |
477 |
|
|
References |
478 |
|
|
Application of Support Vector Machines in a Life Assurance Environment |
479 |
|
|
1 Introduction |
479 |
|
|
2 Support vector machines |
480 |
|
|
3 Problem context and the data |
481 |
|
|
4 A measure of variable importance |
482 |
|
|
5 Results |
484 |
|
|
References |
486 |
|
|
Continuous Market Risk Budgeting in Financial Institutions |
487 |
|
|
1 Introduction |
487 |
|
|
2 Analysis framework |
488 |
|
|
3 Time dimension of risk limits |
489 |
|
|
4 Continuous risk budgeting |
490 |
|
|
5 Simulation analysis |
492 |
|
|
Acknowledgement |
493 |
|
|
References |
494 |
|
|
Smooth Correlation Estimation with Application to Portfolio Credit Risk |
495 |
|
|
1 Introduction |
495 |
|
|
2 The sector variable |
496 |
|
|
3 Testing for independence |
497 |
|
|
4 Model generation |
498 |
|
|
5 A one-factor model |
499 |
|
|
6 Algebraic approximation |
500 |
|
|
7 Impact on the practical performance |
501 |
|
|
References |
501 |
|
|
A Appendix |
502 |
|
|
How Many Lexical-semantic Relations are Necessary? |
503 |
|
|
1 Introduction |
503 |
|
|
2 Concept calculus |
504 |
|
|
3 Diagrammatic representation |
506 |
|
|
4 Concept and linguistic sign |
509 |
|
|
5 Summary |
510 |
|
|
References |
510 |
|
|
Automated Detection of Morphemes Using Distributional Measurements |
511 |
|
|
1 Overview and introduction |
511 |
|
|
2 Why bother with the segmentation of words at all? |
512 |
|
|
3 The historical background of research: Distributional analysis |
512 |
|
|
4 Basic method |
513 |
|
|
5 Refinements of the evaluation |
515 |
|
|
6 Transferring graphemic to phonemic representation |
516 |
|
|
7 Concluding remarks |
517 |
|
|
References |
518 |
|
|
Classification of Author and/or Genre? The Impact of Word Length |
519 |
|
|
1 Word length and the quantitative description of text(s) and author(s) |
519 |
|
|
2 A case study: text basis and analytical options |
520 |
|
|
3 Methods of text discrimination |
521 |
|
|
3.1 Quantitative measures for text analysis |
522 |
|
|
3.2 Discriminant analysis |
523 |
|
|
3.3 Statistical distance as a measure for data discrimination |
523 |
|
|
4 Summary |
526 |
|
|
References |
526 |
|
|
Some Historical Remarks on Library Classification – a Short Introduction to the Science of Library Classification |
527 |
|
|
1 Introduction |
527 |
|
|
2 Classified arrangement in monastery libraries of the Middle Ages |
528 |
|
|
3 Classified arrangement in private libraries of the Middle Ages |
528 |
|
|
4 Classified arrangement in the late Middle Ages and at the beginning of modern times |
529 |
|
|
6 Systematic cataloguing in the 18th century |
530 |
|
|
7 Subject cataloguing in the 19th century |
530 |
|
|
8 Subject cataloguing in the 20th century |
531 |
|
|
References |
532 |
|
|
Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry |
534 |
|
|
1 Introduction |
534 |
|
|
2 Pair-wise data clustering |
535 |
|
|
3 Resampling techniques based on weights of observations |
536 |
|
|
4 Rand’s measure for comparing partitions |
536 |
|
|
5 A simulation study |
538 |
|
|
6 Application in quantitative linguistics |
539 |
|
|
7 Conclusions |
540 |
|
|
References |
541 |
|
|
Discovering the Senses of an Ambiguous Word by Clustering its Local Contexts |
542 |
|
|
1 Introduction |
542 |
|
|
2 Approach |
543 |
|
|
3 Algorithm |
544 |
|
|
4 Results |
546 |
|
|
5 Conclusions and prospects |
548 |
|
|
Acknowledgements |
549 |
|
|
References |
549 |
|
|
Document Management and the Development of Information Spaces |
550 |
|
|
1 Starting point and task |
550 |
|
|
2 Implementation |
550 |
|
|
3 Representation of the information space |
551 |
|
|
4 Processing flow text |
551 |
|
|
5 Processing partially structured documents |
554 |
|
|
6 Summary and outlook |
556 |
|
|
References |
557 |
|
|
Stochastic Ranking and the Volatility “Croissant”: A Sensitivity Analysis of Economic Rankings |
558 |
|
|
1 Introduction |
558 |
|
|
2 Index definition and ranking |
559 |
|
|
3 Data |
561 |
|
|
4 Sensitivity analysis by randomised weights |
562 |
|
|
5 Ranking results |
563 |
|
|
6 Conclusions |
565 |
|
|
References |
565 |
|
|
Importance Assessment of Correlated Predictors in Business Cycles Classification |
566 |
|
|
1 Problem |
566 |
|
|
1.1 Introduction |
566 |
|
|
1.2 Measures of importance |
567 |
|
|
2 Correlated predictors in regression models |
567 |
|
|
2.1 Overview |
567 |
|
|
2.2 Orthogonalization |
568 |
|
|
3 Correlated predictors in classification models |
569 |
|
|
3.1 Orthogonalization |
569 |
|
|
3.2 Using a large number of variables |
569 |
|
|
3.3 Results for the business cycle model |
570 |
|
|
4 Discussion and outlook |
571 |
|
|
References |
573 |
|
|
Economic Freedom in the 25-Member European Union: Insights Using Classification Tools |
574 |
|
|
1 Introduction |
574 |
|
|
2 Data description and distance measures |
575 |
|
|
2.1 Description of the economic freedom index data |
575 |
|
|
2.2 Distance measures |
576 |
|
|
3 Cluster analysis methods and cluster patterns |
578 |
|
|
3.1 Cluster analysis methods |
578 |
|
|
3.2 Empirical cluster patterns |
579 |
|
|
4 Conclusion and outlook |
581 |
|
|
References |
581 |
|
|
Intercultural Consumer Classifications in E-Commerce |
582 |
|
|
1 Introduction |
582 |
|
|
2 The concept of construction consumer typologies |
582 |
|
|
3 Characteristics for constructing typologies relevant for E-Commerce |
583 |
|
|
3.1 Requirements regarding criteria used for constructing typologies |
583 |
|
|
3.2 Selected constructs for a classification |
583 |
|
|
4 Empirical survey of the typology theory |
584 |
|
|
4.1 Survey design and data collection |
584 |
|
|
4.2 A typology of online customers |
585 |
|
|
5 Conclusion |
588 |
|
|
References |
588 |
|
|
Reservation Price Estimation by Adaptive Conjoint Analysis |
590 |
|
|
1 Introduction |
590 |
|
|
2 Conjoint analysis for reservation price estimation |
591 |
|
|
3 Reservation price estimation based on economic theory |
592 |
|
|
4 Application of the method |
595 |
|
|
5 Conclusion and further research |
596 |
|
|
References |
597 |
|
|
Estimating Reservation Prices for Product Bundles Based on Paired Comparison Data |
598 |
|
|
1 Introduction |
598 |
|
|
2 Gathering data for conjoint measurement |
599 |
|
|
2.1 Direct vs. indirect elicitation of reservation prices |
599 |
|
|
2.2 Relative direct elicitation of reservation prices |
600 |
|
|
3 Study design and application situation |
601 |
|
|
4 Results |
602 |
|
|
5 Discussion |
604 |
|
|
References |
604 |
|
|
Classification of Perceived Musical Intervals |
606 |
|
|
1 Background |
606 |
|
|
2 Experimental setting |
608 |
|
|
3 Results |
610 |
|
|
4 Conclusion |
612 |
|
|
References |
613 |
|
|
In Search of Variables Distinguishing Low and High Achievers in Music Sight Reading Task |
614 |
|
|
1 Background |
614 |
|
|
2 Method |
615 |
|
|
3 Results |
617 |
|
|
4 Discussion |
619 |
|
|
References |
620 |
|
|
Automatic Feature Extraction from Large Time Series |
621 |
|
|
1 Introduction |
621 |
|
|
2 Systematization of statistical methods |
622 |
|
|
2.1 Windowing extends the method space |
622 |
|
|
2.2 Method trees for feature extraction |
623 |
|
|
2.3 Dynamic windowing in method trees |
624 |
|
|
3 Automatic feature extraction |
625 |
|
|
4 Experiments |
626 |
|
|
4.1 Results |
627 |
|
|
5 Conclusion |
627 |
|
|
References |
628 |
|
|
Identification of Musical Instruments by Means of the Hough-Transformation |
629 |
|
|
1 The Hough-transform |
629 |
|
|
2 Application to sound data |
630 |
|
|
2.1 Digital sounds |
630 |
|
|
2.2 Motivation: signal edges |
630 |
|
|
2.3 Parametrization |
631 |
|
|
2.4 Resulting data format |
631 |
|
|
3 Classification |
632 |
|
|
3.1 Approaches |
632 |
|
|
3.2 Data set |
633 |
|
|
3.3 Methods |
633 |
|
|
3.4 Variable selection |
634 |
|
|
3.5 Results |
634 |
|
|
3.6 Comparing the results |
635 |
|
|
4 Conclusions |
636 |
|
|
References |
636 |
|
|
Support Vector Machines for Bass and Snare Drum Recognition |
637 |
|
|
1 Introduction |
637 |
|
|
2 Previous work |
638 |
|
|
3 Data gathering |
639 |
|
|
4 Descriptors for audio |
640 |
|
|
5 Support Vector Machines |
641 |
|
|
6 Experiments and results |
642 |
|
|
7 Conclusions and future work |
643 |
|
|
Acknowledgements |
644 |
|
|
References |
644 |
|
|
Register Classification by Timbre |
645 |
|
|
1 Introduction |
645 |
|
|
2 Data |
646 |
|
|
3 Classification methods |
647 |
|
|
4 Results |
648 |
|
|
4.1 Individual tones, voices only |
648 |
|
|
4.2 Individual tones, voices and instruments |
649 |
|
|
4.3 Averaged tones, voices only |
649 |
|
|
4.4 Averaged tones, voices and instruments |
649 |
|
|
5 Acoustics |
650 |
|
|
6 Conclusion |
651 |
|
|
References |
652 |
|
|
Classification of Processes by the Lyapunov Exponent |
653 |
|
|
1 Introduction |
653 |
|
|
2 Lyapunov exponent |
654 |
|
|
3 Well-predictable and not-well-predictable processes |
656 |
|
|
4 Experimental results |
658 |
|
|
5 Conclusion |
659 |
|
|
References |
659 |
|
|
Desirability to Characterize Process Capability |
661 |
|
|
1 Introduction |
661 |
|
|
2 Combining capability and desirability - the indices EDU and EDM |
663 |
|
|
3 Discussion |
665 |
|
|
4 Estimation |
666 |
|
|
5 Simulation |
667 |
|
|
6 Conclusion |
668 |
|
|
References |
668 |
|
|
Application and Use of Multivariate Control Charts in a BTA Deep Hole Drilling Process |
669 |
|
|
1 Introduction |
669 |
|
|
2 Monitoring the process using multiple Residual Shewhart control charts |
670 |
|
|
3 Monitoring the process using multivariate control charts |
671 |
|
|
3.1 Data depth |
671 |
|
|
3.2 A control chart based on sequential rank of data depth measures |
672 |
|
|
4 Application |
673 |
|
|
4.1 Choice of the control charts parameters |
673 |
|
|
4.2 Results |
674 |
|
|
4.3 Discussion |
675 |
|
|
5 Conclusion |
675 |
|
|
Acknowledgements |
676 |
|
|
References |
676 |
|
|
Determination of Relevant Frequencies and Modeling Varying Amplitudes of Harmonic Processes |
677 |
|
|
1 Introduction |
677 |
|
|
2 Determination of the distribution of periodogram ordinates |
678 |
|
|
3 Regression models on periodogram ordinates |
679 |
|
|
3.1 Modelling varying amplitudes |
679 |
|
|
3.2 Estimating the variance of $\varepsilon$ ($\sigma_\varepsilon^2$) |
680 |
|
|
4 Simulation study on time-varying amplitudes |
680 |
|
|
4.1 Design considerations |
680 |
|
|
4.2 Results |
681 |
|
|
5 Conclusions |
684 |
|
|
References |
684 |
|
|
Part IV Contest: Social Milieus in Dortmund |
686 |
|
|
Introduction to the Contest “Social Milieus in Dortmund” |
688 |
|
|
1 Contest goal and data |
688 |
|
|
Application of a Genetic Algorithm to Variable Selection in Fuzzy Clustering |
695 |
|
|
1 The problem |
695 |
|
|
2 Tackling the problem |
695 |
|
|
3 Methods |
696 |
|
|
3.1 Fuzzy clustering |
696 |
|
|
3.2 Measuring the clustering quality |
697 |
|
|
3.3 Defining subgroups of variables |
697 |
|
|
3.4 Genetic optimization algorithms |
698 |
|
|
3.5 Implementation |
699 |
|
|
4 Applying the procedure |
699 |
|
|
4.1 The Dortmund data |
699 |
|
|
4.2 Results |
700 |
|
|
4.3 Comparing the results |
701 |
|
|
5 Summary |
702 |
|
|
References |
702 |
|
|
Annealed k-Means Clustering and Decision Trees |
703 |
|
|
1 Introduction |
703 |
|
|
2 Preprocessing |
704 |
|
|
3 Clustering |
704 |
|
|
3.1 Annealed k-means |
704 |
|
|
3.2 Learning about k |
705 |
|
|
3.3 Solution |
706 |
|
|
4 Classification |
706 |
|
|
5 Interpretation |
708 |
|
|
6 Outlook |
709 |
|
|
References |
710 |
|
|
Correspondence Clustering of Dortmund City Districts |
711 |
|
|
1 Introduction |
711 |
|
|
2 Material and methods |
712 |
|
|
3 Results |
715 |
|
|
4 Conclusion |
718 |
|
|
References |
718 |
|
|
Keywords |
719 |
|
|
Authors |
724 |
|