Bioinformatics Challenges at the Interface of Biology and Computer Science: Mind the Gap

ISBN: 978-0-470-03548-1
Paperback
424 pages
September 2016, Wiley-Blackwell
List Price: US $83.95
Government Price: US $58.20

Preface x

Acknowledgements xvii

About the companion website xviii

PART 1

1 Introduction 3

1.1 Overview 3

1.2 Bioinformatics 3

1.2.1 What is bioinformatics? 3

1.2.2 The provenance of bioinformatics 4

1.2.3 The seeds of bioinformatics 5

1.3 Computer Science 7

1.3.1 Origins of computer science 7

1.3.2 Computer science meets bioinformatics 9

1.4 What did we want to do with bioinformatics? 10

1.5 Summary 12

1.6 References 13

1.7 Quiz 14

1.8 Problems 16

2 The biological context 17

2.1 Overview 17

2.2 Biological data-types and concepts 17

2.2.1 Diversity of biological data-types 17

2.2.2 The central dogma 18

2.2.3 Fundamental building-blocks and alphabets 19

2.2.4 The protein structure hierarchy 29

2.2.5 RNA processing in prokaryotes and eukaryotes 30

2.2.6 The genetic code 33

2.2.7 Conceptual translation and gene finding 35

2.3 Access to whole genomes 42

2.4 Summary 43

2.5 References 43

2.6 Quiz 46

2.7 Problems 47

3 Biological databases 49

3.1 Overview 49

3.2 What kinds of database are there? 49

3.3 The Protein Data Bank (PDB) 50

3.4 The EMBL nucleotide sequence data library 56

3.5 GenBank 58

3.6 The PIR-PSD 61

3.7 Swiss-Prot 62

3.8 PROSITE 64

3.9 TrEMBL 69

3.10 InterPro 71

3.11 UniProt 73

3.12 The European Nucleotide Archive (ENA) 77

3.13 Summary 81

3.14 References 82

3.15 Quiz 85

3.16 Problems 87

4 Biological sequence analysis 89

4.1 Overview 89

4.2 Adding meaning to raw sequence data 89

4.2.1 Annotating raw sequence data 94

4.2.2 Database and sequence formats 96

4.2.3 Making tools and databases interoperate 101

4.3 Tools for deriving sequence annotations 103

4.3.1 Methods for comparing two sequences 103

4.3.2 The PAM and BLOSUM matrices 104

4.3.3 Tools for global and local alignment 110

4.3.4 Tools for comparing multiple sequences 114

4.3.5 Alignment-based analysis methods 115

4.4 Summary 131

4.5 References 132

4.6 Quiz 134

4.7 Problems 136

5 The gap 138

5.1 Overview 138

5.2 Bioinformatics in the 21st century 138

5.3 Problems with genes 139

5.4 Problems with names 142

5.5 Problems with sequences 143

5.6 Problems with database entries 146

5.6.1 Problems with database entry formats 147

5.7 Problems with structures 148

5.8 Problems with alignments 150

5.8.1 Different methods, different results 150

5.8.2 What properties do my sequences share? 154

5.8.3 How similar are my sequences? 157

5.8.4 How good is my alignment? 160

5.9 Problems with families 163

5.10 Problems with functions 168

5.11 Functions of domains, modules and their parent proteins 173

5.12 Defining and describing functions 176

5.13 Summary 179

5.14 References 180

5.15 Quiz 182

5.16 Problems 183

PART 2

6 Algorithms and complexity 187

6.1 Overview 187

6.2 Introduction to algorithms 187

6.2.1 Mathematical computability 189

6.3 Working with computers 191

6.3.1 Discretisation of solutions 191

6.3.2 When computers go bad 193

6.4 Evaluating algorithms 197

6.4.1 An example: a sorting algorithm 197

6.4.2 Resource scarcity: complexity of algorithms 199

6.4.3 Choices, choices 200

6.5 Data structures 201

6.5.1 Structural consequences 202

6.5.2 Marrying form and function 210

6.6 Implementing algorithms 211

6.6.1 Programming paradigm 212

6.6.2 Choice of language 214

6.6.3 Mechanical optimisation 216

6.6.4 Parallelisation 224

6.7 Summary 227

6.8 References 227

6.9 Quiz 227

6.10 Problems 229

7 Representation and meaning 230

7.1 Overview 230

7.2 Introduction 230

7.3 Identification 233

7.3.1 Namespaces 233

7.3.2 Meaningless identifiers are a good thing 233

7.3.3 Identifying things on the Web 236

7.3.4 Cool URIs don’t change 238

7.3.5 Versioning and provenance 238

7.3.6 Case studies 239

7.4 Representing data 243

7.4.1 Design for change 245

7.4.2 Contemporary data-representation paradigms 247

7.5 Giving meaning to data 255

7.5.1 Bio-ontologies in practice 260

7.5.2 First invent the universe 263

7.6 Web services 264

7.6.1 The architecture of the Web 266

7.6.2 Statelessness 267

7.7 Action at a distance 268

7.7.1 SOAP and WSDL 270

7.7.2 HTTP as an API 270

7.7.3 Linked Data 272

7.8 Summary 275

7.9 References 275

7.10 Quiz 276

7.11 Problems 277

8 Linking data and scientific literature 279

8.1 Overview 279

8.2 Introduction 279

8.3 The lost steps of curators 281

8.4 A historical perspective on scientific literature 286

8.5 The gulf between human and machine comprehension 288

8.6 Research objects 295

8.7 Data publishing 297

8.8 Separating scientific wheat from chaff – towards semantic searches 298

8.9 Semantic publication 300

8.9.1 Making articles ‘semantic’ 301

8.10 Linking articles with their cognate data 305

8.10.1 What Utopia Documents does 305

8.10.2 A case study 306

8.11 Summary 314

8.12 References 315

8.13 Quiz 318

8.14 Problems 319

Afterword 321

Glossary 327

Quiz Answers 371

Problem Answers 378

Index 394
