To understand how biological systems function, one must uncover the higher-order principles by which they self-organize, function, and evolve. Biologists need not only large datasets that describe the components and pairwise interactions of biological systems; they also need to find higher-order patterns within seeming disorder. The ability to detect reproducible patterns and to describe these patterns mathematically from novel organizing principles is central to truly understanding biological systems.
There are several barriers to addressing ‘grand challenges’ in biology. Biologists lack the ability to make complex queries across heterogeneous datasets that are usually incomplete and arbitrarily organized. Groups develop their own file formats, data models, and application software without awareness of comparable needs elsewhere. As a result, it is almost impossible to combine data and models from multiple sources and to use the most up-to-date tools to analyze and simulate complex interrelations. More globally, computational thinking and collaboration are diffuse because the goals of computational scientists and biologists are often not well aligned.
An effective cyberinfrastructure will encourage communication among disciplines and reuse of data models, file formats, application software, and algorithms, while fostering the type of cross-disciplinary exchange of ideas that will advance both the specific case of plant science and the broader fields of life and computer sciences. To address these challenges, we propose to develop The iPlant Collaborative (iPC), a new community of plant biologists, computer scientists, mathematicians, and engineers organized around a core cyberinfrastructure. The iPC will empower these groups to integrate their research and collaboratively address major questions in biology, simultaneously advancing both the biological sciences and the computer and information sciences.
An organizing principle of the iPC will be Grand Challenge teams: cross-disciplinary, community-driven research groups that work collaboratively with iPC staff to design and develop ‘Discovery Environments’, software platforms custom-designed to help the team address a Grand Challenge question. Discovery Environments will typically take the form of “mashup” applications that facilitate the integration of diverse types of data and tools, but beneath their surface simplicity Discovery Environments will support sophisticated systems for semantic integration, description, and manipulation of biological data types. Discovery Environments will be integrated into the growing infrastructure of the iPC, becoming in time an open source resource that is expanded and maintained by the community as a whole.
The scientific community will be encouraged to participate in, and take ownership of, the iPC via a wide range of synthesis activities, including smaller Discovery Environment design groups that are not necessarily associated with Grand Challenge teams; a web-based virtual community center; outreach teams that train users, including those at undergraduate and minority-serving institutions, to use the iPC infrastructure to its best effect; and partnerships and synergistic, integrated ties with other centers, such as the ecology synthesis center (NCEAS) and the evolution synthesis center (NESCent).
To guide its community relationships, the iPC will include resident social scientists who will help design, develop, and enable the implementation of social networking tools, resulting in strategies and mechanisms for building and nurturing collaborations. Ultimately, however, researchers cross-trained to apply computational thinking to biology are the real infrastructure. To help train a new generation of scientists who bring computational thinking to biological problems, the iPC will work with community groups to develop innovative curricula and training programs for computational thinking in biology at the K-12 and university levels.


