Kepler CORE: Phylogenetics, Astronomy, Library Science, Ecology, Conservation, Oceanography, Biology, Geosciences, Molecular Biology

1 Kepler CORE: Ecology, Astronomy, Phylogenetics, Library Science, COMET!, Oceanography, Geosciences, Chemistry, Particle Physics, Conservation Biology, Molecular Biology, ChIP-chip

2 The Kepler-CORE Project and Team
Kepler-CORE (sensu stricto): a 3-year, $1.7M project funded by NSF-OCI.
Kepler-CORE at UCD, UCSB, and UCSD:
- UC Davis: Bertram Ludaescher (PI@UCD), Shawn Bowers (co-PI), Tim McPhillips (co-PI & software architect), David Welker (software engineer), Sean Riddle (software engineer)
- UC Santa Barbara: Matthew Jones (PI@UCSB), Mark Schildhauer (co-PI), Aaron Schultz, Chad Berkley (software engineer)
- UC San Diego: Ilkay Altintas (PI@SDSC), Jianwu Wang (postdoc)
Kepler/CORE (sensu lato): the goal is to sustain Kepler over the long term, beyond the initial funding period.
KEPLER = Kepler/Core + Kepler/X + Kepler/Y + ..., where Core, X, Y, ... form an open community of stakeholders, contributors, users, etc.

3 Kepler-CORE Vision
In the future we foresee Kepler:
- Satisfying the scientific workflow automation needs of collaborative government-funded projects, academic research groups, and individual researchers in diverse scientific disciplines.
- Enhancing the productivity of researchers by facilitating discovery and collaboration within and across disciplines, and by being the best way for scientists to leverage developments and expertise in other domains.
- Leading to further breakthroughs and innovations in scientific data management, data provenance, and collaborative scientific computing.
- Shepherded by a self-sustaining effort that thrives well beyond the lifetimes of the grants that have contributed to Kepler's development.

4 Kepler-CORE Mission
In collaboration with current and future contributors to Kepler, the Kepler/CORE team will:
- Develop and maintain the essential, interdisciplinary software components of Kepler.
- Coordinate the contributions of the greater Kepler collaboration to the core ("kernel") of the system.
- Increase the role of the current and future user community in specifying requirements and priorities.

5 Kepler: Open Source + Open Community
1. A huge diversity of domains, and therefore of needs:
   Astrophysics, nuclear fusion research, geoinformatics, ecology, systematics, bioinformatics, genomics, environmental monitoring, simulation, ...
   Not just bioinformatics and cheminformatics.
2. A broad range of technical problems:
   - Workflow design with a graphical UI
   - Sharing actors and workflows across communities
   - Distributed workflow execution
   - Data movement over the network
   - Integrating local applications, web services, and native actors
   - Supporting a variety of computational models
   Not just web service orchestration or Grid deployment.

6 Kepler: Open Source + Open Community
3. Many kinds of users with different backgrounds and responsibilities:
   - Scientists automating and sharing analyses of their own data, or performing meta-analyses on others' data
   - Software engineers developing their own systems around Kepler
   - Computer scientists doing basic research in scientific workflows, data and provenance management, and distributed and collaborative computing
   Not just biologists and chemists.
4. Kepler is used in many different deployment contexts:
   - As a standalone application on a scientist's desktop or laptop computer
   - As a backend for web-based scientific applications
   - As an embedded workflow engine in larger systems
   One size (of deployment) does not fit all!
5. Kepler is open to contribution and extension by anyone:
   - Anyone can contribute to Kepler!
   - Anyone can use Kepler in their own applications.
   - Developing with Kepler doesn't require collaboration with the owners.

7 These differences mean that the Kepler collaboration will be unique, too
- Kepler cannot possibly solve everyone's problems right out of the box.
- Kepler must be adaptable to different domain sciences.
- Adaptation requires more than developing new actors.
- Kepler is as much a development platform as an end-user tool.
- No one group can take responsibility for supporting all the ways Kepler will be used.
- Kepler is open source, but more complex than other open-source projects.
- The diversity of domains, users, and deployment contexts means there can be conflicts between the needs or priorities of contributors.
- We need a way of developing and adding extensions without breaking others' systems.
- Software engineers developing code for Kepler often are not expert scientists and cannot be the final authority on what the system should do (unlike projects such as Apache and Linux, where the engineers are themselves the expert users and can add what they need).
- PIs and project managers on projects extending Kepler must take responsibility for knowing what needs to be done.
- It is essential that, for each project employing Kepler, representatives who are authoritative on the scientific and technical needs of their project participate in driving the future development of Kepler!

8 Kepler-CORE Strategy: We have much to do!
Kepler-CORE will coordinate the effort to:
- Define a unified vision and architecture for the Kepler system, with a clearly defined kernel of capabilities applicable to all projects.
- Identify and develop critical new core features to make Kepler a comprehensive scientific workflow system: full support for data, workflow, service, project management, ...
- Facilitate the application of Kepler to diverse scientific domains and deployment contexts by providing well-defined extension points.
- Ensure system stability through rigorous software testing.
- Adopt an engineering approach capable of delivering and supporting regular software releases.
- Train future system end users, workflow engineers, and Kepler extension developers.
- Disseminate documentation and training materials to the broader scientific community.
- Evaluate organizational, management, and funding approaches for sustaining the development and maintenance of the Kepler system.

9 Stakeholders: Essential to the Success of Kepler
Kepler stakeholders:
- Are projects and individuals whose work depends critically on the success of Kepler.
- Are funded by a variety of sources and work in diverse fields of scientific research.
- Are more likely to extend Kepler substantially and use it within their own systems than to simply develop packages of actors and workflows for use with a standard Kepler distribution.
- Need to deliver the software systems they develop to their own communities of users.
- Must deliver their software systems according to their own schedules (e.g., release schedules), as determined by their research and funding programs.
- Have different requirements that will conflict in the absence of mechanisms enabling independent extension and deployment of Kepler-based systems.
- Require recognition for the contributions they make to Kepler as well as for their own Kepler-based systems.
- Know better than we do what they need from Kepler.

10 How do your users interact with Kepler?
How is Kepler used in your projects?
- Do your primary users employ the standard Kepler GUI in a desktop computing environment?
- Do they generally develop new workflows themselves, or configure and run workflows developed by you or others?
- Does Kepler run as a standalone application, as a web-application backend, or as an embedded workflow automation engine?
- Is Kepler an environment for interactively exploring data, or for automating data management tasks behind the scenes?
What do workflows represent in your projects?
- Do your workflows model the scientific analyses your users need automated?
- Or do your workflows automate the flow of data between compute nodes and monitor jobs?
What other systems does Kepler need to interact with in your projects?
- Project and data management systems?
- Remote data stores?
- Job queuing systems, computer clusters, and Grids?
- Domain-specific applications?

11 How do you develop with Kepler?
What kinds of deep modifications do you need to make to Kepler?
- Alternative computational models (e.g., for handling streaming data)?
- Workflow monitoring tools?
- New data modeling and management frameworks?
- Ways of recording data provenance?
- What problems do you foresee running into when making these extensions?
How do you currently develop with Kepler?
- Do you integrate your code with Kepler in the Kepler CVS repository?
- Do you develop with snapshots of the Kepler repository?
- Do you use the alpha and beta releases of Kepler?
- Have you forked Kepler, making fundamental changes not reflected in the Kepler repository?
- How would you like to develop with Kepler?
How do you release Kepler to your community?
- Do you provide packages of actors that work with standard releases of Kepler?
- Do you provide customized distributions of Kepler with your additions and extensions built in?
- Do you release Kepler behind the scenes, e.g., as part of a web application?
- What would make it easier for you to make your releases?

12 Sound too hard? A proposal for how it could work
Kepler/CORE will focus work on the essential core of Kepler:
- The scope of Kepler is too big for any one project to take responsibility for.
- Kepler/CORE's strategy is to enable multiple projects to contribute to the effort in a coherent way.
- Kepler/CORE will take responsibility for the Kepler kernel.
- Domain-specific parts of Kepler will be moved into extensions.
Kepler/CORE will provide convenient extension points to Kepler, which will:
- Make it easy to create and share packages of actors and workflows.
- Decouple the GUI from the workflow engine and make it easy to employ alternative user interfaces.
- Support interaction with your data management, job distribution, and domain-specific systems.
Kepler/CORE will structure the development environment (source code repository) to support independent extension:
- The kernel will be separated from extensions.
- Each project will have its own area in the repository and control over access to it.
- The build system will allow you to build your extensions against the kernel and select which other extensions to include.
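The proposal leaves the build mechanics open. As a purely hypothetical sketch (not Kepler's actual build system; the extension names and the dependency table are invented for illustration), selecting extensions to build against the kernel can be treated as dependency resolution: start from the requested extensions, pull in whatever they depend on, and order everything so the kernel comes first and each extension follows its dependencies. Python's standard `graphlib.TopologicalSorter` (Python 3.9+) does the ordering.

```python
"""Hypothetical sketch: choosing extensions to build against the kernel.

Nothing here is Kepler's real build system; the extension names and
dependency table are invented for illustration.
"""
from graphlib import TopologicalSorter  # standard library, Python 3.9+

KERNEL = "kernel"

# Hypothetical extensions and the other extensions each one needs.
# Every extension also depends implicitly on the kernel.
EXTENSIONS = {
    "ecology-actors": set(),
    "provenance": set(),
    "web-frontend": {"provenance"},
    "grid-jobs": set(),
}


def build_order(selected):
    """Return a build order: the kernel first, then the selected
    extensions plus any extensions they depend on, with every
    extension placed after its dependencies."""
    graph = {KERNEL: set()}  # node -> set of nodes that must be built first
    pending = list(selected)
    while pending:
        name = pending.pop()
        if name in graph:
            continue  # already scheduled
        if name not in EXTENSIONS:
            raise KeyError(f"unknown extension: {name}")
        deps = EXTENSIONS[name]
        graph[name] = deps | {KERNEL}
        pending.extend(deps)  # transitively include dependencies
    return list(TopologicalSorter(graph).static_order())
```

For example, `build_order(["web-frontend"])` returns `["kernel", "provenance", "web-frontend"]`: asking for the web front end pulls in the provenance extension it depends on, and extensions nobody selected are left out of the build.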

13 And here's what you would do
Your projects will be able to focus on developing the extensions you need:
- Develop packages of actors for your community.
- Create custom GUIs and web interfaces.
- Develop, and interface with, project-specific data management systems.
Your projects will be able to deploy Kepler as needed:
- Package customized Kepler distributions for your community.
- Deploy Kepler behind web applications.
- Embed the workflow engine in other scientific applications.
Your projects will be able to share work with, and leverage work done by, other projects:
- Storing your projects in the Kepler repository will make your work easy for others to find.
- It will be convenient for you to include extensions developed by other projects and to establish collaborations with other groups developing with Kepler.
You will be able to make fundamental changes to the kernel independently:
- The build system will provide a kernel-override mechanism.
- You will be able to add new extension points to the kernel without disrupting others' work.
- You can contribute major changes, including new extension points, to the kernel once they work well.
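The kernel-override mechanism is only named here. One common way to realize such a mechanism, sketched below under the assumption that a build consults an ordered list of modules (the `Module` class, the file names, and the ordering rule are hypothetical, not Kepler's actual design), is a priority lookup: each file is taken from the first module in the list that provides it. A project that lists its own module ahead of the kernel sees its override, while any other project's build, which does not list that module, still sees the unmodified kernel.

```python
"""Hypothetical kernel-override lookup; not Kepler's actual build system."""
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Module:
    name: str
    # Relative file path -> file contents supplied by this module.
    files: Dict[str, str] = field(default_factory=dict)


def resolve(path: str, modules: List[Module]) -> Tuple[str, str]:
    """Return (module name, contents) for `path` from the first module
    in `modules` that provides it. `modules` is ordered highest priority
    first, so a project module listed before the kernel overrides it."""
    for module in modules:
        if path in module.files:
            return module.name, module.files[path]
    raise FileNotFoundError(path)


# Hypothetical example: one project patches a kernel file.
kernel = Module("kernel", {
    "Director.java": "kernel director",
    "Actor.java": "kernel actor",
})
my_project = Module("my-project", {"Director.java": "patched director"})

# The project's own build sees its override and falls back to the kernel otherwise.
print(resolve("Director.java", [my_project, kernel]))  # ('my-project', 'patched director')
print(resolve("Actor.java", [my_project, kernel]))     # ('kernel', 'kernel actor')
# Another project's build, which does not include my-project, is unaffected.
print(resolve("Director.java", [kernel]))              # ('kernel', 'kernel director')
```

Keeping overrides in the project's own module, rather than editing the kernel in place, is what lets a change mature independently before it is contributed back to the kernel.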

14 Kepler CORE: Astronomy, Phylogenetics, Library Science, Ecology, Oceanography, COMET!, Geosciences, Conservation Biology, Molecular Biology, Chemistry, Particle Physics, ChIP-chip
Now on to Engineering & Mechanics!!

15 to boldly go Kepler-CORE Duo