Data Automation and Curation
Aug. 7th, 2023 06:00 pm

Hey guys, nice to have you back here on my page. Hope everyone has been doing great. Get excited: I am going to tell you about the project I am currently working on for the Outreachy season.
Okay, so let's get right to it.
My project for the season is themed: "Automated versioning and curation of land sector datasets". Very exciting, right? Yes. So... what is automated versioning and curation?
Automated versioning refers to the process of automatically generating and managing version numbers for software or code repositories based on predefined rules or triggers. This practice streamlines the tracking of changes and updates in a systematic manner, enabling developers to easily identify and differentiate between different iterations of the software, ensuring better collaboration, traceability, and compatibility management throughout the development lifecycle.
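To make that a bit more concrete, here is a tiny Python sketch of rule-based version bumping. It is just an illustration, not any particular tool's implementation: the commit-message prefixes and the starting version are made-up assumptions, loosely in the style of conventional commits.

```python
# A minimal sketch of rule-based version bumping (hypothetical rules,
# loosely inspired by conventional commits; not a real tool's logic).

def bump_version(version: str, commit_message: str) -> str:
    """Return the next semantic version based on the type of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if "BREAKING CHANGE" in commit_message or commit_message.startswith(("feat!:", "fix!:")):
        return f"{major + 1}.0.0"          # breaking change -> major bump
    if commit_message.startswith("feat:"):
        return f"{major}.{minor + 1}.0"    # new feature -> minor bump
    return f"{major}.{minor}.{patch + 1}"  # anything else -> patch bump

print(bump_version("1.4.2", "feat: add reprojection step"))    # -> 1.5.0
print(bump_version("1.4.2", "fix: handle empty raster band"))  # -> 1.4.3
```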
Data curation refers to the systematic process of collecting, organizing, validating, and maintaining digital information to ensure its quality, relevance, and accessibility over time. This involves tasks such as data collection, cleaning, transformation, metadata creation, and documentation, all aimed at preserving and enhancing the value of data for analysis, interpretation, and sharing within various domains, such as research, business, and cultural heritage, while also addressing issues of accuracy, consistency, and long-term usability.
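And here is an equally small Python sketch of what curation steps can look like in practice, using pandas. The CSV file, the column names, and the validation rules are all hypothetical; the point is the pattern: deduplicate, validate, normalise types, and record a little provenance metadata.

```python
# A minimal curation sketch; the file, columns, and rules are hypothetical.
import pandas as pd

def curate(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    df = df.drop_duplicates()                      # remove duplicate records
    df = df.dropna(subset=["plot_id", "area_ha"])  # required fields must be present
    df = df[df["area_ha"] > 0]                     # validate: area must be positive
    df["collected_on"] = pd.to_datetime(df["collected_on"])  # normalise types
    # Attach simple provenance metadata so future users know where it came from.
    df.attrs["source_file"] = path
    df.attrs["curation_script"] = "curate() v0.1"
    return df

clean = curate("plots.csv")  # "plots.csv" is a stand-in for a real dataset
```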
Okay, enough with the terms. Let me tell you about this awesome organization I am learning a lot from.
Moja global is a collaborative project under the Linux Foundation that supports ambitious climate action by bringing together a community of experts to develop open-source software – including the groundbreaking FLINT software – which allows users to accurately and affordably estimate greenhouse gas emissions and removals from forestry, agriculture and other land uses (AFOLU).
Anyone (I mean literally anyone, especially you) can make a contribution to the software, the science, the documentation, and the promotion of Moja global’s tools.
Moja global's mission here is to make these land sector datasets FLINT-ready and tracked, so that data analysts and scientists all over the world can always see the changes a dataset has undergone. So basically, the datasets are fetched from their original sources using Python pipeline scripts and processed with the GDAL software. The processed data is then tracked using DVC, which ensures that every change to a file is recorded. All of this runs on GitHub Actions (GHA), and the final tracked files are pushed to Google Drive. Moja global also makes sure that each FLINT-ready dataset comes with an accompanying notebook or file description, so new contributors can understand the work behind the files.
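To give you a feel for what one such pipeline step might look like, here is a hedged Python sketch. The URL and file paths are made up, and this is my own illustration rather than Moja global's actual pipeline code; it assumes the GDAL Python bindings and the dvc command-line tool are installed and that a DVC remote is already configured.

```python
# A sketch of one pipeline step: fetch -> process with GDAL -> track with DVC.
# The URL and paths below are hypothetical; this is not the project's real code.
import os
import subprocess
import urllib.request

from osgeo import gdal

SOURCE_URL = "https://example.org/land-cover.tif"  # hypothetical source
RAW_PATH = "data/raw/land-cover.tif"
PROCESSED_PATH = "data/processed/land-cover_epsg4326.tif"

os.makedirs("data/raw", exist_ok=True)
os.makedirs("data/processed", exist_ok=True)

# 1. Fetch the dataset from its original source.
urllib.request.urlretrieve(SOURCE_URL, RAW_PATH)

# 2. Process it with GDAL; here, reproject the raster to WGS84.
gdal.Warp(PROCESSED_PATH, RAW_PATH, dstSRS="EPSG:4326")

# 3. Track the processed file with DVC so every change is recorded,
#    then push it to the configured remote (e.g. a Google Drive remote).
subprocess.run(["dvc", "add", PROCESSED_PATH], check=True)
subprocess.run(["dvc", "push"], check=True)
```

In the real pipelines, this kind of step runs inside a GitHub Actions workflow, so the fetching, processing, and DVC tracking happen automatically rather than by hand.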
Great communication skills in writing, project documentation, and data management are very valuable for participating in this project. I am working as part of a team to document the project, and I am also developing expertise in data analysis and team collaboration. Skills in Python, Git, DVC, R, or C++ are very desirable for this project. And you should know that any code developed under this project is open source, to facilitate GHG modelling by experts around the world.