Jenkins pipeline vs. GitLab pipeline [With Example Code]

When comparing GitLab and Jenkins, it’s important to note that they are not the same tool; one of the fundamental differences lies in how each handles pipelines.

In this quick article, we’ll dive deep into the pipeline of GitLab and Jenkins and go over some of the big differences.

At the bottom, we’ve laid out example configs for both pipelines that do the same thing, so you can compare how each will look once implemented in your project.

Let’s jump in.


Fundamental Differences Between Jenkins and GitLab For Pipelines

GitLab’s pipeline is integrated with its version control system, meaning the code and pipeline are in the same place.

On the other hand, Jenkins depends on an external source code management system, such as GitHub, to properly build and run its pipelines.

This difference significantly impacts build time, testing time, and deployment time. With GitLab, there’s no API exchange between the version control system and the pipeline, which means it’s faster and more efficient.

In contrast, Jenkins needs to communicate with the API of whichever system hosts your code to pull the source into the pipeline, which can slow down the process.

Jenkins is known for being robust in building artifacts and is one of the best tools for this purpose.

However, GitLab has an advantage when it comes to pipeline as code. GitLab’s pipeline code is in YAML, while Jenkins’ pipeline code is in Groovy. 

YAML is considered more readable and easier to manage, making GitLab’s pipeline more accessible and intuitive for beginners.

More information on YAML can be found here.


GitLab provides more extensive and integrated features than Jenkins out of the box. 

This includes a built-in container registry, full integration with Kubernetes, auto DevOps capabilities, comprehensive monitoring, and usability superior to that of Jenkins. 

However, Jenkins is more flexible and customizable with a larger plug-in ecosystem. This makes Jenkins a better fit for complex, bespoke pipeline configurations.

Both tools provide CI/CD capabilities as a core feature, can automate deployments, and can use Docker containers. However, Jenkins requires plugins to use Docker containers, while GitLab Runner uses Docker by default.

Remember, GitLab tightly integrates its CI/CD with its source-code management functionality and Git-based version control, leading to project handling simplicity. 

In contrast, Jenkins does not have built-in source-code management and requires a separate application for version control. 

GitLab vs Jenkins CI/CD Pipeline Codes (Example)

As we know, GitLab and Jenkins are tools used for building, testing, and deploying applications.

GitLab has an advantage over Jenkins because it does not require an external source code management system like GitHub, since it can operate as both repository and runner.

The code and pipeline in GitLab sit in the same place, resulting in faster build, test, and deployment times.

Let’s compare side-by-side to understand the fundamental differences between GitLab and Jenkins pipeline code.

In Jenkins, we use the declarative pipeline, which breaks the pipeline into stages and steps, making it easier to manage.

On the other hand, in GitLab, we define the stages first before moving on to the steps.

Let’s take a closer look at the pipeline code for both tools:

Jenkins Pipeline Code

pipeline {
    agent {
        label 'agent-1'
    }
    stages {
        stage('build') {
            steps {
                sh 'df -h'
            }
        }
        stage('test') {
            steps {
                sh 'mvn test'
            }
        }
        stage('deploy') {
            steps {
                echo 'end of pipeline'
            }
        }
    }
}

GitLab CI/CD Pipeline Code

image: $URL/image_link

stages:
  - build
  - test
  - deploy

build:
  script:
    - df -h

test:
  script:
    - mvn test

deploy:
  script:
    - echo 'end of pipeline'

 

As you can see, the GitLab pipeline code is written in YAML, while the Jenkins pipeline code is written in Groovy.

Also, in GitLab, we have to define the image of the runner that will be used to run the pipeline. 
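For instance (a rough sketch with placeholder image names, not recommendations), the image can be set once for the whole pipeline and then overridden for individual jobs:

# default image used by every job unless a job overrides it
image: python:3.11

build:
  script:
    - python --version

test:
  # this job overrides the pipeline-level default image
  image: node:20
  script:
    - node --version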

Overall, using either Jenkins or GitLab would depend on your project or setup’s specific needs and conditions.

Due to its integrated Version Control and pipeline, GitLab has a faster build, test, and deployment time.

On the other hand, Jenkins is a robust tool for building artifacts and has been around for a long time.

If you need a more out-of-the-box solution with comprehensive features, go for GitLab. If you need flexibility and customization, Jenkins might be a better choice. 

 

GitLab CI/CD PyTest Tutorial for Beginners [WITH CODE EXAMPLE]

GitLab CI/CD is a powerful tool for automating the testing, building, and deployment of code changes.

In this GitLab crash course, we will guide you through the basics of setting up a CI/CD pipeline using GitLab.

By the end of the course, you will have a foundational understanding of how GitLab CI/CD works and be able to build a basic PyTest pipeline that runs tests and generates a JUnit report that can be viewed in GitLab. It will also generate an Allure report and publish it to GitLab Pages.

Demo PyTest Project

To configure GitLab CI to run automated tests using the PyTest framework, you can follow these steps:

  1. Create a test suite and ensure you can run it locally from the console. (Make sure PyTest Works Locally First!!)
  2. Create a .gitlab-ci.yml file in the root of your project.
  3. Add a pytest job to the .gitlab-ci.yml file that runs the test suite using PyTest.
  4. Add a pages job to the .gitlab-ci.yml file that publishes the test results to GitLab Pages using the Allure framework.

Here’s an example .gitlab-ci.yml file:

stages:
  - test
  - deploy

pytest:
  stage: test
  image: python:3.8
  script:
    - pip install -r requirements.txt
    # --alluredir assumes the allure-pytest plugin is installed (e.g., via requirements.txt)
    - pytest --junitxml=junit.xml --alluredir=./allure-results
  artifacts:
    reports:
      junit: junit.xml
    paths:
      - allure-results/

pages:
  stage: deploy
  image: python:3.8
  script:
    # assumes an Allure CLI package is available to this image; installation steps may vary
    - apt-get update && apt-get install -y allure
    - allure generate --clean -o public ./allure-results
  artifacts:
    paths:
      # GitLab Pages publishes the contents of the public/ directory
      - public/
  only:
    - master

This configuration will run your tests using PyTest and generate a JUnit report that can be viewed in GitLab. It will also generate an Allure report and publish it to GitLab Pages (the pages job can be deleted if you don’t need it, but we wanted to make sure you utilized the full GitLab pipeline).

Note that this is just an example configuration and may need to be adapted to your project requirements.


Understanding Gitlab CI/CD

GitLab CI/CD is a continuous integration and continuous deployment platform that automatically tests, builds, and releases code changes to the deployment environment.

It is part of the GitLab platform, which aims to be a one-stop shop for building DevOps processes for applications.

GitLab CI/CD is one of the most widely used CI/CD tools in the industry, and its main advantage is that it extends your team’s existing software development workflow, letting you build CI/CD pipelines on the same platform that hosts your code.

GitLab CI/CD has a simple architecture with a Gitlab instance or server hosting your application code and pipelines.

Connected to that Gitlab instance are multiple Gitlab runners, which are separate machines that execute the pipelines.

GitLab offers multiple managed runners, making it easy to start running pipelines without any setup and configuration effort.

In GitLab CI/CD, you can build basic things like a PyTest environment or a full-fledged CI/CD pipeline that runs tests, builds your application’s Docker images, and pushes them to another server for production builds.

The core concepts of Gitlab CI/CD include jobs, stages, runners, and variables.

(The runners are fast too)
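To make those concepts concrete, here is a minimal sketch of a .gitlab-ci.yml that touches all four; the stage names, variable, and commands are placeholders rather than a recommended setup:

# variables are available to every job below
variables:
  APP_ENV: "staging"

# stages define the order in which jobs run
stages:
  - build
  - test

# each named block is a job, picked up and executed by a runner
build-job:
  stage: build
  script:
    - echo "Building for $APP_ENV"

test-job:
  stage: test
  script:
    - echo "Running tests for $APP_ENV"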


Gitlab makes it easy to start without any setup effort and allows you to have your pipeline as part of your application code.

This is a significant advantage compared to other CI/CD tools like Jenkins, where you must set up and configure the Jenkins server, create a pipeline, and then connect it to the Git project.

In summary, GitLab CI/CD is a powerful tool for automating the process of testing, building, and deploying code changes to the deployment environment. It offers a simple architecture, easy setup, and integration with Gitlab, making it an excellent choice for teams looking to streamline their DevOps processes.


GitLab CI/CD Vs. Other CI/CD Tools

GitLab CI/CD is one of the many CI/CD tools available.

While Jenkins is one of the most widely used CI/CD tools, GitLab CI/CD offers a unique advantage for teams already using Gitlab for their code repository.

One advantage of using GitLab CI/CD is that it seamlessly integrates with Gitlab repositories, allowing teams to build their CI/CD pipelines on the same platform they use for their code repository.

This eliminates the need for a separate tool and streamlines the workflow. Additionally, GitLab CI/CD requires no setup effort, as the pipelines are part of the application code and can be started without any configuration.

Regarding architecture, Gitlab CI/CD uses a Gitlab server that hosts the application code and pipelines, with multiple Gitlab runners connected to the server executing the pipelines.

Gitlab.com (Think of the code/repo side) offers a managed Gitlab instance with multiple runners, making it easy to start without any setup or configuration effort.

However, organizations can create partially or completely self-managed Gitlab setups if desired.

While many CI/CD tools are available, Gitlab CI/CD offers a unique advantage for teams already using Gitlab for their code repository, streamlining the workflow and eliminating the need for a separate tool.



Our Thoughts On GitLab for CI/CD

After analyzing the benefits of GitLab CI/CD, it is clear that it is a powerful tool for teams looking to build and manage their CI/CD pipelines quickly and easily.

With GitLab, everything can be managed in one place, making it a one-stop shop for building DevOps application processes.

This eliminates the need for separate tools and allows teams to extend their workflows on GitLab with this additional feature without any setup effort.

The seamless integration with GitLab and its managed infrastructure makes it easy for teams to start with GitLab CI/CD.

This is particularly beneficial for teams that want to save time and effort while having a robust CI/CD platform to help them improve their software development processes.

In conclusion, teams looking for quick and easy CI/CD platforms should choose GitLab as it provides a comprehensive solution to help them build, test, and deploy their applications efficiently. With GitLab, teams can manage their entire DevOps process in one place, making it a valuable tool for software development teams of all sizes.

 

MLOps vs Data Engineer [Which Will You Like More?]

In the fast-paced world of technology, two fields are currently blowing up.

These two roles: MLOps and Data Engineering, are crucial in transforming how businesses leverage data.

While one carves a path toward the seamless integration (seemingly impossible) and management of Machine Learning models, the other lays the robust foundation of Big Data architecture that fuels innovation.

But which one is the right path for you?

Is it the new and exciting world of MLOps, where models are taken from experimental repos to production pipelines, constantly adapting to ever-changing regulations and customer needs?

Or is it Data Engineering, where data’s raw potential is harnessed into something organized, accessible, and valuable?

This blog post will explore MLOps and Data Engineering, breaking down what they are and why they matter.

We’ll look at how much you might earn in these fields, what the jobs are like, and what makes them different.

This information will help you determine the best fit for your interests and career goals.

So, if you’re already working in technology or just curious about these exciting areas, come along with us. We’ll help you learn about two important jobs in our world of data and technology. By the end, you might know which matches you best!

** Note: I currently work in MLOps, so I may be slightly biased. **


What is Data Engineering?

Data Engineering is the practice of collecting, cleaning, and organizing large datasets. It encompasses creating and maintaining architectures, such as databases and large-scale processing systems, as well as data transformation and analysis tools.

Data engineers build the infrastructure for data generation, transformation, and modeling.

Scale is behind everything data engineers do; their primary focus is making data available at scale.


Why is Data Engineering Important?

Data Engineering is vital for any organization that relies on data for decision-making. It enables:

Efficient Data Handling

Data Engineering plays a crucial role in ensuring efficient data handling within an organization. By implementing proper data structures, storage mechanisms, and organization strategies, data can be retrieved and manipulated with ease and speed. Here’s how it works:

  • Organization: Sorting and categorizing data into meaningful groupings make it more navigable and searchable.
  • Storage: Using optimal storage solutions that fit the specific data type ensures that it can be accessed quickly when needed.
  • Integration: Combining data from various sources allows for a comprehensive view, which aids in more robust analysis and reporting.


Data Quality and Accuracy

Ensuring data quality and accuracy is paramount for making informed decisions:

  • Cleaning: This involves identifying and correcting errors or inconsistencies in data to improve its quality. It can include removing duplicates, filling missing values, and correcting mislabeled data.
  • Validation: Implementing rules to check the correctness and relevance of data ensures that only valid data is included in the analysis.
  • Preprocessing: This may include normalization, transformation, and other methods that prepare the data for analysis, which ensures that the data is in the best possible form for deriving meaningful insights.


Scalability

Scalability in data engineering refers to the ability of a system to handle growth in data volume and complexity:

  • Horizontal Scaling: Adding more machines to the existing pool allows handling more data without significantly changing the existing system architecture.
  • Vertical Scaling: This involves adding more power (CPU, RAM) to an existing machine to handle more data.
  • Flexible Architecture: Designing with scalability in mind ensures that the data handling capability can grow as the organization grows without a complete system overhaul.


Facilitating Data Analysis

Data Engineering sets the stage for insightful data analysis by:

  • Data Transformation: This includes converting data into a suitable format or structure for analysis. It may involve aggregating data, calculating summaries, and applying mathematical transformations.
  • Data Integration: Combining data from different sources provides a more holistic view, allowing analysts to make connections that might not be visible when looking at individual data sets.
  • Providing Tools: By implementing and maintaining tools that simplify data access and manipulation, data engineers enable data scientists and analysts to focus more on analysis rather than data wrangling.
  • Ensuring Timely Availability: Efficient pipelines ensure that fresh data is available for analysis as needed, enabling real-time or near-real-time insights.

Data Engineering forms the backbone and structure of most modern data-driven decision-making processes.

By focusing on efficient handling, quality, scalability, and facilitation of analysis, data engineers contribute to turning raw data into actionable intelligence that can guide an organization’s strategy and operations.



Famous Data Engineering Tools


Apache Hadoop

About: Apache Hadoop is an open-source framework that allows for the distributed processing of large data sets across clusters of computers.

Use: It uses simple programming models and is designed to scale from single servers to thousands of machines.


Apache Spark

About: Apache Spark is an open-source distributed computing system for fast computation.

Use: It provides an interface for entire programming clusters and is particularly known for its in-memory processing speed.


Kafka

About: Apache Kafka is an open-source stream-processing software platform.

Use: It’s used to build real-time data pipelines and streaming apps, often used for its fault tolerance and scalability.


Apache Flink

About: Apache Flink is an open-source stream-processing framework.

Use: It’s used for real-time computation that can perform analytics and complex event processing (CEP).


Snowflake

About: Snowflake is a cloud data platform that provides data warehouse features.

Use: It is known for its elasticity, enabling seamless computational power and storage scaling.


Airflow

About: Apache Airflow is an open-source tool to author, schedule, and monitor workflows programmatically.

Use: It manages complex ETL (Extract, Transform, Load) pipelines and orchestrates jobs in a distributed environment.


Tableau

About: Tableau is a data visualization tool that converts raw data into understandable formats.

Use: It allows users to connect, visualize, and share data in a way that makes sense for their organization.


Talend

About: Talend is a tool for data integration and data management.

Use: It allows users to connect, access, and manage data from various sources, providing a unified view.


Amazon Redshift

About: Amazon Redshift is a fully managed, petabyte-scale data warehouse service by Amazon.

Use: It allows fast query performance using columnar storage technology and parallelizing queries across multiple nodes.


Microsoft Azure HDInsight

About: Azure HDInsight is a cloud service from Microsoft that makes it easy to process massive amounts of big data.

Use: It analyzes data using popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, etc.

These tools collectively provide robust capabilities for handling, processing, and visualization of large-scale data and are integral parts of the data engineering landscape.


What is MLOps?

MLOps, short for Machine Learning Operations, is a set of practices that unifies machine learning (ML) system development and operations. It aims to automate and streamline the end-to-end ML lifecycle, covering everything from data preparation and model training to deployment and monitoring. MLOps helps maintain the ML models’ consistency, repeatability, and reliability.

What is commonly missed about MLOps is the CI/CD portion of the job. Builds, versioning, Docker, runners, and similar concerns make up a significant portion of a machine learning engineer’s day-to-day work.



Why MLOps?

MLOps is critical in modern business environments for several reasons (besides feeding my family):


Streamlining The ML Workflow

MLOps helps different people in a company work together more smoothly on machine learning (ML) projects.

Think of it like a well-organized team sport where everyone knows their role:

  • Data Scientists: The players who develop strategies (ML models) to win the game.
  • Operations Teams: The coaches and support staff ensure everything runs smoothly.
  • MLOps: The rules and game plan that help everyone work together efficiently so the team can quickly score (deploy models).


Maintaining Model Quality

ML models need to keep working well even when things change. MLOps does this by:

  • Watching Constantly: Like a referee keeping an eye on the game, MLOps tools continuously check that the models are performing as they should.
  • Retraining When Needed: If a model starts to slip, MLOps helps to “coach” it back into shape by using new data and techniques so it stays solid and valuable.


Regulatory Compliance

Just like there are rules in sports, there are laws and regulations in business. MLOps helps ensure that ML models follow these rules:

  • Keeping Records: MLOps tools track what has been done, like a detailed scorecard. This ensures that the company can show they’ve followed all the necessary rules if anyone asks.
  • Checking Everything: Like a referee inspecting the equipment before a game, MLOps ensures everything is done correctly and fairly.


Enhancing Agility

In sports, agility helps players respond quickly to changes in the game. MLOps does something similar for businesses:

  • Quick Changes: If something in the market changes, MLOps helps the company to adjust its ML models quickly, like a team changing its game plan at halftime.
  • Staying Ahead: This ability to adapt helps the business stay ahead of competitors, just like agility on the field helps win games.

So, in simple terms, MLOps is like the rules, coaching, refereeing, and agility training for the game of machine learning in a business. It helps everyone work together, keeps the “players” (models) at their best, makes sure all the rules are followed and helps the “team” (company) adapt quickly to win in the market.


Famous MLOps Tools

Docker (The KING of MLOps):

About: Docker is a platform for developing, shipping, and running container applications.

Use in MLOps:

Containerization: Docker allows data scientists and engineers to package an application with all its dependencies and libraries into a “container.” This ensures that the application runs the same way, regardless of where the container is deployed, leading to consistency across development, testing, and production environments.

Scalability: In an MLOps context, Docker can be used to scale ML models easily. If a particular model becomes popular and needs to handle more requests, Docker containers can be replicated to handle the increased load.

Integration with Orchestration Tools: Docker can be used with orchestration tools like Kubernetes to manage the deployment and scaling of containerized ML models. This orchestration allows for automated deployment, scaling, and management of containerized applications.

Collaboration: Docker containers encapsulate all dependencies, ensuring that all team members, including data scientists, developers, and operations, work in the same environment. This promotes collaboration and reduces the “it works on my machine” problem.

Version Control: Containers can be versioned, enabling easy rollback to previous versions and ensuring that the correct version of a model is deployed in production.

Docker has become an essential part of the MLOps toolkit because it allows for a seamless transition from development to production, enhances collaboration, and supports scalable and consistent deployment of machine learning models.


MLflow

About: MLflow is an open-source platform designed to manage the ML lifecycle.

Use: It includes tools for tracking experiments, packaging code into reproducible runs, and sharing and deploying models.


Kubeflow

About: Kubeflow is an open-source Kubernetes-native platform for developing, orchestrating, deploying, and running scalable and portable ML workloads.

Use: It’s designed to make deploying scalable ML workflows on Kubernetes simple, portable, and scalable.


TensorFlow Extended (TFX)

About: TensorFlow Extended is a production ML platform based on TensorFlow.

Use: It provides a configuration framework and shared libraries to integrate common components needed to define, launch, and monitor system-managed ML workflows.


DVC (Data Version Control)

About: DVC is an open-source version control system for ML projects.

Use: It helps track and manage data, models, and experiments, making it easier to reproduce and collaborate on projects.


Seldon Core

About: Seldon Core is an open-source platform for deploying, scaling, and monitoring machine learning models in Kubernetes.

Use: It allows for the seamless deployment of ML models in a scalable and flexible manner.


Metaflow

About: Developed by Netflix, Metaflow is a human-centric framework for data science.

Use: It helps data scientists manage real-life data and integrates with existing ML libraries to provide a unified end-to-end workflow.


Pachyderm

About: Pachyderm is a data versioning, data lineage, and data pipeline system built on Go.

Use: It allows users to version their data and models, making the entire data lineage reproducible and explainable.


Neptune.ai

About: Neptune.ai is a metadata store for MLOps, centralizing all metadata and results.

Use: It’s used for experiment tracking and model registry, allowing teams to compare experiments and collaborate more effectively.


Allegro AI

About: Allegro AI offers tools to manage the entire ML lifecycle.

Use: It helps in dataset management, experiment tracking, and production monitoring, simplifying complex ML processes.


Hydra

About: Hydra is an open-source framework for elegantly configuring complex applications.

Use: It can be used in MLOps to create configurable and reproducible experiment pipelines and manage resources across multiple environments.

These tools collectively provide comprehensive capabilities to handle various aspects of MLOps, such as model development, deployment, monitoring, collaboration, and compliance.

By integrating these tools, organizations can streamline their ML workflows, maintain model quality, ensure regulatory compliance, and enhance overall agility in their ML operations.



Which Career Path Makes More?

According to Glassdoor, the average MLOps engineer will bring home about $125,000 yearly.

Comparing this to the average data engineer, who will bring home about $115,000 annually.

While the MLOps engineer brings home, on average, about $10,000 more a year, in my honest opinion that’s not enough money to justify choosing one over the other.


Sources:

https://www.glassdoor.com/Salaries/mlops-engineer-salary-SRCH_KO0,14.htm 

https://www.glassdoor.com/Salaries/data-engineer-salary-SRCH_KO0,13.htm


Which Career Is Better?

Hear me out: the answer is MLOps.

Just kidding (kind of).

Both of these careers – MLOps and Data Engineering – are stimulating, growing Year-over-Year (YoY), and technologically fulfilling.

But let’s dive a little deeper:

Stimulating Work

MLOps: The dynamic field of MLOps keeps you on your toes. From managing complex machine learning models to ensuring they run smoothly in production, there’s never a dull moment. It combines technology, creativity, and problem-solving, providing endless intellectual stimulation.

Data Engineering: Data Engineering is equally engaging. Imagine being the architect behind vast data landscapes, designing structures that make sense of petabytes of information, and transforming raw data into insightful nuggets. It’s a puzzle waiting to be solved; only the most creative minds need to apply.


Growing YoY

MLOps: With machine learning at the core of modern business innovation, MLOps has seen significant growth. Organizations are realizing the value of operationalizing ML models, and the demand for skilled MLOps professionals is skyrocketing.

Data Engineering: Data is often dubbed “the new oil,” and it’s not hard to see why. As companies collect more and more data, they need experts to handle, process, and interpret it. Data Engineering has become a cornerstone of this data revolution, and the field continues to expand yearly.


Technologically Fulfilling

MLOps: Working in MLOps means being at the cutting edge of technology. Whether deploying a state-of-the-art deep learning model or optimizing a system for real-time predictions, MLOps offers a chance to work with the latest and greatest tech.

Data Engineering: Data Engineers also revel in technology. From building scalable data pipelines to employing advanced analytics tools, they use technology to drive insights and create value. It’s a role that marries technology with practical business needs in a deeply fulfilling way.

It’s hard to definitively say whether MLOps or Data Engineering is the “better” field. Both are thrilling, growing fields that provide a chance to work with state-of-the-art technology. The choice between them might come down to personal interests and career goals.

(pick MLOps)

Debug CI/CD GitLab: Fixes for Your Jobs And Pipelines in GitLab

Are you ready to explore some valuable techniques and best practices for effectively debugging CI/CD in GitLab?

This will help streamline the development process and help your team continue delivering high-quality software products.

Continuous Integration and Continuous Deployment (CI/CD) are crucial in modern software development, enabling faster and more robust development cycles.

GitLab, a leading web-based DevOps lifecycle tool based around Git, has a feature-rich CI/CD platform to streamline this process.

Despite its benefits, challenges can occur during implementation and operation, creating the need for debugging.

In this article, we deeply dive into the intricate process of debugging CI/CD in GitLab. We provide a comprehensive guide on anticipating potential hurdles, diagnosing issues, and resolving them promptly for a smooth and efficient development workflow.

But first, we need to ensure you understand Gitlab CI/CD.


Understanding GitLab CI/CD

GitLab is a popular platform for developers, and one of its main features is the built-in GitLab CI/CD.

Continuous Integration and Continuous Deployment (CI/CD) are software development practices that involve frequent code changes, testing, and deployment. In this section, we will provide a brief overview of GitLab CI/CD and highlight its key components.


Understanding What CI/CD ACTUALLY Is

The first step in using GitLab CI/CD is understanding its core concepts.

Continuous Integration (CI) refers to integrating all code changes into the main branch of a shared source code repository early and often. This allows for automatic testing of each change upon commit or merge, helping to catch issues sooner and streamline the development process.

On the other hand, Continuous Deployment (CD) ensures that your code is automatically deployed to production once it has passed all necessary tests and checks, reducing downtime and increasing developer productivity.


The Pipeline

A key component in GitLab CI/CD is the pipeline, which consists of steps to streamline the software delivery process. The pipeline is defined in a .gitlab-ci.yml file that contains instructions for building, testing, and deploying your application.


GitLab automatically detects this file when it is added to your repository and uses it to configure the GitLab Runner that executes the pipeline steps.

One helpful feature of GitLab CI/CD is its support for artifacts, files created during the pipeline process.

These can be used to store build outputs, test results, or any other data you may want to pass between pipeline stages, making managing and sharing resources easier throughout development.

Finally, GitLab offers various pre-built CI/CD templates for multiple languages, frameworks, and platforms. These templates can be customized to suit your specific project requirements, allowing you to focus on writing quality code while letting GitLab handle the build, test, and deployment processes.
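As a small illustration (the template name is just an example, and the templates available depend on your GitLab version), a maintained template can be pulled in with the include keyword and then customized on top:

include:
  # reuse GitLab's maintained SAST template instead of writing those jobs by hand
  - template: Security/SAST.gitlab-ci.yml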


Exploring Pipelines and Jobs in GitLab

In GitLab, the pipeline is the top-level component of continuous integration (CI), delivery, and deployment. A pipeline is a sequence of stages that help us manage and streamline our software development process. A CI pipeline is more specific, focusing on the continuous integration aspects of our project.

Within pipelines, we have jobs, which are essentially tasks that need to be executed. Jobs in GitLab provide details on what actions should be performed, such as running tests or building the application. Now, there might be instances when we need to debug a GitLab CI job.

How can we do that?

Although GitLab doesn’t support SSH access to debug a job like some other CI tools, it does offer a practical alternative.

We can run a job locally by setting up a GitLab Runner. This local setup will enable us to halt the job, enter the container, and investigate any issues.

If we’re experiencing pipeline problems, it’s essential to understand the root cause. One common issue is a timeout in GitLab CI/CD, which occurs when a pipeline or job takes too long to execute. We can optimize our code, adjust the job’s configuration, or increase the timeout limit to resolve this.
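If raising the limit turns out to be the right fix, a per-job timeout can be set directly in the pipeline definition; the job name, value, and script below are placeholders:

integration-tests:
  # override the project's default job timeout for this job only
  timeout: 2h
  script:
    - ./run_integration_tests.sh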


Leveraging the GitLab Runner, Debug Logging

We sometimes need to debug our CI/CD pipelines in GitLab to ensure everything runs smoothly. One of the most effective ways to achieve this is by leveraging the GitLab Runner.

The beauty of the GitLab Runner lies in its ability to pick up jobs from the pipelines and execute them on a machine where it’s installed.

You’re probably wondering how we can efficiently use the GitLab Runner in our debugging process.


The answer is simple – we must know the different configurations and options within our pipeline and jobs to assess the job log accurately.

Numerous configuration possibilities allow us to customize how the GitLab Runner functions according to our needs.

For instance, if you need to debug a job on a GitLab Runner, you can enable the interactive web terminal by modifying the GitLab Runner config.

We can also set up CI/CD rules in our projects to streamline the workflow and minimize manual intervention.

By implementing GitLab CI/CD Rules, we can effectively optimize our pipelines and ensure our processes are less time-consuming. This way, we can avoid tedious manual tasks and embrace the automation GitLab CI/CD offers.


Managing Variables in the UI and GitLab Environment

In our GitLab environment, managing variables is essential to ensure smooth and efficient CI/CD pipelines. Environment variables can store sensitive information like API keys and passwords, making it easier to customize our projects.

First, let’s discuss the different variables we can handle within GitLab.

There are Predefined Variables that GitLab automatically provides.

We can also create Variables Defined in the .gitlab-ci.yml file or define a Variable in the UI by going to Settings > CI/CD > Variables.

Available variables can be accessed within our pipelines, allowing us to customize our projects based on the values set.

To use variables in our scripts and pipeline configurations, we can refer to them by name, prefixed with a dollar sign (e.g., $VARIABLE_NAME).
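Here is a short sketch of both approaches, defining a variable in the .gitlab-ci.yml file and reading one assumed to be defined in the UI (the names are made up for illustration):

variables:
  # defined in the pipeline file, visible to all jobs
  DEPLOY_REGION: "us-east-1"

deploy-job:
  script:
    # DEPLOY_TOKEN is assumed to be defined under Settings > CI/CD > Variables
    - echo "Deploying to $DEPLOY_REGION"
    - ./deploy.sh --token "$DEPLOY_TOKEN"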

Setting variables as environment variables helps implement different values for various environments, such as staging and production. This way, we can keep sensitive data secure and separate, depending on our deployment needs.

We can also Override Variable Values Manually if we need to update any values temporarily or for testing purposes.

This can be done by editing the variables directly in the pipeline configuration.

When running a pipeline manually, GitLab provides a feature where some variables are Prefilled in Manual Pipelines to simplify the process.

These prefilled variables can be edited according to our needs before executing the pipeline.

Lastly, understanding the Scope of a Variable is crucial to ensure it is available only to the appropriate pipelines and environments.

We can control the scope of a variable by defining it at the project, group, or instance level, depending on our requirements.


Debugging GitLab CI/CD

Debugging GitLab CI/CD can be challenging, but we’ve got you covered. When issues arise in your pipeline, knowing the right way to dig into the problem and enable proper debug logging is essential.

To start with, let’s set up debug logging. You’ll want to access your GitLab Runner’s configuration file, usually located at /etc/gitlab-runner/config.toml.

Add log_level = "debug" to the global (top-level) section of the file and restart the Runner service to enable debug logging.

After enabling debug logging, it’s time to dive into the logs. Access the logs by executing journalctl -u gitlab-runner on the machine hosting the Runner.

Now, you can inspect the debug output and identify any CI/CD configuration issues.


Remember that you won’t have direct access to the debug logging using shared Runners. You can still replicate your pipeline locally by setting up a GitLab Runner on your machine.

The GitLab CI/CD PyTest Tutorial For Beginners is an excellent resource for getting started with local Runners and can provide insights on enabling debug features in your pipeline.


Securing Your Variables

One of the essential aspects of GitLab CI/CD is to ensure the security of our project’s variables. It can be quite a challenge to prevent the compromise of our sensitive data, considering that the pipeline is executed in various environments at different stages.

To secure our variables, let’s use GitLab’s Protect Variable feature. By enabling this, we protect our sensitive CI/CD variables and ensure they are available only to protected branches or tags. It reduces the risk of compromising our data by limiting access.

How can we further prevent the exposure of our sensitive variables?

We can manage the permissions of project members and limit access to only those who need it.

We should also store the secrets in a secure storage tool, like GitLab’s integration with HashiCorp Vault.

Let’s consider a scenario where a bad actor uploads a malicious file. This file could compromise our variables, even if we’ve protected and restricted them.

So, what can we do to mitigate this? Having security scans in place for code vulnerabilities and external dependencies is essential.

Tools like GitLab’s Security Scanners can analyze our codebase to identify vulnerabilities and risks, ensuring that we always keep our variables safe.


Decoding Advanced GitLab CI Workflow Hacks

We’ve gathered some of the most useful advanced GitLab CI workflow hacks shared by GitLab engineers to help you optimize your CI/CD pipelines.

These workflow hacks will give you a deeper understanding of how to make the most of these hacks to improve your productivity.

One of the key aspects to focus on is quick actions.

These allow you to perform tasks quickly and efficiently within GitLab.

For instance, you can navigate to specific files, create issues or merge requests, and even assign tasks to team members-all with just a few keystrokes.

By incorporating quick actions into your everyday use of GitLab, you’ll notice significant improvements in your workflow speed and efficiency [1].

Another valuable hack is utilizing only and except specs in your .gitlab-ci.yml file.

This helps you control when specific jobs are run based on changes in your source repository [2]. For instance, you can trigger certain jobs only when a change occurs in a specific branch or when a tag is created.

You can save precious time and resources by tailoring your pipelines to only run necessary jobs.
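Here is a minimal sketch of only and except in action; the branch names, commands, and job names are placeholders:

deploy-production:
  script:
    - ./deploy.sh
  only:
    # run this job only for commits on main or for tags
    - main
    - tags

lint:
  script:
    - flake8 .
  except:
    # skip linting on scheduled pipelines
    - schedules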


Fine-tuning your use of GitLab CI/CD can be further optimized by leveraging the workflow keyword [3]. This powerful feature enables greater control over pipeline stages, ensuring that specific conditions are met for a pipeline to run.

For example, you can configure your pipelines to only run when a merge request is created or exclude a branch pipeline if a merge request is already open.
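A sketch of the workflow keyword doing exactly that, based on the commonly documented pattern for switching between merge request and branch pipelines:

workflow:
  rules:
    # run a pipeline for merge request events
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    # skip branch pipelines when the branch already has an open merge request
    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS
      when: never
    # otherwise, run a normal branch pipeline
    - if: $CI_COMMIT_BRANCH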

Lastly, don’t forget to use GitLab’s built-in troubleshooting features. For example, the CI/CD configuration visualization provides a visual representation of your .gitlab-ci.yml file [4].

This simplifies identifying and addressing any potential issues with your pipeline configuration.

Footnotes/References

  1. https://about.gitlab.com/blog/2021/10/19/top-10-gitlab-hacks/
  2. https://www.cncf.io/blog/2021/01/27/cicd-pipelines-using-gitlab-ci-argo-cd-with-anthos-config-management/
  3. https://docs.gitlab.com/ee/ci/yaml/workflow.html
  4. https://docs.gitlab.com/ee/ci/troubleshooting.html


Addressing Common Issues in CI Tools

When working with CI tools like GitLab, Travis CI, and other CI platforms, we may encounter a few common issues that can hinder the progress of our CI jobs.

We have compiled a list of these issues and their possible solutions to make our lives easier.


Syntax Errors:

One of the primary reasons for problems in our CI pipelines is incorrect syntax in our configuration files. Always double-check the syntax of our .gitlab-ci.yml or .travis.yml files and ensure that they meet the requirements of the respective CI tools.

GitLab provides a troubleshooting guide to help with any syntax issues.


Environment Variables

We often rely on environment variables in CI jobs to pass information between stages.

Use the appropriate environment variables and check their values before using them.

For GitLab, we can find a list of predefined CI/CD variables available for our pipelines.


Resource Allocation

CI jobs may fail if they require more resources than the CI tool provides.

Make sure that we allocate the required resources (like memory, CPU, and dependencies) for our CI jobs to avoid these failures.


Parallel Builds

Sometimes, builds fail due to parallel execution, causing race conditions or unexpected timing issues (for example, database threading issues).

Understanding whether our CI/CD pipeline can run parallel builds, and how this works in our CI tool, is essential.


Caching

While caching saves valuable time by storing intermediate build artifacts, incorrect cache configurations can lead to failures or undesired results.

Always ensure our caching configuration is set up correctly to avoid potential complications.
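For example, a cache block for a Python job might look like the sketch below; the cache key and paths are assumptions you should adapt to what your build actually produces:

test:
  variables:
    # direct pip's cache into the project directory so it can be cached between pipelines
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  script:
    - pip install -r requirements.txt
    - pytest
  cache:
    # reuse the cache across pipelines on the same branch
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - .cache/pip/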

One of the biggest issues I’ve ever faced on a runner was caused by Docker caching of layers/images. Incorrectly building on top of cached images slowed my team down by about two weeks!

CI/CD GitLab Artifacts: Streamline Your Development Process [Boost Your Productivity Now!]

Continuous Integration and Continuous Deployment (CI/CD) is a software development practice that involves frequent code changes, testing, and deployment. GitLab is a popular platform that provides a complete DevOps toolchain, including CI/CD pipelines.

GitLab CI/CD pipelines help automate software building, testing, and deployment, saving developers a lot of time and effort.

GitLab CI/CD pipelines use artifacts to store build outputs, test results, and other files generated during the build process.

Artifacts are stored on the GitLab server and can be downloaded and used by subsequent jobs in the pipeline.

Job artifacts can be configured to include specific files or directories and can be named dynamically using CI/CD variables.

This makes sharing build outputs and other files easy across different pipeline stages.
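For instance, an artifacts block can name the archive after the job and branch that produced it using predefined CI/CD variables; the paths here are placeholders:

build:
  script:
    - make build
  artifacts:
    # name the archive after the job and the branch that produced it
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"
    paths:
      - build/output/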

Let’s see where we could potentially use these within our pipelines.



Understanding GitLab Artifacts, And Where You Will See Them

GitLab Artifacts are files generated by a CI/CD pipeline job and stored on the GitLab server.

These files can be used by other jobs in the same pipeline or can be downloaded by users for further analysis.

Understanding how to use GitLab Artifacts is essential for creating efficient and effective CI/CD pipelines.

 

Artifacts Archive

The Artifacts Archive is a compressed file containing all of the files a job generates.

This archive is stored on the GitLab server and can be downloaded by users for analysis.

The Artifacts Archive can include only specific files or directories by specifying them in the job configuration.


GitLab Pages

GitLab Pages is a feature that allows users to host static websites directly from their GitLab repository. Artifacts generated by a job can be used to populate a GitLab Pages site.

This is useful for creating documentation or other static content that is generated by a CI/CD pipeline.


API

The GitLab API provides a way to access Artifacts generated by a job programmatically.

This allows users to automate the download and analysis of Artifacts, making integrating them into other systems or workflows easier.


Artifacts for Parent and Child Jobs

When a job is part of a pipeline, it can generate Artifacts that are used by other jobs in the same pipeline.

These jobs can be either parent or child jobs.

Parent jobs are executed before the current job, while child jobs are executed after the current job. Artifacts generated by parent jobs can be used by child jobs, but not vice versa.


Job with the Same Name

If two jobs in the same pipeline have the same name, they share the same Artifacts.

This can be useful for creating parallel jobs that generate the same output, but it can also lead to confusion if the Artifacts are used unexpectedly.


Working with Parent and Child Pipelines

Parent-child pipelines are a feature GitLab CI/CD provides that helps manage complexity while keeping it all in a monorepo.

Splitting complex pipelines into multiple pipelines with a parent-child relationship can improve performance by allowing child pipelines to run concurrently.

A parent pipeline can trigger many child pipelines, and these child pipelines can trigger their child pipelines.

Pipelines within GitLab have a maximum depth of two levels of child pipelines. Once this depth is reached, you cannot trigger another level of pipelines.
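As a rough sketch, a parent pipeline triggers a child pipeline by pointing the trigger keyword at another YAML file in the repository; the job and file names are placeholders:

trigger-child:
  trigger:
    # run the pipeline defined in this file as a child pipeline
    include: child-pipeline.yml
    # make the parent pipeline's status depend on the child pipeline's result
    strategy: depend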

When working with parent and child pipelines, it is important to understand how to download the artifacts from the child pipelines.

GitLab CI/CD provides the ability to download the artifacts from the child pipelines using the dependencies keyword.

The dependencies keyword specifies the list of child pipelines that this pipeline depends on.

It is also important to note that the artifacts from the parent pipeline are not automatically passed to the child pipeline. You need to use the artifacts keyword to pass the artifacts from the parent pipeline to the child pipeline. The artifacts keyword specifies the list of files or directories to pass to the child pipeline.

Another important aspect to consider when working with parent and child pipelines is how to prevent the artifacts from expiring.

By default, artifacts expire after 30 days. To prevent artifacts from expiring, you can use the expire_in keyword. The expire_in keyword specifies the duration for which the artifacts should be kept.



Understanding CI Artifacts

CI artifacts are files generated during the CI/CD pipeline and stored on the GitLab server.

These files can be used in subsequent jobs, allowing for faster and more efficient builds.

GitLab CI artifacts can be defined in the .gitlab-ci.yml file. The artifacts keyword is used to specify the files or directories that should be saved as artifacts.

To download job artifacts, navigate to the job page and click on the “Download” button next to the artifact you want to download. Artifacts can also be downloaded using the GitLab API.

GitLab Pages access control can be used to control who can access artifacts stored on GitLab Pages. Access control can be set to public, internal, or private.

GitLab can also be used as an artifact repository. This allows for easy sharing and distribution of artifacts across teams and projects.

Overall, understanding CI artifacts is an essential part of building efficient and effective CI/CD pipelines. By adequately defining and utilizing artifacts, developers can save time and improve the speed and reliability of their builds.


Creating and Managing Build Artifacts

To create and manage build artifacts in GitLab CI/CD, one can use the artifacts keyword in the .gitlab-ci.yml file.

This can be done by specifying the paths of the files and directories that need to be added to the job artifacts.

GitLab also provides a video tutorial on how to create and manage build artifacts. This tutorial is available on the GitLab website and can be accessed by anyone who wants to learn how to use GitLab CI/CD for this purpose.

For beginners, a CI pipeline tutorial is available on the GitLab website. This tutorial is designed to help beginners learn how to create and manage CI pipelines using GitLab. The tutorial covers creating a simple pipeline, running tests, and deploying code.

When creating and managing build artifacts, it is important to remember that artifacts are a list of files and directories attached to a job after it finishes.

This feature is enabled by default in all GitLab installations. Disabling job artifacts may result in losing important data.

You can use various tools and technologies such as Gradle, Maven, and Ant to create job artifacts.

These tools allow users to automate creating and managing build artifacts.



Using GitLab UI and Runner

GitLab provides two ways to access job artifacts: through the GitLab UI and through the GitLab Runner.

Both methods have their own benefits and drawbacks, so it’s important to understand which one to use depending on the situation.


GitLab UI

The GitLab UI provides an easy-to-use interface for accessing job artifacts. To download job artifacts from the UI, follow these steps:

  1. Navigate to the job that produced the artifacts.
  2. Click on the “Artifacts” tab.
  3. Select the artifact you want to download.
  4. Click on the “Download” button.

It’s important to note that the UI only keeps artifacts from the most recent successful job.

If you want to keep artifacts from the most recent job, regardless of whether it was successful or not, you’ll need to use the GitLab Runner. 


GitLab Runner (Our Preference)

The GitLab Runner provides a more flexible way of accessing job artifacts. To download job artifacts from the Runner, you’ll need to use the artifacts keyword in your .gitlab-ci.yml file. Here’s an example:

job:
  script:
    - echo "Your First Runner!"
  artifacts:
    paths:
      - build/

In this example, the job produces an artifact located in the build/ directory. By using the artifacts keyword, the GitLab Runner will automatically upload the artifact to the GitLab server.

To download the artifact, follow these steps:

  1. Navigate to the job that produced the artifacts.
  2. Click on the “Artifacts” tab.
  3. Select the artifact you want to download.
  4. Click on the “Download” button.

Unlike the UI, the GitLab Runner can keep artifacts from the most recent job, regardless of whether it was successful or not. To do this, set when: always under the artifacts keyword in your .gitlab-ci.yml file.

Here’s an example:

job:
  script:
    - echo "Your First Runner!"
  allow_failure: true
  artifacts:
    name: "Example Artifact"
    paths:
      - build/
    # upload the artifacts even if the job fails
    when: always
    expire_in: 1 week

In this example, when is set to always under artifacts, which means that the artifact will be uploaded even if the job fails.

The artifact will also be kept for a week before it is automatically deleted.


Understanding Child Pipelines and Jobs

In GitLab, child pipelines are pipelines triggered by a parent pipeline. They are helpful for breaking down complex pipelines into smaller, more manageable pieces. Child pipelines have their own set of jobs, and each job can have its own set of artifacts.

When a child pipeline is triggered, it inherits the variables and artifacts from the parent pipeline. This means that the jobs in the parent pipeline can access any artifacts generated by the jobs in the child pipeline.

However, it’s important to note that the latest artifacts are not immediately available to the parent pipeline. Instead, they are only available once the child pipeline has been completed successfully.


Starting from GitLab 13.5, child pipelines have a job responsible for uploading artifacts to GitLab Pages. This job is automatically added to the child pipeline when GitLab Pages is enabled for the project.

It’s also worth noting that artifacts are automatically deleted after a certain time. By default, artifacts are kept for 30 days, which can be configured in the project’s settings. We mostly set expire_in explicitly within our pipelines to keep artifact storage under control.


Specific Job and Artifact Storage

In GitLab, artifacts generated by jobs can be stored in a directory specified by the user. Each job can generate its own artifacts, stored in a unique directory.

This allows for easy access to job-specific artifacts, even when multiple jobs are run in the same pipeline.

When a job is executed, the artifacts generated are stored in a directory specified by the user.

By default, the artifacts are stored in /var/opt/gitlab/gitlab-rails/shared/artifacts. 

However, the user can change the storage path by editing the gitlab.rb file and adding the line

gitlab_rails['artifacts_path'] = "/mnt/storage/artifacts"

Once the file is saved, GitLab must be reconfigured by running sudo gitlab-ctl reconfigure.

To download the artifacts archive, the user can use the GitLab UI or the API. The UI provides an easy-to-use interface for downloading artifacts, while the API allows for more programmatic access.

The API endpoint for downloading artifacts is /projects/:id/jobs/:job_id/artifacts.

Artifacts can also be accessed directly from the job page.

When viewing a job, the user can click on the “Artifacts” tab to see a list of artifacts generated by that job.

From there, the user can download the artifacts directly.

For more information on job artifacts in GitLab, please refer to the GitLab product documentation.


Understanding Job Artifacts and Merge Requests

When a merge request is created, GitLab automatically runs a pipeline for the merge request.

This pipeline includes jobs that are specific to the merge request.

The merge request pipeline runs in a separate environment from the main pipeline. This allows developers to test their changes separately before merging them into the main branch.

We like to run things like PyTest and a few other final checks when a merge request is created.

Here’s an example of how we do this:

# Define the job that runs on merge requests
merge_request:
  stage: test
  only:
    - merge_requests
  script:
    - echo "Running tests on merge request..."

Job artifacts in the merge request pipeline are important because they allow developers to see the results of their changes.

Artifacts in the merge request show each job’s output in the pipeline.

Developers can view the artifacts in the merge request UI. They can also download the artifacts using the API.

If a job fails in the merge request pipeline, developers can retry the failed job. When the job is retried, the artifacts from the previous run are still available.

This can be useful for debugging and testing purposes.



Working with Specific Jobs and Artifacts Archive

When working with GitLab CI/CD pipelines, it is often necessary to work with specific jobs and artifacts archives. The artifacts archive is a collection of files and directories a job generates. 

By default, GitLab Runner uploads the artifacts archive to GitLab when a job succeeds.

However, it is possible to configure GitLab Runner to upload the artifacts archive on failure or always using the artifacts:when parameter. 

When a job generates artifacts, they can be used in subsequent jobs.

By default, all artifacts from jobs in earlier stages are passed to later jobs; the dependencies keyword lets a job restrict which jobs it pulls artifacts from, as sketched below.
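As a rough sketch (job names and the test command are placeholders), restricting a job to the artifacts of a single earlier job might look like this:

test:
  stage: test
  dependencies:
    - build              # only fetch artifacts produced by the build job
  script:
    - ./run-tests.sh     # placeholder test command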

In addition to artifacts, GitLab allows for scanning report uploads and fuzzing report uploads to GitLab. These reports can be generated by specific jobs and uploaded to GitLab for further analysis.

When working with specific jobs and artifacts archives, it is important to consider the following questions:

  • Which jobs generate artifacts that need to be used in subsequent jobs?
  • How can the artifacts archive be uploaded to GitLab?
  • How can scanning and fuzzing reports be uploaded to GitLab?
  • What happens when a job succeeds or fails?

By answering these questions, developers can effectively work with specific jobs and artifact archives in GitLab CI/CD pipelines.



Retrieving and Keeping Job Artifacts

To retrieve job artifacts, users can navigate to the job details page and select the “Download” button next to the artifacts they wish to retrieve.

Alternatively, job artifacts can be retrieved programmatically using the GitLab API.

By default, GitLab keeps job artifacts for a limited time before automatically deleting them. However, users can specify a longer or shorter expiration time for job artifacts using the expire_in keyword in their job configuration.

If expire_in is not defined, the instance-wide setting is used. Users can select “Keep” from the job details page to prevent artifacts from expiring.

Test reports provide more details on the job’s performance and can be used to diagnose problems and improve the quality of the codebase. GitLab supports various test report formats, including JUnit, Cucumber JSON, and Cobertura.

For example, collecting a JUnit test report requires adding a script to the job that generates the report and then pointing GitLab at the report file with the artifacts:reports:junit keyword in the job configuration.
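A minimal sketch (the pytest command and report file name are just examples) could look like this:

unit_tests:
  stage: test
  script:
    - pytest --junitxml=report.xml   # any tool that emits JUnit XML works here
  artifacts:
    expire_in: 1 week
    reports:
      junit: report.xml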

The resulting JUnit test report can be viewed on the job details page, and it can be used to track the progress of the test suite over time.

 

Some Other CI/CD Articles

Here at enjoymachinelearning.com we have a few other in-depth articles about CI/CD.

Here are a few of those:

.NET CI/CD In GitLab [WITH CODE EXAMPLES] https://enjoymachinelearning.com/blog/dotnet-cicd-gitlab/

.NET Core is a powerful framework for developing an array of applications, and when paired with CI/CD practices, it can bolster productivity, code quality, and speed of delivery.

GitLab, in particular, accommodates .NET CI/CD excellently, given its extensive suite of developer tools and state-of-the-art automation capabilities.

That’s because developers can combine GitLab’s version control, automated testing, and continuous deployment facilities to craft resilient .NET applications.

This post delves deeper into .NET CI/CD with GitLab, exploring how developers can leverage its benefits, navigate potential challenges, and optimize .NET application development.

Let’s Jump In.

Understanding GitLab CI/CD and .NET

Before jumping into .NET and .Net projects, let’s double check we understand what GitLab is.

GitLab is a popular platform that hosts repositories and provides CI/CD pipelines to automate the process of building, testing, and deploying applications.

With the emergence of .NET Core (rebranded as simply .NET starting with .NET 5), developers can use GitLab CI/CD tools to streamline their development workflow.

We must understand how to configure GitLab CI/CD for a .NET project.

To do this, we create a .gitlab-ci.yml file in the root of our repository. This file contains instructions and configurations for our GitLab pipeline.

This file defines stages, such as build, test, and deploy, and describes what should happen in each stage.

For example, in the build stage, we might use the dotnet build command to compile our .NET project.

Similarly, in the test stage, we might use dotnet test to run unit tests.

By configuring our pipeline correctly, we can automatically build, test, and deploy our .NET application whenever we push code to our GitLab repository.

One essential step for .NET projects is to use the dotnet publish command to generate release-ready DLLs and executables.

Including this command in our pipeline configuration ensures the compiled code is ready for deployment to various environments, such as staging or production.

To further improve our pipeline, we can adapt our CI/CD rules to match our desired workflow.

For instance, we can define conditions that must be met before deployment is triggered, like passing tests or code review approval.
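As a hedged sketch, a deploy job gated to the default branch and a manual approval might look like this (the script is a placeholder):

deploy:
  stage: deploy
  script:
    - echo "Deploying..."   # placeholder deployment command
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
      when: manual          # require a manual click before deploying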


Integrating .NET, Git, and The GitLab Repository (The Process)

By utilizing .NET with GitLab for continuous integration and deployment, we can automate our build, test, and deploy processes right in our own Git repository.

To start, we should create a GitLab CI configuration file (.gitlab-ci.yml) in the root folder of our repository.

This file will define the CI/CD pipeline and tell GitLab Runner which tasks will be executed.

To work with .NET, we should first set up a GitLab Runner to execute our CI/CD tasks.

This can be done by installing a GitLab Runner on a machine with the necessary .NET framework or runtime.

Once the Runner is installed and registered with our project, it can pick up and run the CI/CD jobs from our repository.
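Registration itself is typically done with the gitlab-runner register command. Here is a rough sketch (URL, token, executor, and tag are placeholders, and the exact flags can vary by Runner version):

gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --registration-token "<project-registration-token>" \
  --executor "shell" \
  --description "dotnet-runner" \
  --tag-list "dotnet"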

With our Runner in place, it’s time to define our CI pipeline in the .gitlab-ci.yml file.

For a .NET project, we typically need to include build, test, and deploy stages.

The build stage compiles our code and creates the necessary build artifacts. We can define a simple build job in our configuration file as follows:

Note: This config assumes you have tagged your runner as “dotnet.”

build:
  stage: build
  script:
    - dotnet build
  tags:
    - dotnet

The test stage will execute automated tests to ensure our code works as expected.

We can add a test job to our configuration file similar to the following:

test:
  stage: test
  script:
    - dotnet test
  tags:
    - dotnet

Finally, the deploy stage will deploy our application to the desired environment, such as a staging or production server. 

This step may vary depending on the target platform and our deployment strategy.

For more information on deploying .NET applications using GitLab, you might find this article about debugging CI/CD in GitLab helpful.

With these changes, our .NET CI/CD pipeline is ready.


Whenever we push changes to our repository, GitLab Runner will automatically execute our pipeline to build, test, and deploy our application.

This way, we ensure that our code stays reliable, robust, and up-to-date throughout development.

Working with Docker and Kubernetes

When working with .NET CI/CD in GitLab, we can leverage the power of Docker and Kubernetes to streamline and supercharge our pipeline. 

First, we must create a Docker image or multiple images for our application. 

For this demo, we’re just going to build one image. Most production-level software will use multiple images. To build out more containers, you can utilize a docker-compose file that references multiple Dockerfiles. Read more about that here.

To accomplish this, we will write a Dockerfile that specifies the base image, dependencies, and build steps.

Once our Dockerfile is ready, we can execute the docker build command to create the image.

We’ll utilize this Docker image in our GitLab CI/CD pipeline to test, build, and deploy our application.

Here is an example of how we can use the previously created image in our .gitlab-ci.yml file:

Note that this assumes you’ve pushed your image to your relevant registry.

Since GitLab comes with a container registry (within your repo!), we recommend pushing and storing your image there.

build:
  stage: build
  image: your-docker-image
  script:
    - docker pull your-docker-image
    - docker run your-docker-image
  artifacts:
    paths:
      - /build-output

In this example, we pull our Docker image from our container registry and run it as part of the build stage.

Once the stage is completed successfully, any build artifacts will be collected as specified in the artifacts section.

Kubernetes Role 

Kubernetes plays a crucial role in managing the deployment of our application.

We can use GitLab CI/CD to interact with our Kubernetes cluster and deploy the application safely.

To do this, we need to ensure that we’ve installed an agent in our cluster.

With the agent installed, we can access a Kubernetes context and run Kubernetes API commands in our pipeline.

We can then use kubectl to manage the deployment, ensuring the application reaches the desired state.

An example of using kubectl to deploy our application might look like this:

deploy:
  stage: deploy
  image: your-docker-image  # could also use a Linux distro here and deploy multiple images
  script:
    - kubectl apply -f kubernetes-deployment.yaml
    - kubectl rollout status deployment/your-app
  environment:
    name: production

In this example, the deployment is managed by a Kubernetes manifest file kubernetes-deployment.yaml.

The kubectl apply command deploys the application, while kubectl rollout status monitors the progress of the rollout.

Building and Testing with .NET

GitLab CI/CD is a powerful tool for automating the testing, building, and deployment of code changes.

In this section, we’ll describe how to set up and utilize GitLab CI/CD to ensure our .NET applications are built and tested efficiently.

First, as we’ve already explained above, we’ll need to create a .gitlab-ci.yml configuration file in our repository’s root directory.

Inside this file, we’ll define the stages and jobs involved, such as building and testing our .NET projects.

We’ll need to specify the .NET commands necessary to run those processes.

For the building stage, we’ll use the dotnet build command, which will build our entire solution.

In our .gitlab-ci.yml file, we’ll define the build job like this:

build:
  stage: build
  script:
    - dotnet restore
    - dotnet build

Notice that we also included the dotnet restore command. This step is essential because it restores the dependencies needed for the project to build correctly.

Now, let’s move on to testing.

We should have one or more test projects in our solution to validate the application’s functionality. We can use the dotnet test command to run the tests in those projects.

Add the test job to the .gitlab-ci.yml file, like this:

test:
  stage: test
  script:
    - dotnet test

With our build and test jobs defined, GitLab CI/CD will automatically run these jobs whenever we push changes to the repository.

This way, we can quickly identify any issues arising from our code changes, ensuring that our .NET applications remain stable and high-quality.

Publishing and Deploying with .NET

In setting up CI/CD with GitLab for a .NET project, publishing and deploying your application is crucial.

We can automate this process using the dotnet publish command in our GitLab CI/CD pipeline.

We begin by adding a new stage for publishing in the .gitlab-ci.yml file. It should include the following script commands:

stages:
  - publish

publish:
  stage: publish
  script:
    - dotnet publish -c release --no-restore
  artifacts:
    paths:
      - path/to/published/directory

In this example, we use the dotnet publish -c release command to publish the application in release mode.

The --no-restore option is used to skip restoring packages to speed up the process.

After the successful execution of the dotnet publish command, we want to store the published files as an artifact in GitLab.

Artifacts are files passed between pipeline stages or stored to be used later (use the link above for a deep dive into artifacts)

To store the published directory as an artifact, add the following to our .gitlab-ci.yml file:

artifacts:
  paths:
    - path/to/published/directory

Finally, we must also consider our application’s dependencies. Often, .NET projects rely on NuGet packages, so we need to restore them before building and publishing our application.

The dotnet restore command ensures that all NuGet dependencies are installed properly.

Frequently Asked Questions:

In quick succession, we’ll answer some of the questions that readers have chimed in and asked us.

How can I use Docker and GitLab runners for CI/CD of ASP.NET Core applications?

To use Docker and GitLab runners for CI/CD of ASP.NET Core applications, you need to install Docker on your runner and start up a new Docker daemon. Then, download the GitLab Runner binary (gitlab-runner.exe) and register it with your GitLab instance by running the ‘register’ command. Following the registration, you’ll need to update the config.toml file with the runner’s details. Now, you can use the runner to execute your .NET Core CI/CD pipeline. Use the ‘dotnet tool install’ command to install any necessary tools in your Docker environment, or add them into a Dockerfile as dependencies.

How can I integrate my .NET Core CI/CD pipeline with AWS?

Integrating your .NET Core CI/CD pipeline with AWS starts with setting up your pipeline in GitLab. Once setup is complete, you can use the AWS CLI or SDKs to interact with AWS services from your pipeline. You will need to download and configure AWS credentials to use their endpoints. Furthermore, the ‘aws deploy’ command can be useful for deploying AWS infrastructure as part of your pipeline.

I personally like to build the project into docker containers and make those containers available for something like EKS (assuming a Kubernetes build).

How to leverage Azure DevOps for managing .NET Core projects?

With Azure DevOps, you can leverage several features for managing your .NET Core projects. This includes Azure Boards for tracking work, Azure Pipelines for CI/CD, Azure Repos for managing Git repositories, and Azure Test Plans for managing, tracking, and running your tests. Azure DevOps can be integrated with GitLab CI, allowing you to trigger pipeline runs upon code pushes and manage your project from one place.

How can I use Angular applications in my .NET Core projects?

Angular applications can be used with .NET Core API projects, providing a nice frontend for your application. The Angular CLI can help you understand how to build and serve your Angular application alongside your .NET Core API. You can integrate your Angular app within your pipeline in the GitLab CI/CD process to test, build, and deploy your Angular application in parallel with your .NET Core project.

How to improve code quality in .NET Core projects using SonarQube?

SonarQube can be set up as part of your GitLab CI/CD pipeline to help improve the code quality of your .NET Core projects. It is a tool that can analyze your source code for bugs, code issues, and security vulnerabilities. Once you install SonarQube, you can add a new stage in your pipeline to run SonarQube analysis on your code. After the analysis, SonarQube provides detailed reports which you can use to improve your code quality.
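As a rough sketch (assuming the dotnet-sonarscanner global tool, and treating the project key, host URL, and token variables as placeholders), such a stage might look like this:

sonarqube:
  stage: test
  script:
    - dotnet tool install --global dotnet-sonarscanner   # install the scanner as a .NET global tool
    - export PATH="$PATH:$HOME/.dotnet/tools"
    - dotnet sonarscanner begin /k:"your-project-key" /d:sonar.host.url="$SONAR_HOST_URL" /d:sonar.login="$SONAR_TOKEN"
    - dotnet build
    - dotnet sonarscanner end /d:sonar.login="$SONAR_TOKEN"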

How can NuGet be used in .NET pipelines?

NuGet is a package manager for .NET and can be used in your CI/CD pipelines to manage your project’s dependencies. You can use the ‘dotnet restore’ command in your pipeline before the build stage to download any necessary NuGet packages. You can also use NuGet to publish your libraries as packages, which can be consumed by other developers or in different stages of your pipeline.

How do you generate artifacts from .NET Core SDK?

After successfully publishing your application by running ‘dotnet publish -c Release -o {output directory},’ an artifact representing your published application will be generated. This artifact includes everything needed to run your app, and you can use CI tools such as GitLab to store and manage these artifacts. You can define ‘artifacts’ in your gitlab-ci.yml file to specify which files and directories should be included as artifacts.

How do you configure SSH in GitLab CI for .NET Core projects?

Configuring SSH in GitLab CI involves generating a new SSH key pair and adding the public SSH key to your GitLab account. Then, you can use this SSH key in your pipeline to securely connect to other servers. Specifically, in .NET Core projects, SSH can securely deploy your application to a remote server as an advanced deployment strategy.
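A common pattern (assuming the private key is stored in a CI/CD variable named SSH_PRIVATE_KEY and the target host is a placeholder) is to load the key with ssh-agent in a before_script:

deploy_ssh:
  stage: deploy
  before_script:
    - eval $(ssh-agent -s)                               # start ssh-agent in the job environment
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -   # load the key from the CI/CD variable
  script:
    - ssh -o StrictHostKeyChecking=no user@your-server "echo connected"   # placeholder remote command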

GitLab CI/CD Rules [Perfect Your Keyword Workflow] https://enjoymachinelearning.com/blog/gitlab-cicd-rules/

Are you tired of spending hours manually testing and deploying your code? Say goodbye to tedious processes and welcome the world of GitLab CI/CD Rules.

This powerful configuration is here to revolutionize your keyword workflow, making it seamless and efficient.

Whether you’re a developer, a project manager, or a team lead, GitLab CI/CD Rules will ensure your pipelines are evaluated correctly.

With GitLab CI/CD Rules, you can automate your code testing and deployment, saving you precious time and effort.

No more endless back-and-forths between teammates, worrying about manual errors, or untested code getting into the default branch.

This game-changing feature allows you to define a set of rules that trigger automated actions whenever specific keywords are mentioned.

Imagine the possibilities of having your tests run automatically, deployments executed promptly, and notifications sent to relevant team members, all with just a single push to your repository.

Achieving perfection in your keyword workflow has never been easier. GitLab CI/CD Rules offers an intuitive and customizable platform that adapts to your team’s specific needs.

It’s literally just a file (or file paths).

This flexible tool not only enhances collaboration and productivity but also ensures the seamless flow of your development, minimizing bottlenecks and maximizing efficiency.

(P.S: I think GitLab CI/CD is MUCH better than GitHub)

Where Are Rules Within GitLab?

Rules are everywhere within GitLab, and if you’ve recently started using GitLab or have only just been introduced to it, you’re going to need them.

HOWEVER

To understand job rules, you must first understand variables in GitLab CI. 

Variables come from various sources, such as API requests, manual entries by the user, scheduled pipelines, manual jobs, and more.

They are used in the CI engine – and when you run a job or pipeline, they appear as environment variables.

However, there are some things called Predefined Variables. Predefined variables are defined by GitLab and are populated by GitLab.

You can read from them, but you shouldn’t write to them.

There are many predefined variables; while you don’t need to know about them, some are handy to reference and use in rules.

Let’s jump in and make you an expert in the GitLab UI and other pipeline configurations.


Understanding Variables in GitLab CI

To comprehend rules in GitLab CI, it is crucial to have a good understanding of variables.

Variables in GitLab CI originate from various sources, including manual entry by the user, API requests, configuration at the project, group, or instance level, generation in jobs in previous stages, direct input from the CI configuration, or GitLab itself.

These variables are used in two different places: the CI engine within GitLab and the jobs themselves, where they appear as environment variables.

It can be overwhelming to deal with many variables from different sources. Some of these variables are predefined and defined by GitLab and are populated by GitLab.

There are about 110 predefined variables, most of which are unnecessary for everyday use, but it is beneficial to know about them and reference them occasionally.

Predefined variables can be helpful when setting up rules.

It is also essential to understand that several events can trigger a pipeline to run, including a new commit, a new branch, or a new tag.

Using rules to detect how your pipeline and jobs are being run is crucial to prevent the same job from running twice.

Rules are the solution to this problem. The basic outline of rules includes the job name, the word “rules,” the “if” clause, and the script.
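Put concretely, a minimal sketch of that outline (job name and script are placeholders, and $CI_PIPELINE_SOURCE is one of GitLab’s predefined variables) could look like this:

my_job:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    - echo "This job only runs in merge request pipelines"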


Breaking Down CI Rules (Keywords)

The “if” clause can reference variables, and in practice, most “if” clauses do exactly that.

The “only” and “except” keywords are deprecated in favor of rules, so going forward you should use rules with variables for everything.

The essential components of rules are clauses, operators, results, and when options. The clauses include “if,” “changes,” and “exists.”

The “if” clause is the one used most often, and it evaluates an expression built from variables. The operators compare variables and include == (equals), != (not equals), =~ (regex match), and !~ (negated regex match); expressions can be combined with && and ||.

The result of the clause is almost always “when.” The “when” clause can be “always,” “never,” “on success,” “on failure,” “manual,” or “delayed.”

If there is no “when” clause, the default is “when on success” and “allow failure false.”

A job is added to the pipeline if the rule matches and has “on success,” “delayed,” or “always” in the “when” clause.

If no rules match, the last clause is assumed to be “when never.” A job is not added to the pipeline if no rules match and there is no “when on success,” “when delayed,” or “when always” at the end.

If a rule matches and has “when never” as the attribute, the job will not run.

I cannot tell you how many times this kind of rule has blocked a job from running when a merge request is created (even on the default branch!)
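A sketch of that kind of guard rule (job name and script are placeholders) looks like this:

build:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never          # never run this job in merge request pipelines
    - when: on_success     # otherwise, run as normal
  script:
    - echo "Building..."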

Predefined Variables (Example Workflow To Run A Job)

Simply put, Predefined variables are defined by GitLab and populated by GitLab.

While users should not write to these variables, they can read from them. These predefined variables can be found in the reference material and documentation, and while about 90% of them are not necessary, some can be advantageous when setting up rules.

It is also essential to understand the various events that can trigger a pipeline to run, such as new commits, branches, tags, manual API calls, or scheduled runs.

Jobs and pipelines can detect how they were triggered, and it is crucial to use rules to prevent duplicate jobs from running.

Here is an example `.gitlab-ci.yml` file that uses some predefined GitLab keywords, which could be used as a merge request pipeline or a branch pipeline:

stages:
  - build
  - test
  - deploy

variables:
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

build:
  stage: build
  script:
    - docker build -t $IMAGE_TAG .
    - docker push $IMAGE_TAG

test:
  stage: test
  script:
    - docker run $IMAGE_TAG npm test

deploy:
  stage: deploy
  script:
    - kubectl apply -f kubernetes/deployment.yml

This configuration file defines three stages: `build`, `test`, and `deploy`.

The `build` stage builds a Docker image and pushes it to the GitLab Container Registry using the predefined variables `$CI_REGISTRY_IMAGE` and `$CI_COMMIT_SHORT_SHA`.

The `test` stage runs the tests inside a Docker container using the previously built image.

Finally, the `deploy` stage deploys the application to a Kubernetes cluster using the `kubectl` command.

Avoiding Duplicate Pipelines With Rules

We need to use rules in our GitLab CI configuration to avoid duplicate pipelines.


Rules allow us to specify conditions under which a job should run based on variables from different sources. These variables can be predefined by GitLab or generated by jobs in previous stages.

When pipelines run, different events can trigger them, such as new commits, branches, or tags. Using rules to detect how pipelines and jobs are being run is vital to prevent the same job from running twice.

To use rules, we need to define them in our job configuration. The syntax for rules is as follows:

job_name:
  rules:
    - if: condition
      when: always/never/on_success/on_failure/manual/delayed
      allow_failure: true/false
      start_in: duration
    - if: condition
      when: always/never/on_success/on_failure/manual/delayed
      allow_failure: true/false
      start_in: duration
    # ...(add more rules)
  script:
    - command1
    - command2
    # ...(add more commands)

The if clause is the most critical part of the rules and allows us to specify conditions using variables.

We can use different operators, such as ==, !=, =~, &&, and ||, to combine variables and create complex conditions.

The when clause specifies when the job should run based on the pipeline status.

As we’ve said before, you can use different options, such as always, never, on_success, on_failure, manual, and delayed.

The allow_failure option allows us to specify whether the job can fail without affecting the pipeline status.

The start_in option allows us to specify a duration to delay the job start.

It’s important to note that if we don’t specify a when clause, the default value is on_success.

If we don’t specify any rules, the job will run in all cases where the previous stages have passed.

In summary, by using rules in our GitLab CI configuration, we can prevent duplicate pipelines and improve the efficiency of our jobs.

We can specify conditions using variables and different operators and control when the job should run based on the pipeline status.

 

Some Other CI/CD Articles

Here at enjoymachinelearning.com we have a few other articles that go in-depth about CI/CD.

Here are a few of those:

Understanding Pipeline Problems (Timeout CICD GitLab) https://enjoymachinelearning.com/blog/timeout-cicd-gitlab/

In the ever-evolving world of software development, Continuous Integration and Continuous Delivery (CICD) plays a pivotal role in your team’s continuous success.

Among countless tools available for CICD, GitLab stands out due to its comprehensive suite of features.

Managing GitLab efficiently demands a deep understanding of numerous concepts and practices, including one that isn’t talked about a ton: timeout settings. 

This article will dive deep into the specifics of timeout in GitLab’s CICD process, providing a detailed understanding of its functionality, its implications for your workflow, and how to optimize it for better performance and productivity. 

We’ll also give some tips and tricks at the end (scenario-based) of problems you could be running into – that’ll quickly get you on your way to solving whatever issue you’re currently running into.


GitLab Runner Failure: Project Level Timeout

In GitLab, two types of timeouts can be defined: project-level and runner-level timeouts.

To define project-level timeout, go to your specific project settings, click on CI/CD, and then click on general pipelines.

There, you will find the option to set the timeout.

By default, the timeout is 60 minutes or one hour. If any job surpasses this timeout threshold, it will be marked as failed.


If you have defined both project- and runner-level timeout, which takes precedence?

There are three scenarios.

  • First, if the runner level timeout is bigger than the project level timeout, and the job runs longer than defined in the project level timeout, it will be marked as failed. 
  • Second, if the runner-level timeout is not configured, the project-level timeout will be considered.
  • Third, if the runner-level timeout is smaller than the project-level timeout, and the job runs longer than the runner-level timeout, it will be marked as failed.

GitLab Runner Failure: Runner Level Timeout

To set the runner level timeout, go to a specific project’s settings and navigate to the CI/CD runners section.

Expand the section, click on the pencil icon of a specific runner, and you will see the option to define the maximum job timeout.

By setting this option, you can define a specific runner-level job timeout.

Let’s say you have five different GitLab runners associated with five different projects. In that case, you can define five different job timeouts for these five GitLab runners by going to this option.

If you have defined both a project-level timeout and a runner-level timeout, the effective limit is whichever of the two is smaller, as the scenarios below show.


There are three scenarios to consider:

  1. Runner-level timeout is bigger than the project-level timeout – If the job runs longer than the project-level timeout, it will be marked as failed.
  2. Runner-level timeout is not configured – In this case, the project-level timeout applies, and any CI/CD pipeline running longer than it will be marked as failed.
  3. Runner-level timeout is smaller than the project-level timeout – If any CI/CD pipeline runs longer than the runner-level timeout, it will be marked as failed.

To avoid such failures, it is recommended to increase the job timeout. You can ensure that your CI/CD pipelines run smoothly without interruptions by defining appropriate timeouts at both project and runner levels.

Timeout Pipeline Problems Solutions

If we have defined project-level timeout and runner-level timeout, then there are three scenarios to consider.

  • First, if the runner-level timeout is bigger than the project-level timeout and the job runs longer than the project-level timeout, it will be marked as failed.
  • Second, if the runner-level timeout is not configured, the project-level timeout applies, and any CI/CD pipeline running longer than it will be marked as failed.
  • Third, if the runner-level timeout is smaller than the project-level timeout, any CI/CD pipeline running longer than the runner-level timeout will be marked as failed.

Increasing the job timeout is the solution to this problem. By defining runner-level or project-level timeout, we can ensure that our CI/CD pipelines do not fail due to job timeouts.


What are pipeline badges in GitLab CI?

Pipeline badges are visual indicators in GitLab CI that display the current pipeline status and test coverage. They are helpful for quickly assessing the health of a project without having to drill down into the details. (Manager view)

How can I limit the number of changes in a GitLab CI pipeline?

In the GitLab CI pipeline settings, you can limit the number of changes that GitLab will recognize. This can help manage builds effectively by preventing too many changes from overloading the pipeline.

What are shallow clones in GitLab CI pipelines?

A shallow clone in GitLab CI pipelines changes how the repository is cloned for each job. Instead of cloning the entire history of the repository, GitLab Runner creates a shallow clone with only a limited number of revisions (controlled by the GIT_DEPTH variable). This makes the local working copy smaller and can speed up jobs that don’t need the entire history.

What does the configuration file do in a GitLab CI pipeline?

The configuration file in a GitLab CI pipeline is essential. It defines how the pipeline should operate for a specific project, including how to run jobs, where to store job artifacts, and how to manage your pipeline settings. This file will be autodetected within your project directory and should be named .gitlab-ci.yml.

Why are redundant pipelines a problem in GitLab CI?

Redundant pipelines in GitLab CI are pipelines that run despite no relevant changes in the code. This can consume precious CI/CD resources and delay actual critical jobs. That’s why managing these properly is vital, possibly by enabling the auto-cancel redundant pipelines feature (like timeout!!).

How do I set a time limit for a job in the pipeline?

You can set a time limit for a job in the pipeline in the job’s configuration within the .gitlab-ci.yml file. The timeout defines the maximum amount of time a job can run in minutes or hours (or both). If a job surpasses the threshold, GitLab CI will automatically stop the job to prevent hogging resources. This only refers to specific jobs within a whole pipeline. To set a pipeline-specific timeout, you’ll need to utilize the steps talked about above (within the settings).
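As a quick sketch (job name and script are placeholders), a per-job timeout looks like this:

long_job:
  timeout: 3h 30m          # the job is stopped if it runs longer than this
  script:
    - ./run-long-task.sh   # placeholder for a long-running task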

What are job artifacts, and what is their visibility in pipelines?

Job artifacts in GitLab CI pipelines refer to the files created by jobs that you might need in later stages or for other purposes. Regarding their visibility, anyone with reporter or higher access in public projects can access the job artifacts by default. All of this can be tailored in your project’s settings.

How do I avoid cloning the repository from scratch for every job?

To save CI/CD runner processing time, you can adjust your GitLab CI configuration to avoid cloning the repository from scratch for every job. In your .gitlab-ci.yml file, set the GIT_STRATEGY variable to fetch (to reuse the existing working copy) or none, and use cache or artifacts to pass data between pipeline stages instead of doing a full clone for each job.
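A rough sketch of the relevant variables, set globally here but also settable per job, might look like this:

variables:
  GIT_STRATEGY: fetch   # reuse the existing working copy instead of a fresh clone
  GIT_DEPTH: "10"       # shallow clone: fetch only the last 10 commits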

What does scratch for every job mean?

Scratch for every job refers to the default GitLab CI behavior where the complete repository is cloned for every job in the pipeline, creating a fresh workspace each time a job runs. Although this can cause additional processing overhead, it ensures that each job starts with a clean, predictable state. Remember to increase timeout if this type of configuration is used, as it can significantly increase the amount of time a pipeline runs.

Some Other CI/CD Articles

Here at enjoymachinelearning.com, we have a few other in-depth articles about CI/CD.

Here are a few of those:
