Configurable Code Duplication Criteria as Part of Custom Code Health

CodeScene now lets you configure the thresholds used to detect duplicated code. This unifies how code duplication is reported across all programming languages and makes the duplication analysis more consistent and customizable.

Tailor Code Duplication Reporting to Your Project

Previously, CodeScene's code duplication analysis took file size into account, with larger files having less impact on the duplication score. With the new configuration options, code duplication is now evaluated independently of file size, providing a more uniform and accurate assessment.

To give you better control over how duplication is reported, the following thresholds can now be customized:

  1. function_duplication_min_lines_of_code_for_check
    This threshold defines the minimum number of lines of code (LOC) a function must have to be considered for code duplication analysis. The default is 10 LOC. This keeps very small functions (1-2 LOC) from generating false positives, so that only more substantial functions are considered when identifying duplication.

  2. function_duplication_min_similarity_percentage
    This setting defines the minimum structural similarity percentage required for two or more functions to be flagged as duplicates. By default, the threshold is set to 75%, meaning functions will only be considered duplicates if their structural similarity exceeds this percentage.
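To make the interaction between the two thresholds concrete, here is a minimal Python sketch. It is an illustration only, not CodeScene's implementation: the helper functions are hypothetical, a simple line-based text comparison from Python's difflib stands in for the structural similarity that CodeScene computes, and treating the similarity boundary as inclusive is an assumption. Only the two threshold names and their default values come from the description above.

```python
from difflib import SequenceMatcher

# Threshold names and defaults as described above.
DEFAULT_THRESHOLDS = {
    "function_duplication_min_lines_of_code_for_check": 10,
    "function_duplication_min_similarity_percentage": 75,
}


def code_lines(source: str) -> list[str]:
    """Return the non-blank lines of a function body, stripped of whitespace."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def similarity_percentage(a: str, b: str) -> float:
    """Hypothetical stand-in for structural similarity (0-100).

    Compares the two bodies line by line; CodeScene's real metric is
    structural and is not reproduced here.
    """
    return 100.0 * SequenceMatcher(None, code_lines(a), code_lines(b)).ratio()


def is_flagged_as_duplicate(a: str, b: str, thresholds=DEFAULT_THRESHOLDS) -> bool:
    """Apply the size gate first, then the similarity gate."""
    min_loc = thresholds["function_duplication_min_lines_of_code_for_check"]
    min_similarity = thresholds["function_duplication_min_similarity_percentage"]

    # Functions below the size threshold are never checked for duplication.
    if len(code_lines(a)) < min_loc or len(code_lines(b)) < min_loc:
        return False

    # Assumption: the boundary is inclusive (>=); the exact rule may differ.
    return similarity_percentage(a, b) >= min_similarity
```

In this sketch, two identical two-line functions are not flagged under the defaults because they fall below the size threshold, while passing a thresholds dictionary with a lower minimum LOC would cause them to be checked and flagged.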

Benefits of Customizable Duplication Criteria

These thresholds let you tailor duplication reporting to the characteristics of your codebase, so that the code health report reflects the insights most relevant to your team. Fine-tuning the criteria reduces noise and focuses the report on the duplication that matters most.

This makes it easier to manage code duplication and maintain high-quality code while addressing potential areas of improvement.


You can read more about customizing code health rules via JSON in the CodeScene documentation.