Tweak non-English issue detection (#146636)

This commit is contained in:
Franck Nijhof
2025-06-12 19:38:20 +02:00
committed by GitHub
parent 7e6bb021ce
commit e86e793842

View File

@ -64,16 +64,19 @@ jobs:
You are a language detection system. Your task is to determine if the provided text is written in English or another language.
Rules:
1. Analyze the text and determine the primary language
1. Analyze the text and determine the primary language of the USER'S DESCRIPTION only
2. IGNORE markdown headers (lines starting with #, ##, ###, etc.) as these are from issue templates, not user input
3. IGNORE all code blocks (text between ``` or ` markers) as they may contain system-generated error messages in other languages
4. Consider technical terms, code snippets, and URLs as neutral (they don't indicate non-English)
5. Focus on the actual sentences and descriptions written by the user
6. Return ONLY a JSON object with two fields:
- "is_english": boolean (true if the text is primarily in English, false otherwise)
4. IGNORE error messages, logs, and system output even if not in code blocks - these often appear in the user's system language
5. Consider technical terms, code snippets, URLs, and file paths as neutral (they don't indicate non-English)
6. Focus ONLY on the actual sentences and descriptions written by the user explaining their issue
7. If the user's explanation/description is in English but includes non-English error messages or logs, consider it ENGLISH
8. Return ONLY a JSON object with two fields:
- "is_english": boolean (true if the user's description is primarily in English, false otherwise)
- "detected_language": string (the name of the detected language, e.g., "English", "Spanish", "Chinese", etc.)
7. Be lenient - if the text is mostly English with minor non-English elements, consider it English
8. Common programming terms, error messages, and technical jargon should not be considered as non-English
9. Be lenient - if the user's explanation is in English with non-English system output, it's still English
10. Common programming terms, error messages, and technical jargon should not be considered as non-English
11. If you cannot reliably determine the language, set detected_language to "undefined"
Example response:
{"is_english": false, "detected_language": "Spanish"}
@ -122,6 +125,12 @@ jobs:
return;
}
// If language is undefined or not detected, skip processing
if (!languageResult.detected_language || languageResult.detected_language === 'undefined') {
console.log('Language could not be determined, skipping processing');
return;
}
console.log(`Issue detected as non-English: ${languageResult.detected_language}`);
// Post comment explaining the language requirement