Java String offsetByCodePoints() method
Syntax and Examples



Introduction

The `offsetByCodePoints()` method of Java's `String` class returns the index reached by moving a given number of Unicode code points from a starting index. Java strings are stored as sequences of UTF-16 code units (`char` values), and a code point outside the Basic Multilingual Plane, such as most emoji, takes two code units (a surrogate pair). As a result, moving by one code point can shift the index by one position or by two, and this method does that accounting for you.
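The difference between code units and code points can be seen with a minimal sketch. The class and helper names below are made up for illustration, and the emoji is written with `\u` escapes so the result does not depend on the source file encoding.

```java
class CodeUnitsVsCodePoints {
    // Number of UTF-16 code units (char values), which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String plain = "abc";
        String withEmoji = "a\uD83D\uDE00b"; // 'a', U+1F600 (a surrogate pair), 'b'

        System.out.println(codeUnits(plain) + " " + codePoints(plain));         // 3 3
        System.out.println(codeUnits(withEmoji) + " " + codePoints(withEmoji)); // 4 3
    }
}
```

The second string has three code points but four `char` positions, which is why index arithmetic with plain `+ 1` can land in the middle of a character.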

Syntax


public int offsetByCodePoints(int index, int codePointOffset)

Parameters

| Parameter | Description |
| --- | --- |
| `index` | The starting index in the string, counted in `char` values (UTF-16 code units). It must be between 0 and `length()`, inclusive. |
| `codePointOffset` | The number of code points to move from the given index. Positive values move forward, negative values move backward. |

Return Value

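The method returns an `int` representing the new index in the string (again counted in `char` values) after applying the code point offset. It throws `IndexOutOfBoundsException` if `index` is negative or greater than the string's length, or if there are fewer code points than requested between `index` and the corresponding end of the string.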
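The bounds behavior can be checked with a small sketch. The `safeOffset` helper and its `-1` sentinel are choices made for this illustration only; they are not part of the `String` API.

```java
class OffsetBounds {
    // Returns the shifted index, or -1 when the offset would leave the string.
    // The -1 sentinel is an assumption of this sketch, not library behavior.
    static int safeOffset(String s, int index, int codePointOffset) {
        try {
            return s.offsetByCodePoints(index, codePointOffset);
        } catch (IndexOutOfBoundsException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String str = "abc";
        System.out.println(safeOffset(str, 1, 2));  // 3: landing on length() is allowed
        System.out.println(safeOffset(str, 1, 3));  // -1: only two code points remain after index 1
        System.out.println(safeOffset(str, 0, -1)); // -1: nothing precedes index 0
        System.out.println(safeOffset(str, 4, 0));  // -1: index is greater than length()
    }
}
```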

Examples

Example 1: Moving Forward with a Single Code Point Offset

This example demonstrates moving forward one code point from a specific index. Since 'b' is represented by a single code unit, the resulting index is simply one position ahead.


String str = "abc";
int startIndex = 1; // Index of 'b'
int offset = 1;
int newIndex = str.offsetByCodePoints(startIndex, offset);
System.out.println("New index: " + newIndex);


New index: 2

Explanation: The `offsetByCodePoints()` method takes the initial index (1, pointing to 'b') and offsets it by one code point. Since 'b' is a single code point character, the new index becomes 2, which corresponds to the next character 'c'.

Example 2: Moving Backward with a Single Code Point Offset

This example shows how to move backward by one code point. Again, because we're dealing with single-code-point characters, the shift is straightforward.


String str = "abc";
int startIndex = 2; // Index of 'c'
int offset = -1;
int newIndex = str.offsetByCodePoints(startIndex, offset);
System.out.println("New index: " + newIndex);


New index: 1

Explanation: The starting index is 2 (pointing to 'c'). Offsetting backward by one code point results in a new index of 1, which points to the character 'b'.

Example 3: Handling Unicode Surrogate Pairs

This example highlights the real power of `offsetByCodePoints()`. It shows how the index changes when the code point being skipped is stored as a surrogate pair (two UTF-16 code units), as most emoji are. Moving past such a code point advances the index by two rather than one.


String str = "Hello 😀 World"; // 😀 is an emoji requiring a surrogate pair (indexes 6 and 7).
int startIndex = 6; // Index of the first code unit of the emoji
int offset = 1;
int newIndex = str.offsetByCodePoints(startIndex, offset);
System.out.println("New index: " + newIndex);


New index: 8

Explanation: The starting index is 6, where the emoji 😀 begins ("Hello" occupies indexes 0 to 4 and the space is at index 5). Because the emoji occupies two code units (indexes 6 and 7), moving one code point forward does not simply increment the index by one. Instead, `offsetByCodePoints()` returns 8, the index of the space that follows the emoji. Starting from index 5 would only skip the one-unit space and return 6.
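One way to see the variable step size is to walk a string one code point at a time and record where each code point starts. This is a minimal sketch; the class and method names are illustrative, and the emoji is written with `\u` escapes.

```java
class CodePointStarts {
    // Returns the char index at which each code point of s begins.
    static int[] startIndexes(String s) {
        int[] starts = new int[s.codePointCount(0, s.length())];
        int index = 0;
        for (int k = 0; k < starts.length; k++) {
            starts[k] = index;
            index = s.offsetByCodePoints(index, 1); // advances by 1 or 2 char positions
        }
        return starts;
    }

    public static void main(String[] args) {
        String str = "Hi \uD83D\uDE00!"; // 'H', 'i', ' ', U+1F600 (two chars), '!'
        System.out.println(java.util.Arrays.toString(startIndexes(str))); // [0, 1, 2, 3, 5]
    }
}
```

The jump from 3 to 5 in the result is the surrogate pair being skipped as a single code point.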

Example 4: Moving Backward Over a Surrogate Pair

This example moves backward by one code point when that code point is a surrogate pair.


String str = "Hello 😀 World"; // 😀 is an emoji requiring a surrogate pair (indexes 6 and 7).
int startIndex = 8; // Index immediately after the emoji
int offset = -1;
int newIndex = str.offsetByCodePoints(startIndex, offset);
System.out.println("New index: " + newIndex);


New index: 6

Explanation: The starting index is 8, the position just after the emoji 😀. Moving one code point backward skips both code units of the surrogate pair, so the new index is 6, where the emoji begins. If you started at index 7 instead, in the middle of the pair, the lone high surrogate at index 6 would be counted as one code point and the result would also be 6.
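Backward offsets are handy for taking the tail of a string without cutting a surrogate pair in half. The following is a sketch under that assumption; `lastCodePoints` is a hypothetical helper written for this page, not a JDK method.

```java
class LastCodePoints {
    // Returns the last n code points of s, or all of s if it has fewer than n.
    static String lastCodePoints(String s, int n) {
        if (n <= 0) {
            return "";
        }
        if (n >= s.codePointCount(0, s.length())) {
            return s;
        }
        // Walk n code points backward from the end of the string.
        int start = s.offsetByCodePoints(s.length(), -n);
        return s.substring(start);
    }

    public static void main(String[] args) {
        String str = "Hello \uD83D\uDE00"; // "Hello ", then U+1F600 as a surrogate pair
        String tail = lastCodePoints(str, 2);  // the space and the emoji
        System.out.println(tail.length());     // 3: one char for the space, two for the emoji
    }
}
```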

Example 5: Offsetting by Multiple Code Points

This example shows how `offsetByCodePoints()` behaves when offsetting by a value greater than one. It uses an emoji to demonstrate that the offset is counted in code points, not in `char` values.


String str = "Hello 😀 World"; // 😀 is an emoji requiring a surrogate pair (indexes 6 and 7).
int startIndex = 5; // Index of the space before the emoji
int offset = 2;
int newIndex = str.offsetByCodePoints(startIndex, offset);
System.out.println("New index: " + newIndex);


New index: 8

Explanation: The starting index is 5, the space before the emoji. The first code point (the space) occupies one code unit and the second (the emoji) occupies two, so offsetting by two code points advances the index by three, landing at index 8, the space after the emoji.
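A common use of multi-code-point offsets is truncating text to a maximum number of code points. The sketch below assumes a hypothetical `truncate` helper; a plain `substring(0, n)` with the same limit could split a surrogate pair, which this approach avoids.

```java
class TruncateByCodePoints {
    // Keeps at most maxCodePoints code points from the start of s.
    static String truncate(String s, int maxCodePoints) {
        if (maxCodePoints <= 0) {
            return "";
        }
        if (s.codePointCount(0, s.length()) <= maxCodePoints) {
            return s; // already short enough
        }
        // Convert the code point count into a char index that is safe to cut at.
        int end = s.offsetByCodePoints(0, maxCodePoints);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String str = "Hello \uD83D\uDE00 World"; // U+1F600 occupies indexes 6 and 7
        String cut = truncate(str, 7);            // "Hello ", then the whole emoji
        System.out.println(cut.length());         // 8, not 7
    }
}
```

Here `str.substring(0, 7)` would end on the high surrogate alone, leaving an unpaired code unit at the end of the result.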