java.lang.Object
  bb.util.StringUtil
public final class StringUtil
Provides various static String (and sometimes char[]) utility methods.
This class is multithread safe: it is immutable (both its immediate state, as well as the deep state of its fields).
Nested Class Summary | |
---|---|
static class |
StringUtil.UnitTest
See the Overview page of the project's javadocs for a general description of this unit test class. |
Field Summary | |
---|---|
private static Pattern |
lineTerminatorPattern
A regex which matches Pattern's line terminators. |
static String |
newline
Platform's standard for the char sequence that should separate lines. |
private static int |
numCharsContext
|
private static Pattern |
spaceTabCommaPattern
A regex which matches any combination of one or more space, tab or comma chars. |
private static boolean |
stringContructorTrimsGarbage
|
Constructor Summary | |
---|---|
private |
StringUtil()
This private constructor suppresses the default (public) constructor, ensuring non-instantiability. |
Method Summary | ||
---|---|---|
static String |
arraysToTextColumns(double[][] arrays,
String[] header)
Returns a String representation of arrays as a series of text columns. |
|
static char[] |
asciiBytesToChars(byte[] bytes)
Converts each element of bytes to a char[] which is returned. |
|
static char[] |
bytesToChars(byte[] bytes)
Converts bytes into a char[] which is returned. |
|
private static String |
describeAsciiCharsByHammingDistance(byte b)
Meant for use just by diagnoseProblem to provide a better diagnostic message to the user. |
|
private static String |
diagnoseProblem(byte[] bytes,
int i)
Meant for use just by asciiBytesToChars to provide a better diagnostic message to the user. |
|
static String |
ensureEndsInNewLine(String s)
This method returns the supplied String if it already ends in a newline char sequence. |
|
static String |
ensureSuffix(String s,
char suffix)
Returns s itself if s already ends with suffix, else returns s concatenated with suffix. |
|
static String |
ensureSuffix(String s,
String suffix)
Returns s itself if s already ends with suffix, else returns s concatenated with suffix. |
|
static boolean |
equalChars(char[] chars,
String s)
Determines whether a char[] has exactly the same chars as a String. |
|
static boolean |
equals(String s1,
String s2)
Determines whether or not s1 is equal to s2. |
|
private static String |
getAsciiContext(byte[] bytes,
int i)
Meant for use just by diagnoseProblem to provide a better diagnostic message to the user. |
|
static String |
getTabs(int numberTabs)
Convenience method that returns the equivalent of repeatChars('\t', numberTabs). |
|
static String |
indentLines(String s)
Returns indentLines(s, 1). |
|
static String |
indentLines(String s,
int numberTabs)
Parses individual lines out of s by a call to parseLines(s, true). |
|
private static List<String> |
initList(String s)
|
|
private static boolean |
inspectStringConstructor()
|
|
static boolean |
isAllAsciiChars(String s)
Determines whether or not s consists exclusively of US-ASCII chars. |
|
static boolean |
isBlank(String s)
Determines whether or not s is "blank" (i.e. is either null, zero-length, or solely consists of whitespace). |
|
static boolean |
isNewLineEnd(String s)
This method determines whether or not the supplied String ends in a newline char. |
|
static boolean |
isTrimmable(String s)
Determines whether or not s is "trimmable" (i.e. either begins and/or ends with a whitespace char). |
|
static String |
keepWithinLength(String s,
int limitLength)
Returns s if its length is less than limitLength. |
|
static String |
newString(String s)
Immediately returns null if s == null. |
|
private static boolean |
nextCharNewline(String s,
int i)
|
|
static String |
normalizeWhitespace(String s)
Performs whitespace normalization, as per the XML spec. |
|
static String[] |
parseLines(String s)
Returns parseLines(s, false). |
|
static String[] |
parseLines(String s,
boolean includeEol)
Parses individual lines out of s, which are collectively returned as String[]. |
|
static String[] |
quoteWhitespaceTokenize(String source,
boolean includeQuotes)
This method breaks up the supplied String into tokens. |
|
private static SortedMap<Integer,List<Character>> |
rankAsciiCharsByHammingDistance(byte b)
Meant for use just by describeAsciiCharsByHammingDistance to provide a better diagnostic message to the user. |
|
static String |
removeQuotes(String s,
int lineNumber)
This utility method removes a matching leading and trailing pair of quote marks, if present, from the supplied String and returns the substring inside the quotes. |
|
static String |
repeatChars(char c,
int length)
Returns a String of the specified length which consists entirely of the char c. |
|
static List<String> |
splitByChar(String s,
char delimiter,
int n,
boolean nIsExact)
Identical to splitByLiteral except that the token delimiter is restricted to a single char,
which allows an even more optimized algorithm to be used. |
|
static List<String> |
splitByLiteral(String s,
String delimiter,
int n,
boolean nIsExact)
Splits s into tokens. |
|
static String |
toLength(int number,
int length)
Returns a String which represents number and consists of exactly length digits, with leading 0's padded if necessary. |
|
static String |
toLength(String s,
int length,
boolean prepend,
char c)
Returns a String of exactly length chars. |
|
static String[][] |
toMatrix(CharSequence cs)
Returns toMatrix(cs, lineTerminatorPattern, spaceTabCommaPattern). |
|
static String[][] |
toMatrix(CharSequence cs,
Pattern rowDelimiter,
Pattern columnDelimiter)
Returns a matrix representation of cs. |
|
static String |
toString(boolean[] array,
String separator)
Returns a String representation of array. |
|
static String |
toString(byte[] array,
String separator)
Returns a String representation of array. |
|
static String |
toString(char[] array,
String separator)
Returns a String representation of array. |
|
static String |
toString(Collection<?> collection,
String separator)
Returns a String representation of collection. |
|
static String |
toString(double[][] array,
String separator)
Returns a String representation of array. |
|
static String |
toString(double[] array,
String separator)
Returns a String representation of array. |
|
static String |
toString(float[] array,
String separator)
Returns a String representation of array. |
|
static String |
toString(int[] array,
String separator)
Returns a String representation of array. |
|
static String |
toString(long[] array,
String separator)
Returns a String representation of array. |
|
static String |
toString(Map<?,?> map,
String separator)
Returns a String representation of map. |
|
static String |
toString(short[] array,
String separator)
Returns a String representation of array. |
|
static <T> String |
toString(T[] array,
String separator)
Returns a String representation of array. |
|
static String |
toStringAscii(String s)
Converts s into an equivalent String that consists solely of US-ASCII chars. |
|
static String |
toStringLiteral(String s)
Converts a String into a series of chars that constitute a Java String literal. |
|
private static void |
wsTokensize(String source,
List<String> tokens)
|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String newline
Platform's standard for the char sequence that should separate lines, as given by System.getProperty("line.separator").
See this wikipedia entry for more discussion.
private static final Pattern lineTerminatorPattern
private static final Pattern spaceTabCommaPattern
private static final boolean stringContructorTrimsGarbage
private static final int numCharsContext
Constructor Detail |
---|
private StringUtil()
Method Detail |
---|
private static boolean inspectStringConstructor()
public static String newString(String s)
This minimal memory guarantee can be crucial in many situations. For example, consider the case of a String which was parsed as a substring from a long line of text from some file. Then a reference to all the chars of the original line of text must be maintained because of how the substring method operates, and so excess memory is used.
public static char[] bytesToChars(byte[] bytes) throws IllegalArgumentException, MalformedInputException, UnmappableCharacterException, CharacterCodingException
The conversion is done using a new CharsetDecoder created from this platform's default Charset. Strict conversion mode is used (i.e. all errors result in Exceptions; no char substitutions or other silent error handling is performed). So, bytes must be perfectly encoded using this platform's default encoding.
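The strict decoding described above can be sketched roughly as follows. This is an illustration only, not the class's actual source: the class name StrictDecodeSketch is hypothetical, and the sketch assumes that "strict conversion mode" corresponds to CodingErrorAction.REPORT for both malformed input and unmappable characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical sketch; not the real StringUtil implementation.
final class StrictDecodeSketch {

    /** Decodes bytes with the platform default Charset, reporting every error instead of substituting chars. */
    static char[] bytesToChars(byte[] bytes) throws CharacterCodingException {
        if (bytes == null) throw new IllegalArgumentException("bytes == null");

        // A fresh decoder per call; REPORT makes decode() throw on any bad input.
        CharsetDecoder decoder = Charset.defaultCharset().newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);

        CharBuffer decoded = decoder.decode(ByteBuffer.wrap(bytes));
        char[] chars = new char[decoded.remaining()];
        decoded.get(chars);
        return chars;
    }
}
```

MalformedInputException and UnmappableCharacterException are both subclasses of CharacterCodingException, which is why a single throws clause suffices in the sketch.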
IllegalArgumentException - if bytes is null
MalformedInputException - if an illegal byte sequence for this charset is encountered
UnmappableCharacterException - if a byte sequence is encountered which cannot be mapped to an equivalent character sequence
CharacterCodingException - if a decoding problem occurs
public static boolean equals(String s1, String s2)
Determines whether or not s1 is equal to s2. This method first checks whether s1 == null, and if that is true returns the value of s2 == null. Otherwise, if s1 is not null, then it returns s1.equals(s2).
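The null handling described for equals amounts to the following minimal sketch (the class name NullSafeEqualsSketch is hypothetical). On Java 7 and later, java.util.Objects.equals(s1, s2) gives the same result.

```java
// Hypothetical sketch of the null-safe comparison described above.
final class NullSafeEqualsSketch {

    static boolean equals(String s1, String s2) {
        if (s1 == null) return s2 == null; // two nulls are equal; null never equals non-null
        return s1.equals(s2);              // String.equals already returns false for a null arg
    }
}
```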
public static boolean equalChars(char[] chars, String s)
Determines whether a char[] has exactly the same chars as a String. If one of the args is null, then the other must be null as well for true to be returned.
public static String ensureSuffix(String s, char suffix) throws IllegalArgumentException
IllegalArgumentException - if s is null
public static String ensureSuffix(String s, String suffix) throws IllegalArgumentException
IllegalArgumentException - if s or suffix is null
public static String toLength(int number, int length) throws IllegalArgumentException
IllegalArgumentException - if number is such that it cannot be represented by length digits
public static String toLength(String s, int length, boolean prepend, char c) throws IllegalArgumentException
IllegalArgumentException - if s == null; s.length() > length
public static String repeatChars(char c, int length) throws IllegalArgumentException
Contract: the result is never null, but will be empty if length = 0.
length - the number of chars in the result
IllegalArgumentException - if length is negative
public static String getTabs(int numberTabs) throws IllegalArgumentException
Convenience method that returns the equivalent of repeatChars('\t', numberTabs).
(For top performance, when numberTabs is small, which it usually is, this method returns an appropriate String constant.
Otherwise, it makes a call to repeatChars.)
One use is to create indentation levels when formatting output.
numberTabs - the number of tab chars in the result
IllegalArgumentException - if numberTabs is negative
public static String keepWithinLength(String s, int limitLength) throws IllegalArgumentException
For example, keepWithinLength("abcdefghi", 7) returns "ab...hi".
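One way to get the behavior in that example is to cut the middle out of s and replace it with "...". The sketch below is a guess at the approach, not the actual source: the class name is hypothetical, and both the head/tail split and the choice to return s unchanged when its length equals limitLength are assumptions. It does reproduce the documented example.

```java
// Hypothetical sketch; the real truncation rule may differ in edge cases.
final class KeepWithinLengthSketch {

    static String keepWithinLength(String s, int limitLength) {
        if (s == null) throw new IllegalArgumentException("s == null");
        if (limitLength < 5) throw new IllegalArgumentException("limitLength < 5");
        if (s.length() <= limitLength) return s; // assumption: an exact fit is returned as is

        int keep = limitLength - 3; // chars of s that survive around the "..."
        int head = (keep + 1) / 2;  // the head gets the extra char when keep is odd
        int tail = keep / 2;
        return s.substring(0, head) + "..." + s.substring(s.length() - tail);
    }
}
```

The limitLength >= 5 requirement fits this scheme: it leaves room for the three dots plus at least one char on each side.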
IllegalArgumentException - if s == null; limitLength < 5
public static String arraysToTextColumns(double[][] arrays, String[] header) throws IllegalArgumentException
Since arrays is a double[][], then arrays[0], arrays[1], etc are double[] subarrays. The result takes each subarray and uses it as a column of text: arrays[0] is the first column, arrays[1] is the second column, etc.
Each row of the result is tab delimited: tab chars separate the numbers from each other. Note that the subarrays need not have equal length: this method will simply leave blanks in cells that have no data.
The header arg is optional (may be null), but if present, must have the same length as the number of subarrays (i.e. arrays.length) because it will be used to label each column.
To illustrate, the following code
double[][] arrays = new double[][] {
new double[] {1},
new double[] {1, 2},
new double[] {1, 2, 3}
};
String[] header = new String[] {"A", "B", "C"};
System.out.println( arraysToTextColumns(arrays, header) );
produces this output:
A	B	C
1.0	1.0	1.0
	2.0	2.0
		3.0
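The column layout above can be sketched as follows. This is a minimal illustration, not the class's source: the class name is hypothetical, and the sketch assumes '\n' as the row terminator where the real method may use the newline constant.

```java
// Hypothetical sketch of the column-per-subarray layout described above.
final class TextColumnsSketch {

    static String arraysToTextColumns(double[][] arrays, String[] header) {
        if (arrays == null) throw new IllegalArgumentException("arrays == null");
        if (header != null && header.length != arrays.length)
            throw new IllegalArgumentException("header.length != arrays.length");

        StringBuilder sb = new StringBuilder();
        if (header != null) sb.append(String.join("\t", header)).append('\n');

        // The longest subarray determines how many rows are emitted.
        int rows = 0;
        for (double[] column : arrays) rows = Math.max(rows, column.length);

        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < arrays.length; c++) {
                if (c > 0) sb.append('\t');
                if (r < arrays[c].length) sb.append(arrays[c][r]); // short columns leave a blank cell
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```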
IllegalArgumentException - if arrays == null; header != null && header.length != arrays.length
public static String[][] toMatrix(CharSequence cs) throws IllegalArgumentException
Returns toMatrix(cs, lineTerminatorPattern, spaceTabCommaPattern).
So, this convenience version can parse CharSequences where the rows are either space, tab, or comma delimited.
Warning: cs must use spaces, tabs, or commas only as delimiters; these characters cannot appear anywhere else (e.g. inside what the user thinks should be a token). This means that this method cannot parse CharSequences that come from, say, true CSV files (parsing these requires something more complicated; see this webpage).
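A rough sketch of two-level regex splitting, first into rows and then into column tokens, is shown below. The class name is hypothetical and the patterns passed in the usage note are stand-ins; the actual values of lineTerminatorPattern and spaceTabCommaPattern are private to StringUtil and are not reproduced here.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of splitting a CharSequence into a String[][] with two regexes.
final class ToMatrixSketch {

    static String[][] toMatrix(CharSequence cs, Pattern rowDelimiter, Pattern columnDelimiter) {
        if (cs == null || rowDelimiter == null || columnDelimiter == null)
            throw new IllegalArgumentException("null arg");

        String[] rows = rowDelimiter.split(cs);
        String[][] matrix = new String[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            matrix[i] = columnDelimiter.split(rows[i]); // rows may have differing token counts
        }
        return matrix;
    }
}
```

For instance, calling this with Pattern.compile("\\r?\\n") and Pattern.compile("[ \\t,]+") on "1,2\n3 4" yields the rows {"1", "2"} and {"3", "4"}. A quoted CSV field containing a comma would be split incorrectly, which is the limitation the warning describes.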
IllegalArgumentException
public static String[][] toMatrix(CharSequence cs, Pattern rowDelimiter, Pattern columnDelimiter) throws IllegalArgumentException
cs - the CharSequence to be parsed
rowDelimiter - the regex used to split cs into rows
columnDelimiter - the regex used to split each row into column tokens
IllegalArgumentException - if any arg is null
public static boolean isAllAsciiChars(String s) throws IllegalArgumentException
The implementation here scans through the chars of s and returns false upon the first char encountered which is not an ASCII value. Only if no such char is encountered is true returned.
Note: this algorithm safely handles all Unicode 4.0 code points, including all supplementary code points, for which Java's UTF-16 encoding uses a surrogate pair (i.e. two consecutive chars).
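The scan described above can be sketched in a few lines (the class name is hypothetical). It is safe for supplementary code points because both halves of a surrogate pair lie in the range 0xD800 to 0xDFFF, far above the ASCII limit of 0x7F, so the first surrogate char already fails the test.

```java
// Hypothetical sketch of the char-by-char ASCII scan.
final class AsciiCheckSketch {

    static boolean isAllAsciiChars(String s) {
        if (s == null) throw new IllegalArgumentException("s == null");
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) return false; // first non-ASCII char ends the scan
        }
        return true;
    }
}
```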
IllegalArgumentException - if s is null
public static char[] asciiBytesToChars(byte[] bytes) throws IllegalArgumentException
IllegalArgumentException - if bytes == null; if bytes contains a non-ASCII byte (i.e. a negative value)
private static String diagnoseProblem(byte[] bytes, int i)
private static String getAsciiContext(byte[] bytes, int i)
private static String describeAsciiCharsByHammingDistance(byte b)
private static SortedMap<Integer,List<Character>> rankAsciiCharsByHammingDistance(byte b)
public static String toStringAscii(String s) throws IllegalArgumentException
One use for this method is when a String must be printed, but it is possible that the char converter will fail to represent certain chars.
IllegalArgumentException - if s == null
public static boolean isBlank(String s)
public static boolean isTrimmable(String s)
s is trimmable if trim would return a result that differs from s.
Special cases: if s is either null or zero-length, this method immediately returns false.
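Since String.trim strips leading and trailing chars whose code is at most '\u0020', the check can be sketched without allocating a trimmed copy. The class name is hypothetical, and whether the real method uses this shortcut is an assumption.

```java
// Hypothetical sketch: true exactly when s.trim() would differ from s.
final class TrimmableSketch {

    static boolean isTrimmable(String s) {
        if (s == null || s.isEmpty()) return false; // the documented special cases
        // trim() removes chars <= ' ' from both ends, so only the end chars need inspecting.
        return s.charAt(0) <= ' ' || s.charAt(s.length() - 1) <= ' ';
    }
}
```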
public static boolean isNewLineEnd(String s) throws IllegalArgumentException
The newline constant of this class is the newline char sequence that is used.
IllegalArgumentException - if s is null
public static String ensureEndsInNewLine(String s) throws IllegalArgumentException
The newline constant of this class is the newline char sequence that is used.
IllegalArgumentException - if s is null
public static String normalizeWhitespace(String s) throws IllegalArgumentException
IllegalArgumentException - if s is null
public static String indentLines(String s) throws IllegalArgumentException
Returns indentLines(s, 1).
IllegalArgumentException - if s is null
public static String indentLines(String s, int numberTabs) throws IllegalArgumentException
Parses individual lines out of s by a call to parseLines(s, true).
Then returns a new String which consists of the concatenation of every line
after each line is first preceded with an indent equal to the number of tab chars specified by numberTabs.
The result is never empty. It contains at least numberTabs tab chars. It may or may not end with end of line sequence char(s), depending on whether or not s does.
s - the String to split into lines and indent
numberTabs - the number of tab chars to use in each line's indent
IllegalArgumentException - if s is null; numberTabs is negative
public static String[] parseLines(String s) throws IllegalArgumentException
Returns parseLines(s, false).
IllegalArgumentException - if s is null
public static String[] parseLines(String s, boolean includeEol) throws IllegalArgumentException
The result is never empty. It contains exactly one element if s has no end of line sequences, which includes the special case that s is zero-length (i.e. ""). If includeEol == true, then concatenating every element in the result (e.g. by calling toString(lines, "")) reconstitutes s exactly.
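The contract above can be illustrated with the following sketch. It is not the actual implementation: the class name is hypothetical, it recognizes only "\n", "\r" and "\r\n" as end of line sequences (the real method presumably follows lineTerminatorPattern), and it assumes that a trailing end of line sequence does not produce an extra empty last element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of line parsing that can keep or drop the end of line chars.
final class ParseLinesSketch {

    static String[] parseLines(String s, boolean includeEol) {
        if (s == null) throw new IllegalArgumentException("s == null");

        List<String> lines = new ArrayList<>();
        int start = 0; // index where the current line begins
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r') {
                int eolEnd = i + 1;
                // Treat "\r\n" as a single end of line sequence.
                if (c == '\r' && eolEnd < s.length() && s.charAt(eolEnd) == '\n') eolEnd++;
                lines.add(s.substring(start, includeEol ? eolEnd : i));
                start = eolEnd;
                i = eolEnd;
            } else {
                i++;
            }
        }
        // Remaining chars after the last EOL, or the single element for an EOL-free s (including "").
        if (start < s.length() || lines.isEmpty()) lines.add(s.substring(start));
        return lines.toArray(new String[0]);
    }
}
```

With includeEol == true every char of s lands in exactly one element, which is what makes the concatenation round trip exact.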
s - the String to split into lines
includeEol - specifies whether or not to include the end of line sequence char(s) in the lines
IllegalArgumentException - if s is null
private static List<String> initList(String s)
private static boolean nextCharNewline(String s, int i)
public static List<String> splitByLiteral(String s, String delimiter, int n, boolean nIsExact) throws IllegalArgumentException
Contract: the result is never empty. This includes the special case that s is the empty String "", in which case the result has a single element that is "". Furthermore, if nIsExact is true, then it is guaranteed to have exactly n elements.
The result's type always implements RandomAccess, so its List.get(int) method will be about as fast as an array's access.
Finally, the result should always be equivalent to calling String.split(delimiter, -1), assuming that delimiter contains no special chars so that it too would be treated literally by the regex, and that the String[] returned by String.split is compared element by element with the List returned by this method.
This method was written because its literal treatment of delimiter allows a more optimized parsing algorithm to be used. (It is 3-4 times faster than String.split, and 2+ times faster than Pattern.split, results depending on the String to be split; see UnitTest.benchmark_splitByLiteral for details.)
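A literal split of this kind is commonly built on String.indexOf, as in the sketch below. The class name is hypothetical and this is not the benchmarked implementation; it only illustrates the contract: the result is never empty, it is RandomAccess (an ArrayList), and it matches String.split(delimiter, -1) for delimiters without regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting on a literal delimiter without any regex machinery.
final class SplitByLiteralSketch {

    static List<String> splitByLiteral(String s, String delimiter, int n, boolean nIsExact) {
        if (s == null) throw new IllegalArgumentException("s == null");
        if (delimiter == null || delimiter.isEmpty()) throw new IllegalArgumentException("delimiter is null or zero-length");
        if (n < 1) throw new IllegalArgumentException("n < 1");

        List<String> tokens = new ArrayList<>(n); // n is used as a capacity hint
        int start = 0;
        int hit;
        while ((hit = s.indexOf(delimiter, start)) >= 0) {
            tokens.add(s.substring(start, hit));
            start = hit + delimiter.length();
        }
        tokens.add(s.substring(start)); // the final token, possibly ""

        if (nIsExact && tokens.size() != n)
            throw new IllegalArgumentException("expected " + n + " tokens but found " + tokens.size());
        return tokens;
    }
}
```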
s - the String to split into tokens
delimiter - the token delimiting chars
n - the expected number of tokens
nIsExact - if true, then s must contain exactly n tokens; if false, n is merely a hint that the implementation can use for optimization
IllegalArgumentException - if s == null; delimiter is null or zero-length; n < 1; nIsExact is true and s fails to split into exactly n tokens
public static List<String> splitByChar(String s, char delimiter, int n, boolean nIsExact) throws IllegalArgumentException
Identical to splitByLiteral except that the token delimiter is restricted to a single char, which allows an even more optimized algorithm to be used. (It is 1.5-2 times faster than splitByLiteral; see UnitTest.benchmark_splitByChar for details.)
s - the String to split into tokens
delimiter - the token delimiting char
n - the expected number of tokens
nIsExact - if true, then s must contain exactly n tokens; if false, n is merely a hint that the implementation can use for optimization
IllegalArgumentException - if s == null; n < 1; nIsExact is true and s fails to split into exactly n tokens
public static String[] quoteWhitespaceTokenize(String source, boolean includeQuotes) throws IllegalArgumentException
The procedure followed is to find the next occurring double quote char. The substring of all untokenized chars before that double quote char is then tokenized using whitespace chars as delimiters. Then, the next token is the substring which consists of all chars from the current double quote till the next double quote. This procedure is repeated until the source String is exhausted.
It is an error if the source String has an odd number of double quote chars (i.e. they must occur in pairs).
An example application of this method is to parse command lines. Here, command line arguments are normally separated by spaces. However, double quotes are used to enclose those args which have spaces inside them.
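The procedure described above might be sketched as follows. The class name is hypothetical, and details such as how empty quoted tokens or quotes adjacent to other text are treated are assumptions rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the quote-aware whitespace tokenizer described above.
final class QuoteTokenizeSketch {

    static String[] quoteWhitespaceTokenize(String source, boolean includeQuotes) {
        if (source == null) throw new IllegalArgumentException("source == null");

        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            int open = source.indexOf('"', pos);

            // Everything before the next double quote is tokenized on whitespace.
            String unquoted = (open < 0) ? source.substring(pos) : source.substring(pos, open);
            for (String t : unquoted.trim().split("\\s+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
            if (open < 0) break;

            // The quoted run, up to the matching close quote, is a single token.
            int close = source.indexOf('"', open + 1);
            if (close < 0) throw new IllegalArgumentException("odd number of double quote chars");
            tokens.add(includeQuotes ? source.substring(open, close + 1) : source.substring(open + 1, close));
            pos = close + 1;
        }
        return tokens.toArray(new String[0]);
    }
}
```

For a command line such as run "my file.txt" -v, this sketch produces the three tokens run, my file.txt and -v when includeQuotes is false.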
source - the String to be tokenized
includeQuotes - specifies whether or not to include double quote marks with those tokens that are delimited by them
IllegalArgumentException - if source is null or if it contains an odd number of double quote chars
private static void wsTokensize(String source, List<String> tokens)
public static String removeQuotes(String s, int lineNumber) throws IllegalArgumentException, ParseException
Any leading and trailing quote marks must be either both single or both double quote chars in order to match.
lineNumber - the line number where the String was found; used only if a ParseException is thrown
IllegalArgumentException - if s is null, or lineNumber < 0
ParseException - if a leading but no matching trailing quote mark is present, or vice versa
public static String toString(boolean[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static String toString(byte[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static String toString(char[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static String toString(double[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static String toString(double[][] array, String separator) throws IllegalArgumentException
The result consists of the elements of each array[i] on their own line, with these elements in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static String toString(float[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static String toString(long[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static String toString(short[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static String toString(int[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element.
IllegalArgumentException - if array == null; separator == null
public static <T> String toString(T[] array, String separator) throws IllegalArgumentException
The result consists of the elements of array in sequence, with separator between each element. If any element == null, then it is represented by the text "null".
IllegalArgumentException - if array == null; separator == null
public static String toString(Collection<?> collection, String separator) throws IllegalArgumentException
The result consists of the elements of collection as returned by its Iterator, with separator between each element. If any element == null, then it is represented by the text "null".
IllegalArgumentException - if collection == null; separator == null
public static String toString(Map<?,?> map, String separator) throws IllegalArgumentException
The result consists of map's key/value pairs, with separator between each pair. The keys are obtained by map's keySet Iterator, and the text " --> " appears between the key and its value. If any key or value == null, then it is represented by the text "null".
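The Map format described above can be sketched like this (the class name is hypothetical). StringBuilder.append(Object) already renders a null reference as the text "null", so no special casing is needed for null keys or values.

```java
import java.util.Map;

// Hypothetical sketch of the key " --> " value formatting described above.
final class MapToStringSketch {

    static String toString(Map<?, ?> map, String separator) {
        if (map == null) throw new IllegalArgumentException("map == null");
        if (separator == null) throw new IllegalArgumentException("separator == null");

        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Object key : map.keySet()) { // pair order follows the keySet Iterator
            if (!first) sb.append(separator);
            sb.append(key).append(" --> ").append(map.get(key));
            first = false;
        }
        return sb.toString();
    }
}
```

The order of the pairs is only as predictable as the Map itself: a LinkedHashMap or TreeMap gives a stable order, a HashMap does not.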
IllegalArgumentException - if map == null; separator == null
public static String toStringLiteral(String s) throws IllegalArgumentException
One use is that you could take the result and directly paste it into a Java source file. Another use for this method is to handle filepaths with spaces and '\' chars (which commonly occur in DOS).
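A minimal sketch of such a conversion is given below. The class name is hypothetical, and two points are assumptions: that the result includes the surrounding double quotes, and that only backslash, double quote, newline, carriage return and tab need escaping (the real method may handle more, such as other control chars or non-ASCII chars).

```java
// Hypothetical sketch: escape a String so it can be pasted into Java source as a literal.
final class StringLiteralSketch {

    static String toStringLiteral(String s) {
        if (s == null) throw new IllegalArgumentException("s == null");

        StringBuilder sb = new StringBuilder(s.length() + 2);
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // one backslash becomes two
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        sb.append('"');
        return sb.toString();
    }
}
```

Under these assumptions the DOS-style path C:\My Docs becomes the literal "C:\\My Docs", quotes included.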
IllegalArgumentException - if s == null