Saturday, March 18, 2017

Digging into string splitting in Java: String split in Java


In Java, the method that splits a String is split.
See the description below.
  • split

    public String[] split(String regex)
    Splits this string around matches of the given regular expression. This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
    The string "boo:and:foo", for example, yields the following results with these expressions:
    Regex    Result
    :        { "boo", "and", "foo" }
    o        { "b", "", ":and:f" }
    Parameters:
    regex - the delimiting regular expression
    Returns:
    the array of strings computed by splitting this string around matches of the given regular expression
    Throws:
    PatternSyntaxException - if the regular expression's syntax is invalid
    Since:
    1.4
    See Also:
    Pattern
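
The Javadoc excerpt above says that the one-argument split behaves like the two-argument form with a limit of zero, so trailing empty strings are dropped. Here is a minimal sketch of that difference, using the "boo:and:foo" example from the documentation. The class and method names (SplitLimitDemo, splitDefault, splitKeepTrailing) are made up for illustration and are not part of the original post.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // split(regex) works like split(regex, 0): trailing empty strings are removed.
    static String[] splitDefault(String s, String regex) {
        return s.split(regex);
    }

    // A negative limit applies the pattern as often as possible and keeps trailing empty strings.
    static String[] splitKeepTrailing(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("boo:and:foo", ":")));      // [boo, and, foo]
        System.out.println(Arrays.toString(splitDefault("boo:and:foo", "o")));      // [b, , :and:f]
        System.out.println(Arrays.toString(splitKeepTrailing("boo:and:foo", "o"))); // [b, , :and:f, , ]
    }
}
```

If the number of fields matters, for example when parsing delimited records with empty last columns, the negative limit is probably the form to reach for.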

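The argument must be a regular expression.
In practice we usually just pass the literal text we want to split on and do not think much about it. Because regular expressions are easy to get wrong, this sometimes causes problems, so I put together this summary.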

Sample source

package testProject;

import java.util.regex.PatternSyntaxException;

public class Test {

 public static void main(String[] args) {
  // Test string: lowercase letters, a space, then keyboard punctuation and digits.
  String data = "abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\\[];',./{}:\"<>?";
  System.out.println("String:"+data);
  printArray("b",data.split("b"));
  printArray(".",data.split("."));
  printArray(",",data.split(","));
  printArray("[.]",data.split("[.]"));
  printArray("[ .]",data.split("[ .]"));
  // Use every character of the string, one at a time, as the split regex.
  for(int i=0;i<data.length();i++){
   String regex = String.valueOf(data.charAt(i));
   try{
    printArray(regex,data.split(regex));
   }catch(PatternSyntaxException e) {
    // A lone metacharacter such as * or ( is not a valid regex.
    System.out.println("Exception : at char:"+regex);
   }
  }
 }

 private static void printArray(String s,String[] data) {
  System.out.println("split data:"+s);
  if( data.length == 0 ){
   // The array is empty (length 0), not null; the label is kept to match the output.
   System.out.println("result : null");
  }
  for(int i=0;i<data.length;i++){
   System.out.println("result["+i+"]:"+data[i]);
  }
 }

}



Result
String:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:b
result[0]:a
result[1]:cdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:.
result : null
split data:,
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];'
result[1]:./{}:"<>?
split data:[.]
result[0]:abcdefghijklmnopqrstuvwxyz 
result[1]:~!@#$%^&*()_+|`1234567890-=\[];',
result[2]:/{}:"<>?
split data:[ .]
result[0]:abcdefghijklmnopqrstuvwxyz
result[1]:
result[2]:~!@#$%^&*()_+|`1234567890-=\[];',
result[3]:/{}:"<>?
split data:a
result[0]:
result[1]:bcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:b
result[0]:a
result[1]:cdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:c
result[0]:ab
result[1]:defghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:d
result[0]:abc
result[1]:efghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:e
result[0]:abcd
result[1]:fghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:f
result[0]:abcde
result[1]:ghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:g
result[0]:abcdef
result[1]:hijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:h
result[0]:abcdefg
result[1]:ijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:i
result[0]:abcdefgh
result[1]:jklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:j
result[0]:abcdefghi
result[1]:klmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:k
result[0]:abcdefghij
result[1]:lmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:l
result[0]:abcdefghijk
result[1]:mnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:m
result[0]:abcdefghijkl
result[1]:nopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:n
result[0]:abcdefghijklm
result[1]:opqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:o
result[0]:abcdefghijklmn
result[1]:pqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:p
result[0]:abcdefghijklmno
result[1]:qrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:q
result[0]:abcdefghijklmnop
result[1]:rstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:r
result[0]:abcdefghijklmnopq
result[1]:stuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:s
result[0]:abcdefghijklmnopqr
result[1]:tuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:t
result[0]:abcdefghijklmnopqrs
result[1]:uvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:u
result[0]:abcdefghijklmnopqrst
result[1]:vwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:v
result[0]:abcdefghijklmnopqrstu
result[1]:wxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:w
result[0]:abcdefghijklmnopqrstuv
result[1]:xyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:x
result[0]:abcdefghijklmnopqrstuvw
result[1]:yz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:y
result[0]:abcdefghijklmnopqrstuvwx
result[1]:z .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:z
result[0]:abcdefghijklmnopqrstuvwxy
result[1]: .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data: 
result[0]:abcdefghijklmnopqrstuvwxyz
result[1]:.~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:.
result : null
split data:~
result[0]:abcdefghijklmnopqrstuvwxyz .
result[1]:!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:!
result[0]:abcdefghijklmnopqrstuvwxyz .~
result[1]:@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:@
result[0]:abcdefghijklmnopqrstuvwxyz .~!
result[1]:#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:#
result[0]:abcdefghijklmnopqrstuvwxyz .~!@
result[1]:$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:$
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:%
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$
result[1]:^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:^
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:&
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^
result[1]:*()_+|`1234567890-=\[];',./{}:"<>?
Exception : at char:*
Exception : at char:(
Exception : at char:)
split data:_
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()
result[1]:+|`1234567890-=\[];',./{}:"<>?
Exception : at char:+
split data:|
result[0]:a
result[1]:b
result[2]:c
result[3]:d
result[4]:e
result[5]:f
result[6]:g
result[7]:h
result[8]:i
result[9]:j
result[10]:k
result[11]:l
result[12]:m
result[13]:n
result[14]:o
result[15]:p
result[16]:q
result[17]:r
result[18]:s
result[19]:t
result[20]:u
result[21]:v
result[22]:w
result[23]:x
result[24]:y
result[25]:z
result[26]: 
result[27]:.
result[28]:~
result[29]:!
result[30]:@
result[31]:#
result[32]:$
result[33]:%
result[34]:^
result[35]:&
result[36]:*
result[37]:(
result[38]:)
result[39]:_
result[40]:+
result[41]:|
result[42]:`
result[43]:1
result[44]:2
result[45]:3
result[46]:4
result[47]:5
result[48]:6
result[49]:7
result[50]:8
result[51]:9
result[52]:0
result[53]:-
result[54]:=
result[55]:\
result[56]:[
result[57]:]
result[58]:;
result[59]:'
result[60]:,
result[61]:.
result[62]:/
result[63]:{
result[64]:}
result[65]::
result[66]:"
result[67]:<
result[68]:>
result[69]:?
split data:`
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|
result[1]:1234567890-=\[];',./{}:"<>?
split data:1
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`
result[1]:234567890-=\[];',./{}:"<>?
split data:2
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1
result[1]:34567890-=\[];',./{}:"<>?
split data:3
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`12
result[1]:4567890-=\[];',./{}:"<>?
split data:4
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`123
result[1]:567890-=\[];',./{}:"<>?
split data:5
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234
result[1]:67890-=\[];',./{}:"<>?
split data:6
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`12345
result[1]:7890-=\[];',./{}:"<>?
split data:7
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`123456
result[1]:890-=\[];',./{}:"<>?
split data:8
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567
result[1]:90-=\[];',./{}:"<>?
split data:9
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`12345678
result[1]:0-=\[];',./{}:"<>?
split data:0
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`123456789
result[1]:-=\[];',./{}:"<>?
split data:-
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890
result[1]:=\[];',./{}:"<>?
split data:=
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-
result[1]:\[];',./{}:"<>?
Exception : at char:\
Exception : at char:[
split data:]
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[
result[1]:;',./{}:"<>?
split data:;
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[]
result[1]:',./{}:"<>?
split data:'
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];
result[1]:,./{}:"<>?
split data:,
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];'
result[1]:./{}:"<>?
split data:.
result : null
split data:/
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',.
result[1]:{}:"<>?
Exception : at char:{
split data:}
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{
result[1]::"<>?
split data::
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}
result[1]:"<>?
split data:"
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:
result[1]:<>?
split data:<
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"
result[1]:>?
split data:>
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<
result[1]:?
Exception : at char:?


Caveat

String:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:a
result[0]:  <= splitting on the character that starts the string puts an empty string first.
result[1]:bcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
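
The caveat above can be reproduced with a much shorter string. This is only a sketch: the helper splitNonEmpty is a hypothetical name, and dropping every empty piece is just one possible way to handle the leading empty string, not something the original post prescribes.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    // Hypothetical helper: split, then discard every empty piece (leading or in the middle).
    static String[] splitNonEmpty(String s, String regex) {
        return Arrays.stream(s.split(regex))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // A delimiter at the start yields a leading empty string.
        System.out.println(Arrays.toString("abc".split("a")));          // [, bc]
        // A delimiter at the end yields nothing extra: trailing empty strings are removed.
        System.out.println(Arrays.toString("cba".split("a")));          // [cb]
        System.out.println(Arrays.toString(splitNonEmpty("abc", "a"))); // [bc]
    }
}
```

The asymmetry is worth remembering: split removes trailing empty strings by default but keeps a leading one.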


Summary

Given string

String:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?

Unexpected results

$ , ^ , . , |
split data:$
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?

split data:^
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?

split data:.
result : null

split data:|
result[0]:a
result[1]:b
result[2]:c
result[3]:d
result[4]:e
result[5]:f
result[6]:g
result[7]:h
result[8]:i
result[9]:j
result[10]:k
result[11]:l
result[12]:m
result[13]:n
result[14]:o
result[15]:p
result[16]:q
result[17]:r
result[18]:s
result[19]:t
result[20]:u
result[21]:v
result[22]:w
result[23]:x
result[24]:y
result[25]:z
result[26]:
result[27]:.
result[28]:~
result[29]:!
result[30]:@
result[31]:#
result[32]:$
result[33]:%
result[34]:^
result[35]:&
result[36]:*
result[37]:(
result[38]:)
result[39]:_
result[40]:+
result[41]:|
result[42]:`
result[43]:1
result[44]:2
result[45]:3
result[46]:4
result[47]:5
result[48]:6
result[49]:7
result[50]:8
result[51]:9
result[52]:0
result[53]:-
result[54]:=
result[55]:\
result[56]:[
result[57]:]
result[58]:;
result[59]:'
result[60]:,
result[61]:.
result[62]:/
result[63]:{
result[64]:}
result[65]::
result[66]:"
result[67]:<
result[68]:>
result[69]:?

Characters that cannot be used as-is

Exception : at char:*
Exception : at char:(
Exception : at char:)
Exception : at char:+
Exception : at char:\
Exception : at char:[
Exception : at char:{
Exception : at char:?


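All of the characters above are regular expression metacharacters. The ones in this list are not valid patterns on their own, so split throws a PatternSyntaxException. The ones with unexpected results are valid patterns with a special meaning: . matches any character, so every piece is empty and is dropped; ^ and $ match only a position at the start or end of the input, so nothing is cut; and | with nothing on either side matches the empty string, so the input is cut between every pair of characters.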
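
When the delimiter is meant as plain text, one way to sidestep the metacharacter problem is the standard Pattern.quote method, which turns any string into a pattern that matches it literally. The sketch below assumes a small wrapper named splitLiteral and a short sample string; both are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuoteDemo {
    // Hypothetical helper: treat the delimiter as literal text, not as a regex.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String data = "a.b*c|d";
        System.out.println(Arrays.toString(splitLiteral(data, "."))); // [a, b*c|d]
        System.out.println(Arrays.toString(splitLiteral(data, "*"))); // [a.b, c|d]
        System.out.println(Arrays.toString(splitLiteral(data, "|"))); // [a.b*c, d]
        try {
            data.split("*"); // a bare * is not a valid pattern
        } catch (PatternSyntaxException e) {
            System.out.println("invalid regex: " + e.getDescription());
        }
    }
}
```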


A simple usage example

String : "boo:and:foo"
Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }

For details on regular expressions, see the link below.

https://ko.wikipedia.org/wiki/%EC%A0%95%EA%B7%9C_%ED%91%9C%ED%98%84%EC%8B%9D

You can pass several characters as the argument.

String:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:bc
result[0]:a
result[1]:defghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?

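Several characters in the argument mean that exact consecutive sequence. "fgi" never occurs in the string, so nothing is split.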

String:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:fgi
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?

"Any one of these characters" can be expressed by listing the characters inside [ ].

That is, to split on f or g or i, write [fgi].

String:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:[fgi]
result[0]:abcde
result[1]:        <= because f and g are adjacent, the piece between them is "".
result[2]:h
result[3]:jklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
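
The three cases above (a sequence that occurs, a sequence that does not occur, and a character class) can be compared side by side on a shorter string. This is a sketch with a made-up class name and sample data. The last line uses the + quantifier, which the post does not cover, as one possible way to avoid the empty piece between adjacent delimiters.

```java
import java.util.Arrays;

class SequenceVsClassDemo {
    // Small helper so each case prints on one line.
    static String show(String data, String regex) {
        return Arrays.toString(data.split(regex));
    }

    public static void main(String[] args) {
        String data = "abcdefghij";
        System.out.println(show(data, "bc"));     // [a, defghij]      "bc" occurs as a sequence
        System.out.println(show(data, "fgi"));    // [abcdefghij]      "fgi" never occurs, no split
        System.out.println(show(data, "[fgi]"));  // [abcde, , h, j]   f or g or i, one at a time
        System.out.println(show(data, "[fgi]+")); // [abcde, h, j]     a run of f/g/i is one delimiter
    }
}
```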

To split on a character that cannot be used as-is, put a \ in front of it.

String:abcdefghijklmnopqrstuvwxyz .~!@#$%^&*()_+|`1234567890-=\[];',./{}:"<>?
split data:[\*\\\?\+] <= in Java this must be written as follows: split("
result[0]:abcdefghijklmnopqrstuvwxyz .~!@#$%^&
result[1]:()_
result[2]:|`1234567890-=
result[3]:[];',./{}:"<>
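
In Java source code, every backslash of a regular expression has to be doubled inside a string literal. As a sketch of what that looks like for the pattern [\*\\\?\+] shown above, the constant below is my own reading of how it would be written; the class name, the constant name, and the short sample string are assumptions for illustration.

```java
import java.util.Arrays;

class EscapedClassDemo {
    // The regex [\*\\\?\+] as a Java string literal: each regex backslash becomes two.
    static final String DELIMITERS = "[\\*\\\\\\?\\+]";

    static String[] splitOnSpecials(String s) {
        return s.split(DELIMITERS);
    }

    public static void main(String[] args) {
        String data = "a*b\\c?d+e"; // the text a*b\c?d+e
        System.out.println(DELIMITERS);                             // [\*\\\?\+]
        System.out.println(Arrays.toString(splitOnSpecials(data))); // [a, b, c, d, e]
    }
}
```

Printing the constant is a quick way to check that the regex engine receives the pattern you intended.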



Unexpected results + characters that cannot be used as-is

$ , ^ , . , | , * , ( , ) , + , \ , [ , { , ?

How to use them in Java

[\\character]
Example: to split on $ or |, use [\\$\\|]
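
The example above, sketched as runnable code. The class name, method name, and sample data are made up for illustration. The second call relies on the fact that $ and | are already literal inside a character class, so the escapes are harmless but not required for these two characters.

```java
import java.util.Arrays;

class DollarPipeDemo {
    // The pattern from the post: split on $ or |, with both characters escaped.
    static String[] splitOnDollarOrPipe(String s) {
        return s.split("[\\$\\|]");
    }

    public static void main(String[] args) {
        String data = "price$10|qty$2";
        System.out.println(Arrays.toString(splitOnDollarOrPipe(data))); // [price, 10, qty, 2]
        // Unescaped inside [ ], $ and | are still treated as plain characters.
        System.out.println(Arrays.toString(data.split("[$|]")));        // [price, 10, qty, 2]
    }
}
```

Escaping everything, as the post suggests, is the safer habit, because characters such as \, ], ^ and - do keep a special meaning inside [ ].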


