Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
3.1k views
in Technique[技术] by (71.8m points)

remove spaces from string in spark 3

I am trying to extract just numbers from string and separate them by comma in spark sql , so I am using REGEXP_REPLACE function to convert all character to space then use the same function to replace all spaces by comma but it did not work ,

what I try :

        /// first try : 
            val resultDF = hiveContext.sql("select MMS,WAP,EMAIL ,case when MMS=1 and WAP=1 and INTERNET=1 and EMAIL=1 then 'SMS,MMS,WAP' when MMS=1 and WAP=1 then 'SMS,MMS' end as  supported_services,LTE_FREQUENCIES, REGEXP_REPLACE ( REGEXP_REPLACE (LTE_FREQUENCIES, '[A-Z]', ' '), '\s+', ',') from test.tttt where limit 1")
        /// second try : 
            val resultDF = hiveContext.sql("select MMS,WAP,EMAIL,supported_services,REGEXP_REPLACE ( network_filed, '\s+', ',') from (select  MMS,WAP,EMAIL ,case when MMS=1 and WAP=1 and INTERNET=1 and EMAIL=1 then 'SMS,MMS,WAP' when MMS=1 and WAP=1 then 'SMS,MMS' end as  supported_services, REGEXP_REPLACE (LTE_FREQUENCIES, '[A-Z]', ' ') as network_filed from test.tttt limit 1)")
        ///third try : 
            val resultDF = hiveContext.sql("select MMS,WAP,EMAIL,supported_services,REGEXP_REPLACE ( network_filed, '[[:space:]]+', ',') from (select  MMS,WAP,EMAIL ,case when MMS=1 and WAP=1 and INTERNET=1 and EMAIL=1 then 'SMS,MMS,WAP' when MMS=1 and WAP=1 then 'SMS,MMS' end as  supported_services, REGEXP_REPLACE (LTE_FREQUENCIES, '[A-Z]', ' ') as network_filed from test.tttt limit 1)")

sample of input (LTE_FREQUENCIES value): LTE FDD BAND 1 LTE FDD BAND 2 LTE FDD BAND 3 LTE FDD BAND 4 LTE FDD BAND 5 LTE TDD BAND 6 LTE FDD BAND 7 LTE FDD BAND 9 LTE FDD BAND 10 LTE FDD BAND 11 LTE FDD BAND 12 LTE FDD BAND 13 LTE FDD BAND 14 LTE FDD BAND 17 LTE FDD BAND 18 LTE FDD BAND 19 LTE FDD BAND 20 LTE FDD BAND 21 LTE FDD BAND 23 LTE FDD BAND 24 LTE FDD BAND 25 LTE FDD BAND 26 LTE FDD BAND 33 LTE FDD BAND 34 LTE FDD BAND 35 LTE FDD BAND 36 LTE FDD BAND 37 LTE TDD BAND 38 LTE TDD BAND 39 LTE TDD BAND 40 LTE TDD BAND 41 LTE TDD BAND 42 LTE TDD BAND 43 LTE FDD BAND 8 LTE FDD BAND 28 LTE FDD BAND 29 LTE FDD BAND 31 LTE TDD BAND 33 LTE TDD BAND 34 LTE TDD BAND 35 LTE TDD BAND 36 LTE TDD BAND 37 LTE TDD BAND 61 LTE TDD BAND 62 LTE TDD BAND 60

sample of excepted output : 1,2,3,4,5,6,7,9,10,11,12,13,14,17,18,19,20,21,23,24,25,26,33,34,35,36,37,38,39,40,41,42,43,8,28,29,31,33,34,35,36,37,61,62,60

non of above work always the result when I run resultDF.show(20, false) is : result of run

is there are any way to replace those spaces by comma ?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Use four backslashes for regex backslashes in Scala:

val resultDF = hiveContext.sql(" ... REGEXP_REPLACE ( network_filed, '\\s+', ',') ... ")

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...