• Removing duplicate data


    【Question】

    I am looking for some help. I have an application at work that generates a CSV file with user information in it. I want to use Java to take the data, delete duplicate information, rearrange it, and create a spreadsheet, to make life easier. The CSV is generated in the following format, but is much larger:

    21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555
    21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net
    21458952, a1234, Doe, John, technology, support staff, work pager, 555-555-5555
    99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444
    99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net
    99946133, b9854, Paul, Jane, technology, administration, work pager, 444-444-4444
    99946133, b9854, Paul, Jane, technology, administration, cell phone, 444-444-4444

    I want to delete the duplicates and arrange the data in appropriate columns.

    ID | PIN | Lname | Fname | Dept | team | work px | work email

    I have been trying to build arrays with a BufferedReader to store the data, but I am running into difficulties dealing with duplicates and manipulating the data into a table.

    This is the code I have so far

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class Sort {
        public static void main(String[] args) {
            BufferedReader br = null;
            try {
                String line = "";
                String csvSplitBy = ",";
                String outPut;
                br = new BufferedReader(new FileReader("C:/Users/Jason/Desktop/test.txt")); // location where the file is retrieved
                while ((line = br.readLine()) != null) { // checks to see if the data is there
                    String[] id = line.split(csvSplitBy);
                    // each row has 8 fields, so the valid indexes are 0 to 7
                    outPut = id[0] + "," + id[1] + "," + id[2] + "," + id[3] + "," + id[4] + "," + id[5] + "," + id[6] + "," + id[7]; // incomplete...using for test...
                    System.out.println(outPut); // displays the contents of the .txt file
                } // ends while statement
            } // ends try
            catch (IOException e) {
                // IOException covers read errors as well as a missing file
                System.out.println("Could not read the file: " + e.getMessage());
            } // ends catch
            finally {
                try {
                    if (br != null) br.close();
                } catch (IOException ex) {
                    ex.printStackTrace();
                } // ends try
            } // ends finally
        } // ends main method
    } // ends class Sort

    【Answer】

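    Java has no class library that directly groups the rows of a text file or extracts unique values, so coding this by hand is quite involved. For problems like this, SPL can be used alongside Java, and the code is short: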

        A
    1   =file("D:\\dup.csv").import@c()
    2   =A1.group(_1,_2,_3,_4,_5,_6;~.select@1(_7=="work phone")._8,~.select@1(_7=="work email")._8)
    3   =file("D:\\result.csv").export@c(A2)

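    A1: Reads the contents of the file dup.csv.

    A2: Removes the duplicates by grouping the rows on the first six columns (_1 to _6), and for each group picks the value in _8 from the first row whose _7 is "work phone" and from the first row whose _7 is "work email".

    A3: Exports A2 to the file result.csv.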
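
    For comparison, the same group-and-pick logic can be sketched in plain Java. This is a minimal sketch rather than the SPL implementation: the class name ContactMerger, the method merge, and the command-line paths are hypothetical, and it assumes fields never contain embedded commas or quotes. Unlike the grouping in A2, it keeps people in the order they first appear in the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original post.
class ContactMerger {

    /** Collapses the one-row-per-contact input into one row per person. */
    static List<String> merge(List<String> lines) {
        // Key: the six identifying fields joined by commas.
        // Value: a two-slot array holding {work phone, work email}.
        Map<String, String[]> people = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            if (f.length < 8) {
                continue; // skip blank or malformed rows
            }
            for (int i = 0; i < f.length; i++) {
                f[i] = f[i].trim(); // the sample data has a space after each comma
            }
            String key = String.join(",", Arrays.copyOfRange(f, 0, 6));
            String[] contact = people.computeIfAbsent(key, k -> new String[] {"", ""});
            // Keep only the first value seen for each contact type.
            if (f[6].equals("work phone") && contact[0].isEmpty()) {
                contact[0] = f[7];
            } else if (f[6].equals("work email") && contact[1].isEmpty()) {
                contact[1] = f[7];
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String[]> e : people.entrySet()) {
            out.add(e.getKey() + "," + e.getValue()[0] + "," + e.getValue()[1]);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java ContactMerger <input.csv> <output.csv>");
            return;
        }
        // Input and output paths are supplied by the caller.
        List<String> rows = merge(Files.readAllLines(Paths.get(args[0])));
        Files.write(Paths.get(args[1]), rows);
    }
}
```

    Run on the seven sample rows from the question, merge returns two rows, one for John Doe and one for Jane Paul, each ending with the work phone and work email. Pager and cell phone rows are ignored, as in A2.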

    SPL scripts can be embedded in Java (see "How to Call an SPL Script in Java").

  • Original article: https://blog.csdn.net/raqsoft/article/details/126864180