• 多文本键值关联


    【问题】
    I want to match column 1 of file 1 with column 1 of file 2 and then column 2 of file 1 with column 1 of file 3 and then print the matches. The columns in the files are separated by tabs. For example:

    file 1:

    fji01dde AIDJFMGKG
    dlp02sle VMCFIJGM
    cmr03lsp CKEIFJ
    

    file 2:

    fji01dde 25  30
    dlp02sle 40  50
    cmr03lsp 60  70

    file 3:

    AIDJFMGKG
    CKEIFJ
    

    output needs to be:

    fji01dde AIDJFMGKG 25  30
    cmr03lsp CKEIFJ 60  70
    

    I only want lines that are common in all three files.

    The below code works well for the first two files, but I need to incorporate the third file. Any ideas?

    #!/usr/bin/env perl
    use strict;
    my (%file1,%file2);
    ## Open the 1st file
    open(A,"file1");
    while(<A>){
     chomp;
     ## Split the current line on tabs into the @F array.
     my  @F=split(/\t/);
     push @{$file1{$F[0]}},@F[1..$#F];
    }
    ## Open the 2nd file
    open(B,"file2");
    while(<B>){
     chomp;
     ## Split the current line on tabs into the @F array.
     my  @F=split(/\t/);
     if (defined($file1{$F[0]})) {
     foreach  my $col (@{$file1{$F[0]}}) {
     print  "$F[0]\t$col\t@F[1..$#F]\n";
     }
     }
    }
    

    What would the output be if file3 did not have CKEIFJ

    It would be just fji01dde AIDJFMGKG 25 30

    别人给出的解答:

    use strict;
    use warnings;
    use  Text::CSV_XS qw(csv);
    my  @csv_files = @ARGV;
    # Parse all the CSV files into arrays of arrays.
    my $data1 = csv( in => $csv_files[0], sep_char => "\t" );
    # Parse the other CSV files into hashes of rows keyed on the columns we're going to search on.
    my $data2 = csv( in => $csv_files[1],
    sep_char => "\t",
     headers => ["code", "num1", "num2"\],
    key => "code"
     );
    my $data3 = csv( in => $csv_files[2],
    sep_char => "\t",
     headers => \["CODE"],
    key => "CODE"
     );
    for  my $row1 (@$data1) {
     my $row2 = $data2->{$row1->[0]};
     my $row3 = $data3->{$row1->[1]};
     if( $row2 && $row3 ) {
     print join "\t", $row1->[0], $row1->[1], $row2->{num1}, $row2->{num2};
     print  "\n";
     }
    }
    

    【回答】
    这是典型的结构化计算,file1 的 1 号字段和 file2 的 1 号字段存在左关联,file3 的 1 号字段和 file1 的 2 号字段存在左关联,用支持结构化算法的语言很容易描述并解答问题,Perl 缺乏相关函数,虽能实现但难以理解。

    可以选用集算器 SPL 实现上述算法,代码简单易懂:

    A
    1=file("D:/file1.txt").import()
    2=file("D:/file2.txt").import()
    3=file("D:/file3.txt").import()
    4=A1.switch(#1,A2:#1)
    5=A3.switch(#1,A1:#2)
    6=A5.new(#1.#1.#1, #1.#2, #1.#1.#2, #1.#1.#3)

    A4-A5:建立表间关联

    A6:按关联路径取数,其中 #1.#1.#1 表示 file3 的 1 号字段,指向 file1 记录里的 1 号字段,再指向 file2 里的 1 号字段。

  • 相关阅读:
    【JavaScript】-- 基本包装类型(含笔试题)
    域控制器BSP开发工程师面试题
    超适合练手的一套JavaWeb项目 (保安管理系统)
    在kernel中使用内存检测
    【毕业设计】深度学习+python+opencv实现动物识别 - 图像识别
    新版Java面试专题视频教程——框架篇
    Apache Doris 整合 FLINK CDC + Iceberg 构建实时湖仓一体的联邦查询
    css3对文字标签不同宽,不同高使用瀑布流对齐显示
    图书推荐||Word文稿之美
    【算法】剑指offer-删除链表中重复的节点&&最小栈
  • 原文地址:https://blog.csdn.net/raqsoft/article/details/125616772