JPA 에서 Where In 절 대신 array_contains를 사용하는 이유

JPA

JPA 에서 Where In 절 대신 array_contains를 사용하는 이유

수수한개발자 2023. 10. 17. 12:31

728x90

글을 작성한 이유

통합 테스트를 작성하던중 만난 지금까지 와는 다른 SQL 구문에 대해서 궁금점이 생겨서 글을 작성하게 되었습니다.

https://github.com/jisu3316/Board-Project

GitHub - jisu3316/Board-Project

Contribute to jisu3316/Board-Project development by creating an account on GitHub.

github.com

도메인 구조는 Post : Like = 1 : N 으로 게시글과 좋아요는 1 대 N으로 정의되어있습니다.

@Entity
@Where(clause = "deleted = 'N'")
@DynamicInsert
@DynamicUpdate
public class Post extends BaseEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "member_id")
    private Member member;

    private String title;

    @Lob
    private String content;


    @Embedded
    private Likes likes = new Likes();
    
    ...
}

@Embeddable
public class Likes {

    @OneToMany(mappedBy = "post", cascade = {CascadeType.PERSIST, CascadeType.MERGE}, orphanRemoval = true)
    private List<Like> likes = new ArrayList<>();

    public Likes() {
    }
}

public class Like extends BaseEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "member_id")
    private Member member;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "post_id")
    private Post post;
}

게시글 하나에 대한 조회시 게시글에 연관된 좋아요도 함께 조회해오는 로직이 있습니다.

@Transactional(readOnly = true)
public PostDto findPost(Long id) {
    return postRepository.findById(id).map(PostDto::from)
            .orElseThrow(() -> new PostException(ErrorType.POST_NOT_FOUND, String.format("%s, 게시글이 존재하지 않습니다.", id)));
}

위와 같은 상황에서 테스트를 작성하고 조회해오는데 다음과 같은 로그가 찍혔습니다.

2023-10-17T11:34:44.106+09:00 TRACE 8344 --- [o-auto-1-exec-4] org.hibernate.orm.jdbc.bind: binding parameter [1] as [BIGINT] - [1]
2023-10-17T11:34:44.110+09:00 DEBUG 8344 --- [o-auto-1-exec-4] org.hibernate.SQL : 
    select
        l1_0.post_id,
        l1_0.id,
        l1_0.created_at,
        l1_0.deleted,
        l1_0.member_id,
        l1_0.updated_at 
    from
        likes l1_0 
    where
        array_contains(?,l1_0.post_id)
2023-10-17T11:34:44.110+09:00 TRACE 8344 --- [o-auto-1-exec-4] org.hibernate.orm.jdbc.bind: binding parameter [1] as [ARRAY] - [[1, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null]]

생각한 SQL 구문은 select * from likes where in (?) 입니다.

그 이유는 도메인 특성상 1 대 N 이고 서비스계층에서 포스트에 연관된 좋아요의 개수를 함께 반환하기 때문에 in절을 생각하고 코드를 작성했습니다.

그리고 뭔가 잘못된거 같은 바인딩 로그... null, null, null,null.... 개발자는 null이 무섭다...

저의 문제가 무엇일까 하고 코드를 살펴봤지만 그냥 남들과 다르지 않은 1대N 관계에서 조회해오는 로직이였습니다.

이유를 찾아보자

제 프로젝트의 환경은 스프링부트 3.1.3을 사용하였습니다.

스프링부트와 starter 디펜던시를 사용하면 스프링 버전과 맞는 의존성을 추가해주는 아주 편리한 기능이 있습니다.

그렇기 때문에 hibernate도 6.2 버전대로 가져오게 되었네요.

여기서 그럼 이것이 SQL 구문을 왜 바꾸느냐... 검색을 해보던중 김영한 선생님이 또 저와 같은 고민을 하신분께 해결을 해주셨더라구요..

하이버네이트에서는 최적화를 위해 이렇게 사용한다고 합니다.

SQL구문은 파싱된 결과를 내부에서 캐싱하고 있고 이렇게 해두면 같은 모양의 SQL 구문이 이후에 실행되어도 그대로 사용하여 성능을 최적화 할 수 있다고 합니다.

참고로 여기서 말하는 캐싱은 SQL 구문 자체를 캐싱한다는 뜻이지 SQL의 실행 결과를 캐싱한다는 뜻이 아니라고합니다.

예를 들어 보겠습니다.

in 절을 사용했을때

select * from likes where in (?)
select * from likes where in (?, ?)
select * from likes where in (?, ?, ?)

바인딩 된 파라미터가 3개일 때 in 절은 숫자 자체가 달라지기때문에 SQL 구문이 달라지게 됩니다.

그렇기 때문에 실행도 3개 캐싱도 3개를 해야되기 때문에 이러한 성능 문제를 해결하기 위함으로 보입니다.

array_contains을 사용했을때

select * from likes where array_contains(?,l1_0.post_id)
select * from likes where array_contains([1],l1_0.post_id)
select * from likes where array_contains([1,2],l1_0.post_id)
select * from likes where array_contains([1,2,3],l1_0.post_id)

위와 같이 구문이 만들어지게 됩니다.

array_contains 에서는 ? 파라미터에 배열이 들어가기 때문에 [?], [?, ?], [?, ?, ?] 숫자는 달라지지만 배열로 매번 같은 SQL 구문이 실행 되게 됩니다.

결과적으로 동적으로 데이터가 늘어도 같은 SQL 구문으로 그대로 사용하기 때문에 성능 이점을 볼 수 있습니다.

그러면 위에서 바인딩 파라미터 로그에 null은 뭘까 찾아보니 default_batch_fetch_size에 따라 배열에 null값을 추가 한다고 합니다.

728x90