Content pagination

Pagination lets you split large amounts of data into smaller, more manageable chunks.

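Introduction

The Rapid Content API gives you access to a large amount of property data. Because that data set is so large, the Content API supports pagination, which splits the data into smaller, more manageable chunks. This document explains how to use pagination through a few examples and best practices.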

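Basic example

The pagination flow starts with a property search that returns more results than fit on a single page. When that happens, the response contains the first page of results along with a Link response header whose URL leads to the next page.

Example request: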

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia

Example response header:

Link: <https://api.ean.com/v3/properties/content?token=WVZTCUNRUQ4SUSNHAXIWFk0VRQ5JZhFWExRaXAgRVnpDA1RWTUkBB10FHFYGQwZyHwBNFA1HBhIMC1IGAUsGBhkHBGcHBBUGdlMHQAd1UA8WBwwMB1NcBAhdahBWUAdRXjtfVwpEBiEdASBHREpdEVwUQRxuRVgRWg1UaVkHS1kcA3IWBXEVAFZMAz1VBVRWXT5KRQNKFQVEACMXASJFVlRBVzoTRQZVQQRVOUdHVUAVDRRXIBNXJxdYAwtWQFJeVgpHAiYTCwoEWhRmZ0MHCxwFJhNbUEcGU1tCHW1dAWwAGlEIEAFVXEYNIRQBIRcTSltIAUVHTTxdAghAU3VDDSFCVkYCXFE8XgIMQwF7QAAlFwVZEVJpTUdWBBcHU2cBXgEKRgFwFVdxR1tWQQtHQhhuUFgAAA5WE1oKH0JcBEZVDGdVBBdVQl4BVQgFVRIVEFwWBBdHS2xKBU1RDANvDFFfX0cNekZTcxJeE1gQW24XDw8RDEdTIUBTJhFTAxZXb1lUAVNRa1ZZAFxHAXQVUHxDVxdDUAxcFRVmVFpQBlRbFFNxEAwgRXcMXAdfFUZbBFQAXFQGV1YCAVI=>; rel="next"; expires=2023-06-01T17:13:19.699379618Z

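Requesting the provided link before its expiration time returns the next page of results, and each subsequent page carries a new Link header. To page through the entire response, keep following each returned Link header until no Link header is returned. At that point you have reached the end of the requested data set.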
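Reading the next-page URL and its expiration out of the Link header can be sketched as follows. This is a minimal illustration, not part of the Rapid documentation: the NextLink class and its methods are hypothetical names, and the regular expression assumes the header always has the shape shown in the example above.

```java
import java.time.Instant;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the next-page URL and expiry from a Link header. */
final class NextLink {
    // Assumed header shape: <URL>; rel="next"; expires=ISO-8601-TIMESTAMP
    private static final Pattern LINK =
            Pattern.compile("<([^>]+)>\\s*;\\s*rel=\"next\"\\s*;\\s*expires=(\\S+)");

    final String url;
    final Instant expires;

    private NextLink(String url, Instant expires) {
        this.url = url;
        this.expires = expires;
    }

    /** Returns empty when there is no usable Link header, which signals the last page. */
    static Optional<NextLink> parse(String linkHeader) {
        if (linkHeader == null || linkHeader.isBlank()) {
            return Optional.empty();
        }
        Matcher matcher = LINK.matcher(linkHeader);
        if (!matcher.find()) {
            return Optional.empty();
        }
        return Optional.of(new NextLink(matcher.group(1), Instant.parse(matcher.group(2))));
    }

    /** True once the link can no longer be followed. */
    boolean isExpired(Instant now) {
        return !now.isBefore(expires);
    }
}
```

A caller would keep requesting `url` while `parse` returns a value and stop as soon as it returns empty.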

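Filtering the requested data

The simple example above shows how pagination works, but it is also a very large search. With so many properties, paging through all of them can take some time. Adding query parameters narrows the search to only the properties you actually need.

For example, you might need only properties in the United States rather than every property. Adding the country_code query parameter to the request returns just that subset of properties.

Example request with the country parameter: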

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US

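This request paginates exactly as above, but there are fewer properties to page through.

Another way to reduce the number of properties requested is to fetch only those that have changed since you last pulled property data. The date_updated_start parameter returns only properties that have changed since a specific date.

Example request with country and date parameters: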

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&date_updated_start=2023-01-02

The key to faster pagination and less data transfer is to request only the properties you need.
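Building such filtered request URLs in code might look like the sketch below. The ContentQuery class and its method names are hypothetical; only the base URL and the parameter names (language, supply_source, country_code, date_updated_start) come from the examples above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles Content API request URLs. */
final class ContentQuery {
    private static final String BASE = "https://api.ean.com/v3/properties/content";

    /** Joins the parameters into a query string, URL-encoding keys and values. */
    static String url(Map<String, String> params) {
        return BASE + "?" + params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    /** Properties in one country that changed since the given date (yyyy-MM-dd). */
    static String updatedSince(String countryCode, String isoDate) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("language", "en-US");
        params.put("supply_source", "expedia");
        params.put("country_code", countryCode);
        params.put("date_updated_start", isoDate);
        return url(params);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Calling `ContentQuery.updatedSince("US", "2023-01-02")` reproduces the last example request above.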

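Splitting a search for parallel processing

Sometimes the result set is still very large even when you request only the properties you need. In that case, running several searches at the same time can speed things up.

The first step is to break the desired search into smaller searches. How to do this varies by use case, but a good approach is to start from the desired search and add query parameters that are mutually exclusive, so that no two smaller searches return the same property.

For example, if the desired search is all properties in the United States, start by filtering on country, as in the example above.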

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US

Then split the search further with parameters such as property_rating_min and property_rating_max.

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=0.0&property_rating_max=0.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=1.0&property_rating_max=1.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=2.0&property_rating_max=2.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=3.0&property_rating_max=3.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=4.0&property_rating_max=4.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=5.0

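There are now six separate requests, each of which can be paginated on its own or concurrently with the others. Together they return the same data set as the original search, only faster.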
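The six rating-bucket requests can be generated rather than typed out. The sketch below is illustrative only: RatingSplit is a hypothetical name, and the bucket boundaries (x.0 to x.9, with an open-ended top bucket) simply copy the example URLs above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits one search into non-overlapping star-rating buckets. */
final class RatingSplit {
    /**
     * @param baseUrl a request URL that already contains a query string
     * @return one URL per rating bucket: 0.0-0.9, 1.0-1.9, ..., 4.0-4.9, and 5.0 and up
     */
    static List<String> urls(String baseUrl) {
        List<String> urls = new ArrayList<>();
        for (int star = 0; star <= 5; star++) {
            StringBuilder url = new StringBuilder(baseUrl)
                    .append("&property_rating_min=").append(star).append(".0");
            // The top bucket has no upper bound, matching the last example URL.
            if (star < 5) {
                url.append("&property_rating_max=").append(star).append(".9");
            }
            urls.add(url.toString());
        }
        return urls;
    }
}
```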

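Every situation is different, but starting from the desired search and checking the pagination-total-results response header on the first page of the response indicates whether splitting the search is likely to make pagination more efficient.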
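One way to turn that header into a decision is a simple threshold check, sketched below. SplitDecision is a hypothetical helper, and the threshold is an assumption you would tune for your own use case; this document does not prescribe a value at this point.

```java
/** Hypothetical helper: decides whether a search is large enough to be worth splitting. */
final class SplitDecision {
    /**
     * @param totalResultsHeader raw value of the pagination-total-results header, or null if absent
     * @param maxCallSize        assumed upper bound on properties per call
     * @return true when the search returns at least maxCallSize properties
     */
    static boolean shouldSplit(String totalResultsHeader, int maxCallSize) {
        if (totalResultsHeader == null) {
            // No count available, so there is nothing to base a split on.
            return false;
        }
        return Integer.parseInt(totalResultsHeader.trim()) >= maxCallSize;
    }
}
```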

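Code example

The sections above describe the pagination flow and ways to split the data only conceptually; the Java code below provides a more concrete example.

**Note:** The code examples below omit proper exception handling and other best practices. As always, follow all best practices when writing production code.

First, a simple RapidClient class serves as the basis for making Rapid calls.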

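// Assumed dependencies: JAX-RS (jakarta.ws.rs here; javax.ws.rs on older stacks),
// Jersey (GZipEncoder), and Apache Commons Codec (DigestUtils).
import java.text.MessageFormat;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.List;
import java.util.Map;

import jakarta.ws.rs.client.Client;
import jakarta.ws.rs.client.ClientBuilder;
import jakarta.ws.rs.client.WebTarget;
import jakarta.ws.rs.core.HttpHeaders;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.MultivaluedMap;
import jakarta.ws.rs.core.Response;
import org.apache.commons.codec.digest.DigestUtils;
import org.glassfish.jersey.message.GZipEncoder;

public class RapidClient {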
    // Base URL
    private static final String RAPID_BASE_URL = "https://api.ean.com";

    // Headers
    private static final String GZIP = "gzip";
    private static final String AUTHORIZATION_HEADER = "EAN APIKey={0},Signature={1},timestamp={2}";

    // HTTP Client
    private static final Client CLIENT = ClientBuilder.newClient().register(GZipEncoder.class);

    private final String apiKey;
    private final String sharedSecret;

    public RapidClient(String apikey, String sharedSecret) {
        this.apiKey = apikey;
        this.sharedSecret = sharedSecret;
    }

    public Response get(String path, MultivaluedMap<String, String> queryParameters) {
        WebTarget webTarget = CLIENT.target(RAPID_BASE_URL).path(path);

        // Add all query parameters from the map to the web target
        for (Map.Entry<String, List<String>> entry : queryParameters.entrySet()) {
            for (String value : entry.getValue()) {
                webTarget = webTarget.queryParam(entry.getKey(), value);
            }
        }

        return webTarget.request(MediaType.APPLICATION_JSON_TYPE)
                .header(HttpHeaders.ACCEPT_ENCODING, GZIP)
                .header(HttpHeaders.AUTHORIZATION, generateAuthHeader())
                .get();
    }

    private String generateAuthHeader() {
        final String timeStampInSeconds = String.valueOf(ZonedDateTime.now(ZoneOffset.UTC).toEpochSecond());
        final String input = apiKey + sharedSecret + timeStampInSeconds;
        final String signature = DigestUtils.sha512Hex(input);

        return MessageFormat.format(AUTHORIZATION_HEADER, apiKey, signature, timeStampInSeconds);
    }
}

This is plain boilerplate that makes the classes that follow easier to read.
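The signature step in generateAuthHeader() can also be reproduced with the JDK alone, which makes it easy to check in isolation. The sketch below assumes the same scheme as the class above (SHA-512 over API key, shared secret, and a UNIX timestamp in seconds); RapidSignature is a hypothetical name, and the timestamp is passed in so the result is deterministic.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical JDK-only version of the authorization header logic shown above. */
final class RapidSignature {
    /** Builds the header value for a given timestamp (seconds since the epoch, UTC). */
    static String authHeader(String apiKey, String sharedSecret, long epochSeconds) {
        String signature = sha512Hex(apiKey + sharedSecret + epochSeconds);
        // Plain concatenation keeps the timestamp free of locale-specific digit grouping.
        return "EAN APIKey=" + apiKey + ",Signature=" + signature + ",timestamp=" + epochSeconds;
    }

    /** Lower-case hexadecimal SHA-512 digest of the UTF-8 bytes of the input. */
    static String sha512Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-512")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-512.
            throw new IllegalStateException("SHA-512 is not available", e);
        }
    }
}
```

In real use the timestamp would be the current time, for example `Instant.now().getEpochSecond()`.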

The next class represents a specific Content API call and uses RapidClient to make that call.

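// Assumed dependencies: JAX-RS (jakarta.ws.rs here; javax.ws.rs on older stacks),
// Apache Commons Collections 4, and Apache Commons Lang 3.
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

import jakarta.ws.rs.core.GenericType;
import jakarta.ws.rs.core.MultivaluedHashMap;
import jakarta.ws.rs.core.MultivaluedMap;
import jakarta.ws.rs.core.Response;
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.collections4.MapUtils;
import org.apache.commons.lang3.StringUtils;

public class PropertyContentCall {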
    // Path
    private static final String PROPERTY_CONTENT_PATH = "v3/properties/content";

    // Headers
    private static final String LINK = "Link";
    private static final String PAGINATION_TOTAL_RESULTS = "Pagination-Total-Results";

    // Query parameters keys
    private static final String LANGUAGE = "language";
    private static final String SUPPLY_SOURCE = "supply_source";
    private static final String COUNTRY_CODE = "country_code";
    private static final String CATEGORY_ID_EXCLUDE = "category_id_exclude";
    private static final String TOKEN = "token";
    private static final String INCLUDE = "include";

    // Call parameters
    private final RapidClient client;
    private final String language;
    private final String supplySource;
    private final List<String> countryCodes;
    private final List<String> categoryIdExcludes;

    private String token;
    // Set once a page arrives without a Link header, meaning the last page has been read.
    private boolean lastPageRead;

    public PropertyContentCall(RapidClient client, String language, String supplySource,
            List<String> countryCodes, List<String> categoryIdExcludes) {
        this.client = client;
        this.language = language;
        this.supplySource = supplySource;
        this.countryCodes = countryCodes;
        this.categoryIdExcludes = categoryIdExcludes;
    }

    public Stream<RapidPropertyContent> stream() {
        return Stream.generate(() -> {
                    synchronized (this) {
                        // Stop after the last page. Without this guard the null token would make
                        // queryParameters() rebuild the original request and start over at page one.
                        if (lastPageRead) {
                            return Map.<String, RapidPropertyContent>of();
                        }

                        // Make the call to Rapid.
                        final Response response = client.get(PROPERTY_CONTENT_PATH, queryParameters());

                        // Read the response to return.
                        final Map<String, RapidPropertyContent> propertyContents = response.readEntity(new GenericType<>() { });

                        // Store the token for pagination if we got one.
                        token = getTokenFromLink(response.getHeaderString(LINK));
                        lastPageRead = token == null;

                        return propertyContents;
                    }
                })
                .takeWhile(MapUtils::isNotEmpty)
                .map(Map::values)
                .flatMap(Collection::stream);
    }

    public Integer size() {
        // Make the call to Rapid.
        final MultivaluedMap<String, String> queryParameters = queryParameters();
        queryParameters.putSingle(INCLUDE, "property_ids");
        final Response response = client.get(PROPERTY_CONTENT_PATH, queryParameters);

        // Read the size to return.
        final Integer size = Integer.parseInt(response.getHeaderString(PAGINATION_TOTAL_RESULTS));

        // Close the response since we're not reading it.
        response.close();

        return size;
    }

    private MultivaluedMap<String, String> queryParameters() {
        final MultivaluedMap<String, String> queryParams = new MultivaluedHashMap<>();

        if (token != null) {
            queryParams.putSingle(TOKEN, token);
        } else {
            // Add required parameters
            queryParams.putSingle(LANGUAGE, language);
            queryParams.putSingle(SUPPLY_SOURCE, supplySource);

            // Add optional parameters
            if (CollectionUtils.isNotEmpty(countryCodes)) {
                queryParams.put(COUNTRY_CODE, countryCodes);
            }
            if (CollectionUtils.isNotEmpty(categoryIdExcludes)) {
                queryParams.put(CATEGORY_ID_EXCLUDE, categoryIdExcludes);
            }
        }

        return queryParams;
    }

    private String getTokenFromLink(String linkHeader) {
        if (StringUtils.isEmpty(linkHeader)) {
            return null;
        }

        final int startOfToken = linkHeader.indexOf("=") + 1;
        final int endOfToken = linkHeader.indexOf(">");

        return linkHeader.substring(startOfToken, endOfToken);
    }
}

PropertyContentCall represents a single request to the Rapid Content API and encapsulates the pagination flow for that call.

Example:

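Compare the API call below with the equivalent Java request.

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US

PropertyContentCall request = new PropertyContentCall(myRapidClient, "en-US", "expedia", List.of("US"), null);

  • The PropertyContentCall used here is specific to this example. Calls are subdivided by country_code and category_id_exclude (although this may change depending on your use case). Because it was written with parallel processing in mind, the example uses Java parallel streams. The public stream() method returns a stream of RapidPropertyContent objects. RapidPropertyContent is a plain POJO that represents a single property from a Rapid Content API call. Although Java parallel streams are used here, any way of running code in parallel would work.
  • When code that calls stream() needs the next property from the stream, the method supplies it if it has already been fetched; otherwise it calls the Rapid Content API for the next page of results and returns a property from that page. Simply calling stream() and reading it to completion handles pagination for every property returned by the request.
  • The other public helper method, size(), makes it easy to see the total number of properties this PropertyContentCall will return. This helps determine whether a call is already small enough or needs to be split into smaller calls for parallel processing.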
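The lazy, page-at-a-time behavior of stream() can be shown without any network access. The sketch below is an alternative, sequential formulation built on an Iterator rather than Stream.generate; PagedStream, Page, and the fetchPage function are hypothetical names, and a null next token stands in for a response without a Link header.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical sketch: a stream that fetches the next page only when it is needed. */
final class PagedStream {
    /** One page of items plus the token for the next page (null on the last page). */
    static final class Page<T> {
        final List<T> items;
        final String nextToken;

        Page(List<T> items, String nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    /** @param fetchPage receives null for the first request, then each returned token */
    static <T> Stream<T> of(Function<String, Page<T>> fetchPage) {
        Iterator<List<T>> pages = new Iterator<>() {
            private String token;     // null means "first request"
            private boolean finished; // true once a page came back without a next token

            @Override
            public boolean hasNext() {
                return !finished;
            }

            @Override
            public List<T> next() {
                if (finished) {
                    throw new NoSuchElementException();
                }
                Page<T> page = fetchPage.apply(token);
                token = page.nextToken;
                finished = token == null;
                return page.items;
            }
        };
        return StreamSupport
                .stream(Spliterators.spliteratorUnknownSize(pages, Spliterator.ORDERED), false)
                .flatMap(List::stream);
    }
}
```

Tracking an explicit finished flag, instead of relying on an empty page, keeps the stream from issuing the first request again once the token is gone.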

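The building blocks above provide the foundation for calling Rapid and paginating through the response. The following code uses those classes to automatically split a call into manageable pieces, paginate through the smaller calls in parallel, and write the combined output to a single file.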

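// Assumed dependencies: Jackson Databind and the Jackson JSR-310 (java.time) module.
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;
import java.util.zip.GZIPOutputStream;

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

public class ParallelFileMaker {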
    private static final String APIKEY = System.getenv().get("RAPID_APIKEY");
    private static final String SHARED_SECRET = System.getenv().get("RAPID_SHARED_SECRET");
    private static final List<String> COUNTRIES = Arrays.asList("AD", "AE", "AF", "AG", "AI", "AL", "AM", "AO", "AQ",
            "AR", "AS", "AT", "AU", "AW", "AX", "AZ", "BA", "BB", "BD", "BE", "BF", "BG", "BH", "BI", "BJ", "BL", "BM",
            "BN", "BO", "BQ", "BR", "BS", "BT", "BV", "BW", "BY", "BZ", "CA", "CC", "CD", "CF", "CG", "CH", "CI", "CK",
            "CL", "CM", "CN", "CO", "CR", "CU", "CV", "CW", "CX", "CY", "CZ", "DE", "DJ", "DK", "DM", "DO", "DZ", "EC",
            "EE", "EG", "EH", "ER", "ES", "ET", "FI", "FJ", "FK", "FM", "FO", "FR", "GA", "GB", "GD", "GE", "GF", "GG",
            "GH", "GI", "GL", "GM", "GN", "GP", "GQ", "GR", "GS", "GT", "GU", "GW", "GY", "HK", "HM", "HN", "HR", "HT",
            "HU", "ID", "IE", "IL", "IM", "IN", "IO", "IQ", "IR", "IS", "IT", "JE", "JM", "JO", "JP", "KE", "KG", "KH",
            "KI", "KM", "KN", "KP", "KR", "KW", "KY", "KZ", "LA", "LB", "LC", "LI", "LK", "LR", "LS", "LT", "LU", "LV",
            "LY", "MA", "MC", "MD", "ME", "MF", "MG", "MH", "MK", "ML", "MM", "MN", "MO", "MP", "MQ", "MR", "MS", "MT",
            "MU", "MV", "MW", "MX", "MY", "MZ", "NA", "NC", "NE", "NF", "NG", "NI", "NL", "NO", "NP", "NR", "NU", "NZ",
            "OM", "PA", "PE", "PF", "PG", "PH", "PK", "PL", "PM", "PN", "PR", "PS", "PT", "PW", "PY", "QA", "RE", "RO",
            "RS", "RU", "RW", "SA", "SB", "SC", "SD", "SE", "SG", "SH", "SI", "SJ", "SK", "SL", "SM", "SN", "SO", "SR",
            "SS", "ST", "SV", "SX", "SY", "SZ", "TC", "TD", "TF", "TG", "TH", "TJ", "TK", "TL", "TM", "TN", "TO", "TR",
            "TT", "TV", "TW", "TZ", "UA", "UG", "UM", "US", "UY", "UZ", "VA", "VC", "VE", "VG", "VI", "VN", "VU", "WF",
            "WS", "YE", "YT", "ZA", "ZM", "ZW");
    private static final List<String> PROPERTY_CATEGORIES = Arrays.asList("0", "1", "2", "3", "4", "5", "6", "7", "8",
            "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
            "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44");
    private static final int MAX_CALL_SIZE = 20_000;
    private static final String LANGUAGE = "en-US";
    private static final String SUPPLY_SOURCE = "expedia";
    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper()
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
            .registerModule(new JavaTimeModule());
    private static final RapidClient RAPID_CLIENT = new RapidClient(APIKEY, SHARED_SECRET);

    public void run() throws IOException {
        final Map<PropertyContentCall, Integer> allCalls = divideUpCalls();

        // Make sure we're making the calls in the most efficient order. This list will be smallest to largest, so
        // that when the streams get combined and are reversed, the largest stream will be first.
        final List<Stream<RapidPropertyContent>> callsToMake = allCalls.entrySet().stream()
                .filter(entry -> entry.getValue() > 0) // filter out any calls that don't have results
                .sorted(Map.Entry.comparingByValue()) // sort all the calls with the smallest calls first
                .map(Map.Entry::getKey) // just need the call itself now
                .map(PropertyContentCall::stream) // get the stream for each call
                .toList();

        // Combine all the streams into one big stream and actually make the calls and write to the file.
        try (Stream<RapidPropertyContent> bigStream = combineStreams(callsToMake);
             BufferedWriter outputFileWriter = createFileWriter(Path.of("output.jsonl.gz"))) {
            bigStream.parallel()
                    .forEach(property -> {
                        try {
                            // Write to output file
                            synchronized (outputFileWriter) {
                                outputFileWriter.append(OBJECT_MAPPER.writeValueAsString(property));
                                outputFileWriter.newLine();
                            }
                        } catch (Exception e) {
                            // Handle exception
                        }
                    });
        }
    }

    /**
     * This will split up the calls to be made based on the size of each call's results. It will first split into
     * calls per country and, if needed, it will then further split into calls per category for any country that is
     * too big on its own.
     * <p>
     * Currently, since there is no way to request a specific category, this will instead exclude all other
     * categories except the one it wants for that particular call. This can be simplified if more search
     * capabilities are added in the future.
     * <p>
     * The size of each call is also kept so that the calls can be further sorted if needed.
     *
     * @return A map containing all the calls and their respective sizes.
     */
    private Map<PropertyContentCall, Integer> divideUpCalls() {
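        // Filled from parallel streams below, so the map must be thread-safe.
        final Map<PropertyContentCall, Integer> allCalls = new java.util.concurrent.ConcurrentHashMap<>();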
        COUNTRIES.stream().parallel()
                .forEach(countryCode -> {
                    // Check to see if the entire country is small enough to get at once.
                    final PropertyContentCall countryCall = new PropertyContentCall(RAPID_CLIENT, LANGUAGE,
                            SUPPLY_SOURCE, List.of(countryCode), null);
                    final Integer countryCallSize = countryCall.size();

                    if (countryCallSize < MAX_CALL_SIZE) {
                        // It's small enough! No need to break this call up further.
                        allCalls.put(countryCall, countryCallSize);
                    } else {
                        // The country is too big, need to break up the call into smaller parts.
                        PROPERTY_CATEGORIES.stream().parallel()
                                .forEach(category -> {
                                    // Exclude every category except the current one, so it's as if we're searching
                                    // for only the current category.
                                    final List<String> excludedCategories = new ArrayList<>(PROPERTY_CATEGORIES);
                                    excludedCategories.remove(category);

                                    final PropertyContentCall categoryCall = new PropertyContentCall(RAPID_CLIENT,
                                            LANGUAGE, SUPPLY_SOURCE, List.of(countryCode), excludedCategories);

                                    allCalls.put(categoryCall, categoryCall.size());
                                });
                    }
                });

        return allCalls;
    }

    /**
     * This will combine multiple Streams into a single Stream. Because of how this is reduced, the Streams will end
     * up in the reverse order of the list that was passed in.
     * <p>
     * Note: Because this is concatenating multiple Streams together, each Stream will go on the stack. Thus, if
     * there are many Streams then a StackOverflowError can occur when trying to use the combined Stream. Make
     * sure the stack size is appropriate for your usage via the `-Xss` JVM parameter.
     *
     * @param streams A list of the Streams to combine.
     * @return The combined Stream that can be treated as one.
     */
    private <T> Stream<T> combineStreams(List<Stream<T>> streams) {
        return streams.stream()
                .filter(Objects::nonNull)
                .reduce(Stream::concat)
                .orElse(Stream.empty());
    }

    private BufferedWriter createFileWriter(Path path) throws IOException {
        return new BufferedWriter(
                new OutputStreamWriter(
                        new GZIPOutputStream(
                                Files.newOutputStream(path)),
                        StandardCharsets.UTF_8));
    }
}

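Although the code above contains many inline comments explaining each piece, it can be summarized as follows:

  1. Split the main call into smaller calls based on the use case. (In this example the main call retrieves all results, and it is subdivided by country_code and, where needed, by category_id_exclude.)
  2. Sort the calls for efficiency; the ordering used is specific to how this example combines parallel streams.
  3. Run the calls in parallel and write the properties they return to a single file.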
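The category split in step 1 relies on excluding every category except the one wanted, since the sample only has an exclusion filter to work with. A minimal sketch of that list manipulation follows; CategorySplit is a hypothetical name, and the ID range 0 to 44 is copied from the PROPERTY_CATEGORIES list in the sample rather than from an authoritative category reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper for the "exclude all but one category" technique. */
final class CategorySplit {
    /** Category IDs "0" through "44", as listed in the sample code. */
    static List<String> allCategories() {
        return IntStream.rangeClosed(0, 44)
                .mapToObj(String::valueOf)
                .collect(Collectors.toList());
    }

    /** Values for category_id_exclude that leave only the wanted category in the results. */
    static List<String> excludeAllBut(String wanted, List<String> all) {
        List<String> excluded = new ArrayList<>(all);
        // remove(Object) drops the first matching element, not the element at an index.
        excluded.remove(wanted);
        return excluded;
    }
}
```

One call per category, each built with `excludeAllBut`, together covers the same properties as the unsplit country call as long as every property falls into one of the listed categories.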