Content pagination

The pagination feature allows for large amounts of data to be divided into manageable pieces.

Overview

The Rapid Content API allows for the consumption of large amounts of property data. Because this data is large, the Content API supports pagination to divide the data into manageable pieces. This document provides a few examples and best practices for using the pagination feature.

Basic example

The pagination process starts by searching for properties and getting more results than will fit on a single page. When this happens, the response will contain the first page of results and then a Link response header that can be followed to get to the next page.

Example request:

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia

Example response header:

Link: <https://api.ean.com/v3/properties/content?token=WVZTCUNRUQ4SUSNHAXIWFk0VRQ5JZhFWExRaXAgRVnpDA1RWTUkBB10FHFYGQwZyHwBNFA1HBhIMC1IGAUsGBhkHBGcHBBUGdlMHQAd1UA8WBwwMB1NcBAhdahBWUAdRXjtfVwpEBiEdASBHREpdEVwUQRxuRVgRWg1UaVkHS1kcA3IWBXEVAFZMAz1VBVRWXT5KRQNKFQVEACMXASJFVlRBVzoTRQZVQQRVOUdHVUAVDRRXIBNXJxdYAwtWQFJeVgpHAiYTCwoEWhRmZ0MHCxwFJhNbUEcGU1tCHW1dAWwAGlEIEAFVXEYNIRQBIRcTSltIAUVHTTxdAghAU3VDDSFCVkYCXFE8XgIMQwF7QAAlFwVZEVJpTUdWBBcHU2cBXgEKRgFwFVdxR1tWQQtHQhhuUFgAAA5WE1oKH0JcBEZVDGdVBBdVQl4BVQgFVRIVEFwWBBdHS2xKBU1RDANvDFFfX0cNekZTcxJeE1gQW24XDw8RDEdTIUBTJhFTAxZXb1lUAVNRa1ZZAFxHAXQVUHxDVxdDUAxcFRVmVFpQBlRbFFNxEAwgRXcMXAdfFUZbBFQAXFQGV1YCAVI=>; rel="next"; expires=2023-06-01T17:13:19.699379618Z

Follow the link provided before the expiration time and the next page of results will be returned along with a new Link header for the page after that. To page through the entire response just continue to follow each Link header that is returned until no further Link header is returned. This signals the end of the data set requested.

Filtering the requested data

While the simple example above shows how pagination works, it is also a very large search. When there are many properties, it can take some time to paginate through all of them. It is helpful to only search for the properties that are actually needed by including extra query parameters.

For instance, rather than requesting all properties, maybe only properties in the United States are needed. This subset of properties can be requested by changing the request to include a query parameter using the country_code object.

Example request with country parameter:

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US

This will still provide the same pagination capabilities as above, but there will be fewer properties to page through.

Another way of reducing the number of properties requested is to only get properties that have changed since the last time property data was pulled. Using the date_updated_start object returns only those properties that have changed since the date given.

Example request with country and date parameters:

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&date_updated_start=2023-01-02

Making sure to only request the properties that are needed is key to improving pagination speed and lowering the amount of data transferred.

Splitting searches for parallelization

Sometimes, even when requesting just the properties that are needed, the result is still large. In this case it is useful to do multiple searches in parallel to help speed up the process.

The first step is to break apart the desired search into smaller searches. This will be different for every use case but can be achieved by starting with the desired search and then adding more query parameters onto that search that don't overlap with each other.

For example, if the desired search is for all the properties in the United States, then start by filtering by country as in the example above.

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US

This search can then be further split using, for example, the property_rating_min and property_rating_max objects.

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=0.0&property_rating_max=0.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=1.0&property_rating_max=1.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=2.0&property_rating_max=2.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=3.0&property_rating_max=3.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=4.0&property_rating_max=4.9
https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US&property_rating_min=5.0

Now there are six separate requests that can all be paginated through independently and in parallel. The result is the same set of data being retrieved, but faster.

Every situation will be different, but starting with the desired search and looking at the pagination-total-results response header on the first page of responses will provide an indicator of whether it would be helpful to split the search or not.

Code example

While the above information conceptually outlines the process of pagination and how to split data, below is some Java code that can provide a more concrete example.

Note: Proper exception handling and other best practices are not included in the following code example. As always, all best practices should still be followed when writing production-ready code.

To begin, a simple RapidClient class can be used as the base to make Rapid calls.

public class RapidClient {
    // Base URL
    private static final String RAPID_BASE_URL = "https://api.ean.com";

    // Headers
    private static final String GZIP = "gzip";
    private static final String AUTHORIZATION_HEADER = "EAN APIKey={0},Signature={1},timestamp={2}";

    // HTTP Client
    private static final Client CLIENT = ClientBuilder.newClient().register(GZipEncoder.class);

    private final String apiKey;
    private final String sharedSecret;

    public RapidClient(String apikey, String sharedSecret) {
        this.apiKey = apikey;
        this.sharedSecret = sharedSecret;
    }

    public Response get(String path, MultivaluedMap<String, String> queryParameters) {
        WebTarget webTarget = CLIENT.target(RAPID_BASE_URL).path(path);

        // Add all query parameters from the map to the web target
        for (Map.Entry<String, List<String>> entry : queryParameters.entrySet()) {
            for (String value : entry.getValue()) {
                webTarget = webTarget.queryParam(entry.getKey(), value);
            }
        }

        return webTarget.request(MediaType.APPLICATION_JSON_TYPE)
                .header(HttpHeaders.ACCEPT_ENCODING, GZIP)
                .header(HttpHeaders.AUTHORIZATION, generateAuthHeader())
                .get();
    }

    private String generateAuthHeader() {
        final String timeStampInSeconds = String.valueOf(ZonedDateTime.now(ZoneOffset.UTC).toEpochSecond());
        final String input = apiKey + sharedSecret + timeStampInSeconds;
        final String signature = DigestUtils.sha512Hex(input);

        return MessageFormat.format(AUTHORIZATION_HEADER, apiKey, signature, timeStampInSeconds);
    }
}

This is simply some boilerplate code that will make reading the next classes easier.

The next class will represent a specific Content API call and will use RapidClient to make that call.

public class PropertyContentCall {
    // Path
    private static final String PROPERTY_CONTENT_PATH = "v3/properties/content";

    // Headers
    private static final String LINK = "Link";
    private static final String PAGINATION_TOTAL_RESULTS = "Pagination-Total-Results";

    // Query parameters keys
    private static final String LANGUAGE = "language";
    private static final String SUPPLY_SOURCE = "supply_source";
    private static final String COUNTRY_CODE = "country_code";
    private static final String CATEGORY_ID = "category_id";
    private static final String TOKEN = "token";
    private static final String INCLUDE = "include";

    // Call parameters
    private final RapidClient client;
    private final String language;
    private final String supplySource;
    private final List<String> countryCodes;
    private final List<String> categoryIds;

    private String token;

    public PropertyContentCall(RapidClient client, String language, String supplySource,
                               List<String> countryCodes, List<String> categoryIds) {
        this.client = client;
        this.language = language;
        this.supplySource = supplySource;
        this.countryCodes = countryCodes;
        this.categoryIds = categoryIds;
    }

    public Stream<RapidPropertyContent> stream() {
        return Stream.generate(() -> {
                    synchronized (this) {
                        // Make the call to Rapid.
                        final Response response = client.get(PROPERTY_CONTENT_PATH, queryParameters());

                        // Read the response to return.
                        final Map<String, RapidPropertyContent> propertyContents = response.readEntity(new GenericType<>() { });

                        // Store the token for pagination if we got one.
                        token = getTokenFromLink(response.getHeaderString(LINK));

                        return propertyContents;
                    }
                })
                .takeWhile(MapUtils::isNotEmpty)
                .map(Map::values)
                .flatMap(Collection::stream);
    }

    public Integer size() {
        // Make the call to Rapid.
        final MultivaluedMap<String, String> queryParameters = queryParameters();
        queryParameters.putSingle(INCLUDE, "property_ids");
        final Response response = client.get(PROPERTY_CONTENT_PATH, queryParameters);

        // Read the size to return.
        final Integer size = Integer.parseInt(response.getHeaderString(PAGINATION_TOTAL_RESULTS));

        // Close the response since we're not reading it.
        response.close();

        return size;
    }

    private MultivaluedMap<String, String> queryParameters() {
        final MultivaluedMap<String, String> queryParams = new MultivaluedHashMap<>();

        if (token != null) {
            queryParams.putSingle(TOKEN, token);
        } else {
            // Add required parameters
            queryParams.putSingle(LANGUAGE, language);
            queryParams.putSingle(SUPPLY_SOURCE, supplySource);

            // Add optional parameters
            if (CollectionUtils.isNotEmpty(countryCodes)) {
                queryParams.put(COUNTRY_CODE, countryCodes);
            }
            if (CollectionUtils.isNotEmpty(categoryIds)) {
                queryParams.put(CATEGORY_ID, categoryIds);
            }
        }

        return queryParams;
    }

    private String getTokenFromLink(String linkHeader) {
        if (StringUtils.isEmpty(linkHeader)) {
            return null;
        }

        final int startOfToken = linkHeader.indexOf("=") + 1;
        final int endOfToken = linkHeader.indexOf(">");

        return linkHeader.substring(startOfToken, endOfToken);
    }
}

A PropertyContentCall represents a single request to the Rapid Content API and encapsulates the process of paging through that call to completion.

Example:

Compare the API call below with the equivalent Java request.

https://api.ean.com/v3/properties/content?language=en-US&supply_source=expedia&country_code=US
PropertyContentCall request = new PropertyContentCall(myRapidClient, "en-US", "expedia", List.of("US"), null);
  • The PropertyContentCall used here is very specific to this example. The calls will be broken down by country_code and category_id, though this can be changed based on the use case. Since this is being written specifically for parallelization, this example will be making use of Java Parallel Streams. The public stream() method exists to give back a stream of RapidPropertyContent objects. A RapidPropertyContent object is simply a POJO representing a single property from the Rapid Content API call. While Java Parallel Streams are being used here, any way of running code in parallel will suffice.
  • When the code that calls stream() needs to read another property off of the stream, this method will either provide that property if it has already retrieved it, or will call the Rapid Content API for the next page of results and return a property from that. Simply calling stream() and reading it to completion will handle paginating through every property returned via the request.
  • There is another public helper method size() which provides a way to easily see the total number of properties that will be returned by this PropertyContentCall. This can help in determining whether a call is small enough already or whether it needs to be further split into smaller calls for parallelization.

The above building blocks provide a foundation for calling Rapid and paginating through the response. The below code utilizes the above classes to automatically split a call into manageable pieces, page through all the smaller calls in parallel, and write the combined output to a file.

public class ParallelFileMaker {
    private static final String APIKEY = System.getenv().get("RAPID_APIKEY");
    private static final String SHARED_SECRET = System.getenv().get("RAPID_SHARED_SECRET");
    private static final List<String> COUNTRIES = Arrays.asList("AD", "AE", "AF", "AG", "AI", "AL", "AM", "AO", "AQ",
            "AR", "AS", "AT", "AU", "AW", "AX", "AZ", "BA", "BB", "BD", "BE", "BF", "BG", "BH", "BI", "BJ", "BL", "BM",
            "BN", "BO", "BQ", "BR", "BS", "BT", "BV", "BW", "BY", "BZ", "CA", "CC", "CD", "CF", "CG", "CH", "CI", "CK",
            "CL", "CM", "CN", "CO", "CR", "CU", "CV", "CW", "CX", "CY", "CZ", "DE", "DJ", "DK", "DM", "DO", "DZ", "EC",
            "EE", "EG", "EH", "ER", "ES", "ET", "FI", "FJ", "FK", "FM", "FO", "FR", "GA", "GB", "GD", "GE", "GF", "GG",
            "GH", "GI", "GL", "GM", "GN", "GP", "GQ", "GR", "GS", "GT", "GU", "GW", "GY", "HK", "HM", "HN", "HR", "HT",
            "HU", "ID", "IE", "IL", "IM", "IN", "IO", "IQ", "IR", "IS", "IT", "JE", "JM", "JO", "JP", "KE", "KG", "KH",
            "KI", "KM", "KN", "KP", "KR", "KW", "KY", "KZ", "LA", "LB", "LC", "LI", "LK", "LR", "LS", "LT", "LU", "LV",
            "LY", "MA", "MC", "MD", "ME", "MF", "MG", "MH", "MK", "ML", "MM", "MN", "MO", "MP", "MQ", "MR", "MS", "MT",
            "MU", "MV", "MW", "MX", "MY", "MZ", "NA", "NC", "NE", "NF", "NG", "NI", "NL", "NO", "NP", "NR", "NU", "NZ",
            "OM", "PA", "PE", "PF", "PG", "PH", "PK", "PL", "PM", "PN", "PR", "PS", "PT", "PW", "PY", "QA", "RE", "RO",
            "RS", "RU", "RW", "SA", "SB", "SC", "SD", "SE", "SG", "SH", "SI", "SJ", "SK", "SL", "SM", "SN", "SO", "SR",
            "SS", "ST", "SV", "SX", "SY", "SZ", "TC", "TD", "TF", "TG", "TH", "TJ", "TK", "TL", "TM", "TN", "TO", "TR",
            "TT", "TV", "TW", "TZ", "UA", "UG", "UM", "US", "UY", "UZ", "VA", "VC", "VE", "VG", "VI", "VN", "VU", "WF",
            "WS", "YE", "YT", "ZA", "ZM", "ZW");
    private static final List<String> PROPERTY_CATEGORIES = Arrays.asList("0", "1", "2", "3", "4", "5", "6", "7", "8",
            "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "20", "21", "22", "23", "24", "25", "26",
            "29", "30", "31", "32", "33", "34", "36", "37", "39", "40", "41", "42", "43", "44");
    private static final int MAX_CALL_SIZE = 20_000;
    private static final String LANGUAGE = "en-US";
    private static final String SUPPLY_SOURCE = "expedia";
    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper()
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
            .registerModule(new JavaTimeModule());
    private static final RapidClient RAPID_CLIENT = new RapidClient(APIKEY, SHARED_SECRET);

    public void run() throws IOException {
        final Map<PropertyContentCall, Integer> allCalls = divideUpCalls();

        // Make sure we're making the calls in the most efficient order. This list will be smallest to largest, so
        // that when the streams get combined and are reversed, the largest stream will be first.
        final List<Stream<RapidPropertyContent>> callsToMake = allCalls.entrySet().stream()
                .filter(entry -> entry.getValue() > 0) // filter out any calls that don't have results
                .sorted(Map.Entry.comparingByValue()) // sort all the calls with the smallest calls first
                .map(Map.Entry::getKey) // just need the call itself now
                .map(PropertyContentCall::stream) // get the stream for each call
                .toList();

        // Combine all the streams into one big stream and actually make the calls and write to the file.
        try (Stream<RapidPropertyContent> bigStream = combineStreams(callsToMake);
             BufferedWriter outputFileWriter = createFileWriter(Path.of("output.jsonl.gz"))) {
            bigStream.parallel()
                    .forEach(property -> {
                        try {
                            // Write to output file
                            synchronized (outputFileWriter) {
                                outputFileWriter.append(OBJECT_MAPPER.writeValueAsString(property));
                                outputFileWriter.newLine();
                            }
                        } catch (Exception e) {
                            // Handle exception
                        }
                    });
        }
    }

    /**
     * This will split up the calls to be made based on the size of each call's results. It will first split into
     * calls per country and, if needed, it will then further split into calls per category for any country that is
     * too big on its own.
     * The size of each call is also kept so that the calls can be further sorted if needed.
     *
     * @return A map containing all the calls and their respective sizes.
     */
    private Map<PropertyContentCall, Integer> divideUpCalls() {
        final Map<PropertyContentCall, Integer> allCalls = new HashMap<>();
        COUNTRIES.stream().parallel()
                .forEach(countryCode -> {
                    // Check to see if the entire country is small enough to get at once.
                    final PropertyContentCall countryCall = new PropertyContentCall(RAPID_CLIENT, LANGUAGE,
                            SUPPLY_SOURCE, List.of(countryCode), null);
                    final Integer countryCallSize = countryCall.size();

                    if (countryCallSize < MAX_CALL_SIZE) {
                        // It's small enough! No need to break this call up further.
                        allCalls.put(countryCall, countryCallSize);
                    } else {
                        // The country is too big, need to break up the call into smaller parts.
                        PROPERTY_CATEGORIES.stream().parallel()
                                .forEach(category -> {
                                    final PropertyContentCall categoryCall = new PropertyContentCall(RAPID_CLIENT,
                                            LANGUAGE, SUPPLY_SOURCE, List.of(countryCode), List.of(category));

                                    allCalls.put(categoryCall, categoryCall.size());
                                });
                    }
                });

        return allCalls;
    }

    /**
     * This will combine multiple Streams into a single Stream. Because of how this is reduced, the Streams will end
     * up in the reverse order of the list that was passed in.
     * <p>
     * Note: Because this is concatenating multiple Streams together, each Stream will go on the stack. Thus, if
     * there are many Streams then a StackOverflowException can occur when trying to use the combined Stream. Make
     * sure the stack size is appropriate for your usage via the `-Xss` JVM parameter.
     *
     * @param streams A list of the Streams to combine.
     * @return The combined Stream that can be treated as one.
     */
    private <T> Stream<T> combineStreams(List<Stream<T>> streams) {
        return streams.stream()
                .filter(Objects::nonNull)
                .reduce(Stream::concat)
                .orElse(Stream.empty());
    }

    private BufferedWriter createFileWriter(Path path) throws IOException {
        return new BufferedWriter(
                new OutputStreamWriter(
                        new GZIPOutputStream(
                                Files.newOutputStream(path)),
                        StandardCharsets.UTF_8));
    }
}

While the above code has many comments inline explaining the various pieces, it can be summed up in the following way:

  1. Divide up the main call into smaller calls based on the use case. (In this example, the main call is to get everything and the division is by country_code and, if needed, by category_id).
  2. Specific to the way this example is combining parallel streams, the calls are sorted to run more efficiently.
  3. The calls are then run in parallel and the properties returned from those calls are written to a file.
Did you find this page helpful?
How can we improve this content?
Thank you for helping us improve!